generic conversion scripts for bringing TCIA data into SALT/MONAI

kirbyju commented 7 months ago

This looks really cool! I noticed that you've used a few datasets from The Cancer Imaging Archive (I'm seeing names like SAROS, LCTSC, and CT-ORG) and that you provided some conversion steps to prepare each of them for your model. Do you think there is some way to integrate generic preprocessing functionality into SALT/MONAI that would allow people to go from TCIA DICOM data with SEG/RTSTRUCT labels to something that's shovel-ready for segmentation models? Or will it always require dealing with quirks that are unique to each dataset?

Also, just FYI, https://github.com/kirbyju/TCIA_Notebooks/blob/main/TCIA_REST_API_Downloads.ipynb may be of interest to you. You could potentially use some of the examples there to help people easily download the images for datasets where you have prepared conversion scripts so they don't have to manually go to our website and find these images.
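For illustration, here is a minimal sketch of how such download requests could be built. It assumes the public NBIA v1 REST endpoint and its `getSeries` / `getImage` queries; the base URL and the helper names `series_query_url` and `image_download_url` are assumptions made for this example, so please check them against the notebook before relying on them. The code only constructs URLs and does not contact the server.

```python
from urllib.parse import urlencode

# Assumption: base URL of the public NBIA v1 REST API used by TCIA.
BASE_URL = "https://services.cancerimagingarchive.net/nbia-api/services/v1"


def series_query_url(collection, modality=None):
    """Return a getSeries URL that lists the series of one collection."""
    params = {"Collection": collection, "format": "json"}
    if modality:
        params["Modality"] = modality
    return f"{BASE_URL}/getSeries?{urlencode(params)}"


def image_download_url(series_instance_uid):
    """Return a getImage URL for one series, identified by its UID."""
    query = urlencode({"SeriesInstanceUID": series_instance_uid})
    return f"{BASE_URL}/getImage?{query}"


# Hypothetical usage: list the CT series of one collection.
print(series_query_url("LCTSC", modality="CT"))
```

A conversion script could fetch the first URL, read the series UIDs from the JSON response, and then download each series through the second URL.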

Best, Justin

kirbyju commented 6 months ago

Hi all, just checking to see if there was any progress on this and also to inform you that it may make sense to form an official MONAI working group proposal related to this: https://docs.google.com/forms/d/e/1FAIpQLSe6H4rZMG_zP7z8cw84VvkFTtz6BYWWZ5TNcFj1qUjmTEh1Ww/viewform. I'd be happy to participate in the WG with you to provide input from the TCIA side of things to help with accessing our APIs, answering questions about how data are organized, etc. Please take a look and let me know what you think!

Best, Justin

Goku1110 commented 6 months ago

Hi Justin,

Thanks for your comment and the really good ideas! We agree that sparing users the manual lookup of the data would make the package easier to use and evaluate. We are therefore currently thinking about a script that downloads the listed datasets and converts them into the format shown in the README, similar to the one in our SAROS repository, which goes in a similar direction as your Jupyter notebook: https://github.com/UMEssen/saros-dataset/blob/main/download.py

Regarding generic preprocessing: We think that it should not be a problem to create generic processes for the conversion from TCIA/DICOM to other formats, in our case NIfTI. Beyond a certain level of detail, or for specific research questions, however, some harmonization of the data may be necessary, which we could also discuss on the basis of SALT.
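One way to picture the harmonization step is a per-dataset lookup that maps each dataset's own label names onto a shared vocabulary. The sketch below is only an illustration: the dataset keys, the label names, and the `harmonize` helper are hypothetical and are not taken from SALT or from any TCIA collection.

```python
# Hypothetical per-dataset label names mapped to one shared vocabulary.
# These tables are invented for illustration only.
LABEL_MAPS = {
    "dataset_a": {"Lung_L": "lung_left", "Lung_R": "lung_right"},
    "dataset_b": {"left lung": "lung_left", "right lung": "lung_right"},
}


def harmonize(dataset, labels):
    """Translate dataset-specific label names into the shared vocabulary.

    Names without an entry in the dataset's table are passed through
    unchanged, so the generic path still works for unknown datasets.
    """
    mapping = LABEL_MAPS.get(dataset, {})
    return [mapping.get(name, name) for name in labels]


print(harmonize("dataset_a", ["Lung_L", "Heart"]))
```

With this layout the generic DICOM to NIfTI conversion stays the same for every dataset, and only the small lookup table changes per dataset.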

Regarding the MONAI working group: This is definitely something we would be interested in, as we think an integration of SALT into MONAI would be great. From our point of view, it would make a lot of sense to bring together several perspectives, backgrounds, and skills in a MONAI working group, as you suggested.

I hope this answers most of the questions raised and we can discuss this further. Thanks also for your input!

Best, René

kirbyju commented 6 months ago

Hi René,

Sounds great! I'd like to add one more very important point. If you're going to be converting the data to NIfTI for processing, do you have a plan to save the output back into DICOM after the segmentations are created? There are tools such as https://github.com/QIICR/dcmqi which https://github.com/fedorov could probably help with if you have questions. This is critical to making the output segmentation data more Findable, Accessible, Interoperable, and Reusable (FAIR).
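As a rough sketch of what that step could look like, the snippet below assembles a call to dcmqi's `itkimage2segimage` converter, which takes the segmentation image, the directory of the source DICOM series, and a JSON metadata file describing the segments. The helper name and all file paths are hypothetical, and the code only builds the command without running it.

```python
import shlex


def itkimage2segimage_command(nifti_seg, dicom_dir, metadata_json, output_seg):
    """Build the argument list for dcmqi's itkimage2segimage converter."""
    return [
        "itkimage2segimage",
        "--inputImageList", str(nifti_seg),        # segmentation produced by the model
        "--inputDICOMDirectory", str(dicom_dir),   # original CT series the labels refer to
        "--inputMetadata", str(metadata_json),     # JSON description of the segments
        "--outputDICOM", str(output_seg),          # DICOM SEG file to be written
    ]


# Hypothetical file names, for illustration only.
cmd = itkimage2segimage_command("seg.nii.gz", "ct_dicom", "meta.json", "seg.dcm")
print(shlex.join(cmd))
```

With dcmqi installed, the list could be passed to `subprocess.run(cmd, check=True)` to write the DICOM SEG file.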

Best, Justin

UMEssen / SALT

generic conversion scripts for bringing TCIA data into SALT/MONAI #5