Project-MONAI / MONAI

AI Toolkit for Healthcare Imaging
https://monai.io/
Apache License 2.0
5.71k stars, 1.04k forks

port tcia-related code to tcia_utils #7417

Open kirbyju opened 7 months ago

kirbyju commented 7 months ago

Is your feature request related to a problem? Please describe.
It would be excellent if we could improve users' ability to query and download datasets from The Cancer Imaging Archive (TCIA) and load them in MONAI. Currently there are a few old examples of doing this in MONAI: https://github.com/search?q=org%3AProject-MONAI%20tcia&type=code. However, there is now a "tcia_utils" package on PyPI at https://pypi.org/project/tcia-utils/, and there are many regularly updated examples of using it at https://github.com/kirbyju/TCIA_Notebooks.

Describe the solution you'd like
It would probably make sense to port any TCIA-related code that lives in the MONAI repo over to https://github.com/kirbyju/tcia_utils and to investigate whether anything else should be implemented in tcia_utils to make it easy to prep data for MONAI.

https://github.com/kirbyju/TCIA_Notebooks/blob/main/TCIA_REST_API_Downloads.ipynb provides several simple examples showing different use cases for downloading DICOM radiology data.
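For context, the basic query-then-download flow from that notebook looks roughly like the sketch below. It assumes `tcia_utils` is installed and that `nbia.getSeries` and `nbia.downloadSeries` behave as they do in the notebook at the time of writing. The wrapper `download_sample` is a hypothetical name used only for illustration.

```python
def download_sample(collection, number=3, modality=None):
    """Query a TCIA collection and download a few of its series.

    Hypothetical helper: only the two nbia calls come from tcia_utils.
    """
    # Imported lazily so this module can be imported without tcia_utils.
    from tcia_utils import nbia

    # Build the query; modality is only sent when the caller asks for it.
    query = {"collection": collection}
    if modality is not None:
        query["modality"] = modality

    # Series metadata for the collection (a list of dicts by default).
    series = nbia.getSeries(**query)
    if not series:
        raise ValueError(f"no series found for collection {collection!r}")

    # Download only the first `number` series to keep the example small.
    nbia.downloadSeries(series, number=number)
    return series
```

A call such as `download_sample("Soft-tissue-Sarcoma", number=3, modality="CT")` would fetch three CT series; the collection name is just an example.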

I think it would also be valuable to ensure that the DICOM SEG and RTSTRUCT segmentation data on TCIA can be easily loaded in MONAI. https://github.com/kirbyju/TCIA_Notebooks/blob/main/TCIA_Segmentations.ipynb has examples related to this.

I am also working on a new module that uses our newest API to extract supporting data for classification tasks (e.g. clinical demographics/outcomes, genomic and proteomic subtypes), and I would love your input on how to make it most useful.

Additional context
This was discussed in the 1/26/24 MONAI developers meeting, and I'm submitting this issue at the suggestion of @aylward and @ericspod.

ericspod commented 7 months ago

Hi @yiheng-wang-nv, I believe the code in core relating to TCIA was contributed by you. Do you have any feedback on what we should do with what's there? The key piece of code is the dataset for TCIA data, but we have to ensure it uses the current API. We would still need our own dataset class, but maybe the core of the current functionality could be ported.
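To make that split concrete, here is a rough sketch of what could remain on the MONAI side if downloading moved to tcia_utils: a helper that turns a download folder into the list-of-dicts datalist that MONAI datasets consume. It assumes one sub-folder of `.dcm` files per series, which may not match the real layout. `build_datalist` is a hypothetical name, not an existing MONAI or tcia_utils function.

```python
from pathlib import Path


def build_datalist(download_dir, key="image"):
    """Return one {key: series_folder} entry per DICOM series folder.

    Assumes each sub-folder of `download_dir` holds the .dcm files of a
    single series (hypothetical layout; adjust to the real one).
    """
    root = Path(download_dir)
    datalist = []
    # Sort so the datalist order is stable between runs.
    for series_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Skip folders that contain no DICOM files.
        if any(series_dir.glob("*.dcm")):
            datalist.append({key: str(series_dir)})
    return datalist
```

The result could then be passed to `monai.data.Dataset` together with a `LoadImaged` transform that uses a DICOM-capable reader. That wiring is left out here because it depends on which reader we settle on.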