AIM-Harvard / foundation-cancer-image-biomarker

[Nature Machine Intelligence 2024] Code and evaluation repository for the paper
https://aim-harvard.github.io/foundation-cancer-image-biomarker/
MIT License

Is 11,467 volumes or patches? #296

Closed. gaohailiang520 closed this issue 7 months ago

gaohailiang520 commented 7 months ago

Thank you for your great work. I want to ask about the statement "11,467 unlabelled lesions for contrastive training": does that mean 11,467 volumes or 11,467 patches?

dosubot[bot] commented 7 months ago

The 11,467 unlabelled lesions used for contrastive training are treated as volumes. The processing scripts and configuration files in the repository handle these lesions in 3D: preprocessing converts 2D slices into 3D NIfTI volumes and calculates lesion dimensions in three dimensions, and the contrastive training configuration specifies 3-dimensional input data. Patches can be extracted from these volumes for specific tasks, but the original data is volumetric.

#### Details

**Sources**

- [data/preprocessing/deeplesion/process_deeplesion.py](https://github.com/AIM-Harvard/foundation-cancer-image-biomarker/blob/master/data/preprocessing/deeplesion/process_deeplesion.py)
- [data/preprocessing/deeplesion/process_annotations.py](https://github.com/AIM-Harvard/foundation-cancer-image-biomarker/blob/master/data/preprocessing/deeplesion/process_annotations.py)
- [scripts/jobs.sh](https://github.com/AIM-Harvard/foundation-cancer-image-biomarker/blob/master/scripts/jobs.sh)
- [experiments/pretraining/autoencoder_extract_features.yaml](https://github.com/AIM-Harvard/foundation-cancer-image-biomarker/blob/master/experiments/pretraining/autoencoder_extract_features.yaml)
- [fmcib/datasets/ssl_radiomics_dataset.py](https://github.com/AIM-Harvard/foundation-cancer-image-biomarker/blob/master/fmcib/datasets/ssl_radiomics_dataset.py)

1170300714 commented 7 months ago

Hello, I would like to know whether "11,467 unlabelled lesions" refers to 11,467 original 3D CT volumes (for example, complete thoracic and abdominal CT images), or to 11,467 3D patches, whether or not they contain lesions, that have been cropped from the original volumes.

surajpaib commented 7 months ago

Hi @1170300714 @gaohailiang520

We use 11,467 3D volumes, but these are not complete thoracic/abdominal CTs; they are limited field-of-view (FOV) volumes from the DeepLesion dataset.

For contrastive learning, we crop positive patches containing lesions from these 11,467 volumes and negative patches randomly sampled from the scans.
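
For readers who want to picture the cropping step described above, here is a minimal NumPy sketch. It is not the repository's implementation: the function names `crop_patch` and `sample_pair`, the default patch size of 50 voxels, the zero-style padding at borders, and the uniform sampling of the negative centre are all assumptions made for illustration. In particular, this sketch does not check whether the random negative patch overlaps the lesion.

```python
import numpy as np


def crop_patch(volume, center, size=50):
    """Crop a cubic patch of edge `size` centred on `center` (hypothetical helper).

    The volume is padded with its minimum value so that centres near the
    border still yield a full-size patch.
    """
    pad = size // 2 + 1
    padded = np.pad(volume, pad, mode="constant", constant_values=volume.min())
    # Shift the centre into padded coordinates, then step back half a patch.
    start = [int(c) + pad - size // 2 for c in center]
    slices = tuple(slice(s, s + size) for s in start)
    return padded[slices]


def sample_pair(volume, lesion_center, size=50, rng=None):
    """Return (positive, negative) patches from one limited-FOV volume.

    positive: centred on the lesion location.
    negative: centred on a uniformly random voxel of the same scan
              (assumption: no exclusion zone around the lesion).
    """
    if rng is None:
        rng = np.random.default_rng()
    positive = crop_patch(volume, lesion_center, size)
    random_center = [rng.integers(0, dim) for dim in volume.shape]
    negative = crop_patch(volume, random_center, size)
    return positive, negative
```

With a volume of shape `(20, 30, 40)` and `size=8`, both returned patches have shape `(8, 8, 8)`, and the lesion voxel sits at index `(4, 4, 4)` of the positive patch.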