MIC-DKFZ / nnDetection

nnDetection is a self-configuring framework for 3D (volumetric) medical object detection which can be applied to new data sets without manual intervention. It includes guides for 12 data sets that were used to develop and evaluate the performance of the proposed method.
Apache License 2.0

About training dataset #153

Closed kimm51 closed 9 months ago

kimm51 commented 1 year ago

Hello,

I have created my dataset and dataset.json file, but I keep getting the error `Expected /opt/data/Task100_MeniscusTear/raw_splitted/imagesTr/meniscustear0000.nii.gz to be a raw splitted data path but it does not exist.` I have never labelled a file like this. Do the file names need to include 0000.nii.gz?

Thank you!

mibaumgartner commented 1 year ago

Hi,

does this section from the readme help?

Image Format
nnDetection uses the same image format as nnU-Net. Each case consists of at least one 3D NIfTI file with a single modality; these are saved in the images folders. If multiple modalities are available, each modality uses a separate file and the sequence number at the end of the name indicates the modality (these numbers need to correspond to the numbers specified in the data set file and be consistent across the whole data set).

An example with two modalities could look like this:

- case001_0000.nii.gz # Case ID: case001; Modality: 0
- case001_0001.nii.gz # Case ID: case001; Modality: 1

- case002_0000.nii.gz # Case ID: case002; Modality: 0
- case002_0001.nii.gz # Case ID: case002; Modality: 1
If multiple modalities are available, please check beforehand whether they need to be registered and perform registration before nnDetection preprocessing. nnDetection does *not* include automatic registration of multiple modalities.

All images need to end with _000X so nnDetection can derive the modality (while this seems redundant for CT data, since there is only a single modality, it is needed for setups with multiple modalities such as MRI).
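As a quick sanity check before running preprocessing, the naming convention above can be verified with a small script (a minimal sketch, not part of nnDetection; the helper name and folder argument are made up):

```python
import re
from pathlib import Path

# Pattern for <case_id>_<modality>.nii.gz, where <modality> is a
# zero-padded four-digit number such as 0000 or 0001.
NAME_PATTERN = re.compile(r"^.+_\d{4}\.nii\.gz$")

def find_misnamed_images(images_dir):
    """Return the names of all .nii.gz files in images_dir that do not
    end with a _000X modality suffix."""
    return [p.name for p in sorted(Path(images_dir).glob("*.nii.gz"))
            if NAME_PATTERN.match(p.name) is None]
```

Any file name this returns would trigger the "raw splitted data path" error from the original question, since nnDetection cannot derive a modality for it.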

kimm51 commented 1 year ago

Yes, I have read it. I have renamed all files as meniscustear_0XXX, but not in order. I think there must be an order, am I wrong? (Or do I have to change meniscustear_0XXX to case0000_0XXX, i.e. only replacing meniscustear with case0000?)

I have converted the MRI scans (hdf5) to DICOM and then DICOM to NIfTI. At the end, I gzipped the files. For the annotations, I created instance masks as circles (converted from bounding boxes) together with the json files.

Thank you!

mibaumgartner commented 1 year ago

The first part of the file name can be arbitrary; it should uniquely identify the image (patient), though. The last part, i.e. 000X, should indicate the modality of that specific file.

For example, if modality 0 was specified as T1 in the dataset json and modality 1 as T2, all files containing the T1 images should end with _0000 and all files containing T2 should end with _0001.

kimm51 commented 1 year ago

I think I have to change my volume names to something like meniscustear_0001_0000.nii.gz, don't I? (For modality 0: CT.) There are only 2 contrasts: PD with fat saturation and without fat saturation. Do I change the modality in dataset.json to "modality": { "0": "PD", "1": "PDFS" }? I thought all modalities had to be CT in this implementation.

(According to the modality, I could name the files meniscustear_0001_0000.nii.gz for PD and meniscustear_0001_0001.nii.gz for PDFS? Or meniscustear0001_0000.nii.gz and meniscustear0001_0001.nii.gz?)

mibaumgartner commented 1 year ago

Yes, it should be meniscustear_0001_0000.nii.gz and meniscustear_0001_0001.nii.gz for the images. Note: the labels should then be named meniscustear_0001.nii.gz and meniscustear_0001.json.

If the difference between PD and PDFS is only due to the contrast agent and both modalities/sequences represent HU units (like in a CT), you should enter "CT" for both in the dataset.json.
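The naming relationship between images and labels described above can be illustrated with a tiny helper (hypothetical, not part of nnDetection): the label files share the case ID but drop the modality suffix.

```python
def expected_label_files(image_name):
    """For an image file like meniscustear_0001_0000.nii.gz, strip the
    trailing modality suffix (_0000) and return the names of the
    matching label mask and instance-mapping json."""
    stem = image_name[: -len(".nii.gz")]   # e.g. meniscustear_0001_0000
    case_id = stem.rsplit("_", 1)[0]       # e.g. meniscustear_0001
    return f"{case_id}.nii.gz", f"{case_id}.json"
```

So both modalities of one case map to the same label pair: meniscustear_0001.nii.gz plus meniscustear_0001.json.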

kimm51 commented 1 year ago

Hello again,

Thank you for your support. I have been training on my dataset for 2 days; mAP@0.1 is negative and there is a constant warning about no results found for the coco metric for class class1. I think something is wrong. For the segmentation annotation, I used the bounding box areas instead of the converted circles. Could this be the problem? I would appreciate any recommendation.

Best,

mibaumgartner commented 1 year ago

If no ground truth object of a class is present in the validation dataset, the evaluation fills in a -1 value for that class. This means that your dataset.json seems to contain classes which are not present in your dataset, or something is wrong with the mapping in the json files.

kimm51 commented 1 year ago

Actually, I configured my json files according to the maximum number of bounding boxes in the slices of a volume. If I have at most 2 bounding boxes in the slices of a volume, I name it like this (and the segmentation pixels were assigned accordingly):

    { "instances": { "1": 0, "2": 0 } }

In the dataset, there is only 1 class, named tumour. I still don't understand where I am wrong.

Now I have a single-fold validation result like this:

" mAP_IoU_0.10_0.50_0.05_MaxDet_100": "-0.2565500412455978", "0_mAP_IoU_0.10_0.50_0.05_MaxDet_100": "0.48689991750880435", "1_mAP_IoU_0.10_0.50_0.05_MaxDet_100": "-1.0", "AP_IoU_0.10_MaxDet_100": "-0.1517750784238376", "0_AP_IoU_0.10_MaxDet_100": "0.6964498431523247", "1_AP_IoU_0.10_MaxDet_100": "-1.0", "AP_IoU_0.20_MaxDet_100": "-0.18667910113875377", "0_AP_IoU_0.20_MaxDet_100": "0.6266417977224925", "1_AP_IoU_0.20_MaxDet_100": "-1.0", "AP_IoU_0.30_MaxDet_100": "-0.23778251496761446", "0_AP_IoU_0.30_MaxDet_100": "0.524434970064771",

There is an inconsistency between these results. I would appreciate your help.

Best,

mibaumgartner commented 1 year ago

As the evaluation file already suggests, it seems like you have configured two classes => that is why there are "0_YYY" and "1_YYY" entries, and since the dataset only contains one class, the second is filled in with the -1 value. The evaluation script loads the classes directly from the dataset.json file (as can be seen here: https://github.com/MIC-DKFZ/nnDetection/blob/d637c5e2da16e0fe7cf8a5b860907eb57e60d4fe/scripts/train.py#L462), so I think you have two entries in the labels section there. Since you only have one class, there should only be a single entry.
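This mismatch can be caught before training with a small diagnostic sketch (the helper name and folder layout are assumptions; it only relies on the instances mapping shown in this thread): list the classes declared in dataset.json that never occur in any per-case instance json — any class returned here would show up with -1 values in the evaluation.

```python
import json
from pathlib import Path

def unused_classes(dataset_json_path, labels_dir):
    """Return the class indices declared in the "labels" section of
    dataset.json that never occur as a value in any per-case instance
    json in labels_dir."""
    with open(dataset_json_path) as f:
        declared = {int(k) for k in json.load(f)["labels"]}
    used = set()
    for case_json in Path(labels_dir).glob("*.json"):
        with open(case_json) as f:
            used.update(int(c) for c in json.load(f)["instances"].values())
    return declared - used
```

For the dataset.json below (labels 0 and 1) combined with instance jsons that only ever map to class 0, this returns {1} — exactly the class the coco metric reports as -1.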

kimm51 commented 1 year ago

My dataset.json is this:

{ "task": "Task100_XXX", "name": "XXX", "target_class": 1, "test_labels": true, "modalities": { "0": "CT" }, "dim": 3,

"labels": {
    "0": "background",
    "1": "Tumour"
}        

}

There is only one class in my dataset (that is meniscus tear, named as Tumour). For labelsTr and labelsTs, I counted the max. number of bounding boxes inside the slices of a volume. Do I have to delete "0": "background" in the dataset.json file?

Thank you for your return,

Best,

mibaumgartner commented 1 year ago

Yes, there should only be a single entry =>

"labels": {
    "0": "meniscusTear",
}
kimm51 commented 1 year ago

> Yes, there should only be a single entry =>
> "labels": { "0": "meniscusTear" }

Do I have to change "target_class": 1, as "target_class": 0 (as id)? or is it the number of class?

Thank you very much.

Best,

mibaumgartner commented 1 year ago

Yes, if you intend to run patient-level evaluations, it should be set to 0.
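Putting the corrections from this thread together, a single-class dataset.json would look roughly like this (field values copied from the snippets above; a sketch, not an official template):

```json
{
    "task": "Task100_XXX",
    "name": "XXX",
    "target_class": 0,
    "test_labels": true,
    "modalities": {
        "0": "CT"
    },
    "dim": 3,
    "labels": {
        "0": "meniscusTear"
    }
}
```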

kimm51 commented 1 year ago

Hello again,

Thank you for your return. I want to ask about the test dataset. For different test datasets, do I have to redo the steps (preprocess and unpack)? Is changing the test dataset (imagesTs) and labelsTs enough for evaluation?

Thank you,

mibaumgartner commented 1 year ago

No, the inference pipeline will automatically run the preprocessing etc.; there is no need to run it manually. The test labels need to be prepared via nndet_prep, though.

kimm51 commented 1 year ago

> No, the inference pipeline will automatically run the preprocessing etc. no need to run it manually. The test labels need to be prepared via nndet_prep though.

I think nndet_predict does the preprocessing (for imagesTs) and I only need to prepare and preprocess labelsTs? Am I wrong?

Best,

mibaumgartner commented 1 year ago

Yes, that is correct

kimm51 commented 1 year ago

Hello again,

Thank you for your support. Training worked with the fully sampled k-space (kspace_rss) (with the ISMRM header files). I tested on my fully sampled test k-space and mAP@0.1 was 0.654.

The reason I asked about evaluating different test datasets is that I wanted to evaluate this trained network on different reconstruction techniques (zero-filled, UNET, etc.) of the same dataset; the positions of the pathologies did not change. First, I reconstructed my test dataset 4-fold accelerated (with deep models like UNET etc. and zero-filled) and converted the results to nii.gz format. When I swapped imagesTs for the reconstructed (4-fold accelerated) datasets and evaluated (for the 4-fold accelerated UNET reconstruction I also preprocessed the dataset), I always got mAP@0.1 of 0.654 (using only nndet_predict and nndet_eval). This is not possible: especially for zero-filled, the resolution is really bad, so this result is unreasonable and not consistent.

I also trained 2D networks (YOLO series, RetinaNET etc.) on my dataset (trained on fully sampled, tested on different reconstruction techniques; I converted my dataset to jpeg slice by slice) and there I got different mAP results. I really don't know where I am wrong; I got stuck on this. If you have any recommendation or opinion to share, I would appreciate it.

Thank you,

Best,

mibaumgartner commented 1 year ago

Hi,

indeed, I think the inference needs a bit of a redesign in the future; it is not the most user-friendly version right now... It was primarily designed to run inference once, not multiple times.

What probably happened: when you ran nndet_predict the first time, it preprocessed the test data and saved it into {your task}/preprocessed/D3V001/imagesTs. It will always predict everything in that folder. When you replaced the data (the files probably had the same names), nnDetection checks whether the files already exist in the preprocessed imagesTs folder, and if they do, it won't run the preprocessing again -> i.e. all of your prediction runs were probably based on the original data, since it was never replaced in the preprocessed folder. So make sure to delete imagesTs in the preprocessed/D3V001 folder before running the next round of inference.
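That cleanup step can be sketched as follows (the function name is made up; the path follows the preprocessed/D3V001 layout mentioned above — adjust the plan identifier if yours differs):

```python
import shutil
from pathlib import Path

def clear_cached_test_images(task_dir, plan_id="D3V001"):
    """Delete the cached preprocessed test images so the next
    nndet_predict run preprocesses the replaced raw imagesTs again.
    plan_id is the name of the preprocessed sub-folder."""
    cached = Path(task_dir) / "preprocessed" / plan_id / "imagesTs"
    if cached.is_dir():
        shutil.rmtree(cached)
    return cached
```

Run this (or simply `rm -r` the folder) after swapping the files in the raw imagesTs folder and before the next nndet_predict call.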

github-actions[bot] commented 10 months ago

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] commented 9 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.