MIC-DKFZ / nnDetection

nnDetection is a self-configuring framework for 3D (volumetric) medical object detection which can be applied to new data sets without manual intervention. It includes guides for 12 data sets that were used to develop and evaluate the performance of the proposed method.
Apache License 2.0

Consolidate Problem #150

Closed kimm51 closed 9 months ago

kimm51 commented 1 year ago

Hello, I have run into a problem with nndet_consolidate (screenshot attached). Do you have any idea how to overcome this and see the predicted boxes over the image?

Thank you.

(attached: Screenshot from 2023-02-22 18-38-26)

mibaumgartner commented 1 year ago

Hi,

the purpose of nndet_consolidate is to combine the models after training the 5-fold cross-validation. From the logs it looks like you only trained a single fold, so nndet_consolidate cannot find the other models (for a single fold, consolidation is not needed).

Best, Michael

kimm51 commented 1 year ago

Hello,

I trained only one fold by adding --num_folds 1, because I could not run the full 5-fold cross-validation (the default) due to storage limits. Now I hope to see the prediction results (especially bounding boxes over the image) by reading the pickled prediction results. Looking forward to seeing it.

Thank you,

Best.

mibaumgartner commented 1 year ago

You can simply skip the consolidate step for a single fold.

Before running nndet_predict, please copy the best checkpoint into a subfolder so that only this one model is used for prediction. There is a helper function, nndet_boxes2nii, to visualize the predictions; it is only intended for visual inspection.
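A minimal sketch of the "copy the best checkpoint into a subfolder" step (the `.ckpt` glob, the `"best"` naming convention, and the folder layout are assumptions here, not confirmed nnDetection internals — adapt them to what you actually see in your fold directory):

```python
from pathlib import Path
import shutil

def isolate_best_checkpoint(fold_dir: str, out_name: str = "consolidated") -> Path:
    """Copy the best checkpoint from `fold_dir` into `fold_dir/out_name`
    so that prediction only picks up this single model."""
    fold = Path(fold_dir)
    target = fold / out_name
    target.mkdir(exist_ok=True)
    # Assumption: checkpoints are .ckpt files, with the best one marked
    # by "best" in its filename; otherwise fall back to the last by name.
    ckpts = sorted(fold.glob("*.ckpt"))
    best = next((c for c in ckpts if "best" in c.name), ckpts[-1])
    shutil.copy2(best, target / best.name)
    return target / best.name
```

You would then point nndet_predict at the subfolder instead of the full fold directory.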

kimm51 commented 1 year ago

I consolidated and predicted. For the visualization I ran nndet_boxes2nii Task008_HepaticVessel RetinaUNetV001_D3V001_3d, but in which folder is the output saved? I could not find it anywhere.

Thank you for your reply.

mibaumgartner commented 1 year ago

It is in the training folder under {val/test}_predictions_nii.

kimm51 commented 1 year ago

Thank you!

I want to ask why there are 6 coordinates per box in the prediction JSON file. I thought there should be 4 (x, y, width, height). I could not understand what those coordinates are. (I would have expected the boxes to overlay the labels, but they are all boxes that do not contain vessels.)

The JSON file contains predictions like this:

    "1": {
        "score": 0.6372025609016418,
        "label": 0,
        "box": [36, 283, 39, 335, 290, 351]
    },

Best,

mibaumgartner commented 1 year ago

The coordinates are in (x_min, y_min, x_max, y_max, z_min, z_max) format for an x, y, z image after SimpleITK conversion (so the ITK array is in z, y, x order). Do the coordinates not point to your target structures or the generated NIfTI file?
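To make the coordinate order concrete, here is a small sketch that crops the region described by a predicted box from a volume. The only assumption is the standard SimpleITK layout: `sitk.GetArrayFromImage` returns the array indexed (z, y, x).

```python
import numpy as np

def crop_box(volume_zyx: np.ndarray, box):
    """Crop a predicted box from a volume loaded via SimpleITK.

    `box` follows nnDetection's (x_min, y_min, x_max, y_max, z_min, z_max)
    convention, while `volume_zyx` is indexed (z, y, x) as returned by
    sitk.GetArrayFromImage.
    """
    x_min, y_min, x_max, y_max, z_min, z_max = box
    return volume_zyx[z_min:z_max, y_min:y_max, x_min:x_max]

# The example box from the JSON above, applied to a dummy volume:
vol = np.zeros((400, 512, 512))  # (z, y, x)
patch = crop_box(vol, [36, 283, 39, 335, 290, 351])
print(patch.shape)  # (61, 52, 3) -> extents in (z, y, x)
```

Note how the box is only 3 voxels thick along x but 61 slices deep along z, which is easy to misread when scrolling through axial slices.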

kimm51 commented 1 year ago

Yes, I have tried to understand it, but it looks different. (Why are the box colors all different? And I thought vessels should be much smaller than this.) Am I wrong? (Left: the data; right: the boxes for that slice.)

image

mibaumgartner commented 1 year ago

The boxes have different colors since they correspond to different predictions, each predicted object has one box and thus one color.

Usually vessels follow a certain trajectory and extend through large parts of an organ => when converting those long tubular structures into boxes, the boxes are usually quite large. Is the number of vessels known a priori? Usually, I would have thought of vessels as a segmentation problem rather than a detection problem (maybe https://arxiv.org/abs/2206.01653 Figure 6 could help you to identify the underlying problem).
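A quick way to see why tubular structures produce such large boxes is to compute the axis-aligned bounding box of a binary mask directly (a generic NumPy sketch, not nnDetection code):

```python
import numpy as np

def mask_to_box(mask_zyx: np.ndarray):
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max, z_min, z_max)
    of all foreground voxels in a (z, y, x) binary mask."""
    z, y, x = np.nonzero(mask_zyx)
    return (int(x.min()), int(y.min()), int(x.max()) + 1,
            int(y.max()) + 1, int(z.min()), int(z.max()) + 1)

# A thin diagonal "vessel" occupying only 100 voxels ...
mask = np.zeros((100, 100, 100), dtype=bool)
for i in range(100):
    mask[i, i, i] = True

box = mask_to_box(mask)
print(box)  # (0, 0, 100, 100, 0, 100) -> spans the entire volume
```

Even though the structure fills a vanishing fraction of the volume, its bounding box covers all of it, which matches the large boxes in the screenshot.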

kimm51 commented 1 year ago

No, the number of vessels is not known. (I read that article about a year ago; this version is really extended.)

Also, I want to ask about the preprocessing step too. In the cropped raw data, the annotated segmentation and the original image do not overlap. image

Is this because there are no vessels outside the specified contour (the red segmented contour)?

mibaumgartner commented 1 year ago

The segmentation should overlap with the data of the same stage (i.e., preprocessed data and segmentation should have the same overlap as the original segmentation and data), but the preprocessed segmentation will not overlap with the original images. Does that answer your question, or could you rephrase it?

kimm51 commented 1 year ago

Yes, thank you.

When I read the data from the cropped raw folder (it is 4D, 2x56x512x512 for case 23), the first channel is the original image and the second the segmentation. When I overlay them (segmentation in red over the original image), I see they do not fully match (it should look like the blue one). Because of this, I thought the segmentation in the cropped raw folder had been cropped and preprocessed predominantly around the vessel regions. But I think I am wrong and it was simply annotated that way. (Image is from slice 28.)

image

For the preprocessed folder, the dimensions (1x187x615x615) and contents are different, and they do not overlap either. (Image is from slice 90.)

image

Thank you for your reply,

Best,

mibaumgartner commented 1 year ago

I think I understand your problem now and where the confusion stems from. Task 008 is the Hepatic Vessel dataset (hence the vessels), but it contains two types of annotations: a tumor class and a vessel class. The vessel class is removed when the dataset is prepared, and we only keep the tumor class for detection => we detect tumors, not vessels. The segmentation only covers the tumor area, not the entire liver.

kimm51 commented 1 year ago

Thank you very much! I understand now!

Thank you for your reply!

Best,

kimm51 commented 1 year ago

Hello,

I want to ask a question again about this dataset (Hepatic Vessel). How can I decrease the training time? One epoch takes 1 hour 40 minutes, and at 2600 epochs that would take about 4 months. My graphics card is an RTX 4000 with 46 GB of RAM; is this not enough to reproduce your results? What do I have to do to decrease the training time? Because of this I could only run 1 fold of cross-validation with only 28 epochs.

Any information would be appreciated.

Best,

mibaumgartner commented 1 year ago

It's 60 epochs with 2600 batches per epoch. Please check the other issues and the FAQ for tips on identifying potential bottlenecks.
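The schedule works out as a short back-of-the-envelope calculation (the per-batch time below is a hypothetical figure for illustration; real throughput depends on hardware and data loading):

```python
# nnDetection's stated schedule: 60 epochs of 2600 batches each.
epochs = 60
batches_per_epoch = 2600
total_iterations = epochs * batches_per_epoch
print(total_iterations)  # 156000

# Hypothetical throughput of ~1.1 s per batch:
seconds = total_iterations * 1.1
days = round(seconds / 86400, 1)
print(days)  # 2.0 -> roughly two days per fold
```

So "1 hour 40 min per epoch" with this schedule lands near the ~2 days per fold mentioned below, whereas a misread "2600 epochs" would indeed suggest months.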

kimm51 commented 1 year ago

60 epochs for the 5-fold cross-validation, or for one fold?

mibaumgartner commented 1 year ago

It is 60 epochs for a single fold. Usually that should take around 2 days per fold on an RTX 2080 Ti (which is quite similar to nnU-Net in training time).

kimm51 commented 1 year ago

For a single fold, I have been waiting for 4 days. When I checked GPU utilization, it was around 100%. I think the problem is the PyTorch version (I have 1.13, which is what requirements.txt asks for). Do I have to upgrade to 2.0?

mibaumgartner commented 1 year ago

That might indicate that mixed-precision training is not working as expected. Unfortunately, I haven't been able to test with RTX 4000 GPUs yet, so I'm not sure whether PyTorch 1.13 already has full support (I thought so, though). I didn't observe problems with A100 GPUs, which are the newest I have access to right now (there, training time comes down to ~1-1.5 days).

kimm51 commented 1 year ago

Hello,

After 5 days, the training finished (I did not interrupt anything). But I only have a fold0 folder (I expected fold1, fold2, fold3, and fold4 to exist as well). Am I skipping something? The test result is AP@0.1 = 0.63, which is very close to your paper's results (and the validation result is 0.814, higher than your results).

mibaumgartner commented 1 year ago

If you only ran one command, then it only trained one model. The README explains how to run the other folds.

1) Validation result: the validation result is not comparable, since it is probably only computed on a single fold. For the paper we ran a 5-fold CV, aggregated all folds into one dataset, and ran the evaluation on that; i.e., 80% of the cases would be missing from a single-fold evaluation. 0.814 is in the same range as our original model for fold 0 as well.

2) Test result: we only evaluated the test set once with the final ensemble, so we don't have information on how a single model performs on it. It may very well happen that a specific fold scores quite well on the test set. Generally speaking, we observed better and more robust performance from the ensemble on other datasets/competitions.

kimm51 commented 1 year ago

Oh, thank you!

I think you mean that after the first fold is done, I have to continue training with nndet_train 008 -o exp.fold=1,2,...,4 --sweep?

mibaumgartner commented 1 year ago

Yes exactly
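The remaining folds can be queued from a small wrapper script (a sketch that only builds the command lines for the `nndet_train ... -o exp.fold=N --sweep` pattern confirmed above; the actual `subprocess.run` call is left commented out):

```python
import subprocess  # used by the commented-out run() call below

def fold_command(task: str, fold: int) -> list[str]:
    """Build the nndet_train invocation for one fold."""
    return ["nndet_train", task, "-o", f"exp.fold={fold}", "--sweep"]

if __name__ == "__main__":
    for fold in range(1, 5):  # fold 0 is already trained
        cmd = fold_command("008", fold)
        print(" ".join(cmd))
        # subprocess.run(cmd, check=True)  # uncomment to actually train
```

Each fold writes into its own foldN directory, after which nndet_consolidate can combine all five models as described at the top of this thread.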

github-actions[bot] commented 10 months ago

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] commented 9 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.