Luffy03 / Large-Scale-Medical

[CVPR 2024 Extension] 160K volumes (42M slices) datasets, new segmentation datasets, 31M-1.2B pre-trained models, various pre-training recipes, 50+ downstream tasks implementation

Including downstream validation sets in pretraining dataset #25

Closed HadiHammoud44 closed 1 week ago

HadiHammoud44 commented 1 week ago

Many thanks for your work. I was checking data_utils_abdomen.py and realized that the validation sets of some datasets, such as BTCV, AMOS, ..., which are later used for fine-tuning, are also being used during pre-training. Could you please clarify?

(Table 1 of the paper says that only the training set of BTCV is used, which contains 24 cases.)
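For reference, a minimal sketch of how such an overlap could be checked; the file names and JSON layout below are placeholders, not paths from the repo:

```python
# Sketch of an overlap check between a pre-training file list and a downstream
# validation split. File names and JSON layout are illustrative placeholders.
import json
import os


def case_ids(paths):
    # Strip directories and the ".nii.gz" suffix to get bare case identifiers,
    # e.g. ".../img0001.nii.gz" -> "img0001"
    return {os.path.basename(p).replace(".nii.gz", "") for p in paths}


# Hypothetical inputs: a flat JSON list of pre-training image paths and a flat
# JSON list of downstream validation image paths.
with open("pretrain_list.json") as f:
    pretrain = case_ids(json.load(f))

with open("btcv_val_split.json") as f:
    val = case_ids(json.load(f))

overlap = pretrain & val
print(f"{len(overlap)} validation cases appear in the pre-training list: {sorted(overlap)}")
```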

Luffy03 commented 1 week ago

Dear Hadi, as described in the caption of Table 2, the labels of the validation sets are never seen during pre-training. We also find that most related SSL works do not strictly exclude the validation images from pre-training. The reason is that, for datasets without an available test set, we have to adopt 5-fold cross-validation, and in that case you cannot exclude the validation data from pre-training, since every case serves as validation data in one of the folds.
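To make this concrete, here is a small illustration (not code from this repo): under 5-fold cross-validation, every case appears in exactly one validation fold, so excluding all validation data from pre-training would leave nothing to pre-train on for that dataset.

```python
# Toy illustration (not repo code): with 5-fold cross-validation every case is a
# validation case in exactly one fold, so excluding all validation data from
# pre-training would exclude the entire dataset.
cases = [f"case_{i:03d}" for i in range(30)]   # toy dataset of 30 cases

k = 5
folds = [cases[i::k] for i in range(k)]        # round-robin split into 5 disjoint folds

all_val = set().union(*folds)                  # union of the 5 validation folds
print(all_val == set(cases))                   # True: nothing would remain for pre-training
```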

wangshansong1 commented 1 week ago

Please allow me to add a question: do you use nnUNet's default 5-fold cross-validation, or the split provided by the dataset (for example, for BraTS21)?

Luffy03 commented 1 week ago

> Please allow me to add a question: do you use nnUNet's default 5-fold cross-validation, or the split provided by the dataset (for example, for BraTS21)?

For most of the experiments, we use the same splits as nnUNet. But for BraTS21, we use the split provided by the dataset, as shown in brats21_folds.json.
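As a rough sketch of how such a split file can be consumed, assuming an nnUNet-style layout of per-fold train/val lists (please check brats21_folds.json itself for the exact structure):

```python
# Sketch of reading a fold-split file, assuming an nnUNet-style layout of the
# form [{"train": [...], "val": [...]}, ...]; the actual structure of
# brats21_folds.json should be checked in the repo.
import json

with open("brats21_folds.json") as f:
    folds = json.load(f)

fold = 0                              # index of the fold to fine-tune on
train_ids = folds[fold]["train"]
val_ids = folds[fold]["val"]
print(f"fold {fold}: {len(train_ids)} train / {len(val_ids)} val cases")
```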

HadiHammoud44 commented 1 week ago

Thanks for answering. I understand that you adopted 5-fold cross-validation for fine-tuning, but do you report the mean and standard deviation across folds anywhere in the paper or the supplementary material (if any)? If not, how did you choose the values reported in the tables?

Luffy03 commented 1 week ago

Dear Hadi, we did not report the 5-fold results in the paper, since with so many datasets and compared methods the tables would become too bulky and confusing. For most datasets there are pre-defined validation splits (e.g., WORD, BraTS, CT-RATE, ...) and we adopt the same settings for fair comparison. For datasets with test leaderboards (e.g., AMOS, FLARE23, KiTS, ...), we report the leaderboard results (e.g., https://codalab.lisn.upsaclay.fr/competitions/12239#results). For the other datasets, we use the same fold as nnUNet for all compared methods for fair comparison.
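For completeness, aggregating 5-fold results would simply be the mean and standard deviation of the per-fold scores; a toy example with made-up Dice values:

```python
# Toy example of reporting 5-fold results as mean +/- standard deviation;
# the Dice values below are made up purely for illustration.
import statistics

fold_dice = [0.842, 0.851, 0.838, 0.846, 0.849]   # hypothetical per-fold Dice scores

mean = statistics.mean(fold_dice)
std = statistics.stdev(fold_dice)                 # sample standard deviation over folds
print(f"Dice: {mean:.3f} +/- {std:.3f} over {len(fold_dice)} folds")
```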