HealthML / self-supervised-3d-tasks

Apache License 2.0

Finetuning stage: What is the format of the BRATS dataset to be used for finetuning #23

Closed lokeycookie closed 2 years ago

lokeycookie commented 2 years ago

Hello, I have just finished running a pretraining task, rotation_3d, and I now have pretrained weights. I want to use these weights, trained with the rotation_3d algorithm on a BRATS dataset, for 3D segmentation by running finetune.py.

Thus, I have the following questions.

  1. In the data_dir_train and data_dir_test, how is the BRATS dataset arranged? For example, under the data_dir_train folder, are there two folders, one folder for the BRATS images and the second folder for the labels?
  2. Are all the files (including the BRATS labels and images) saved in a numpy format?
  3. If the labels (ground truth) of the BRATS dataset are saved in numpy format, what is the shape of the numpy array? Is it (128,128,128,4)?

Please help me! Sorry for troubling everyone with these questions. I would appreciate a quick reply.

aihamtaleb commented 2 years ago

Hi @lokeycookie ,

The answer to your questions is yes, to all of them.

Please let us know if everything works fine.

lokeycookie commented 2 years ago

Hi @aihamtaleb ,

When I tried running this line: python finetune.py self_supervised_3d_tasks/configs/finetune/rotation_3d.json, I got the following error.

Traceback (most recent call last):
  File "finetune.py", line 4, in <module>
    main()
  File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/finetune.py", line 418, in main
    init(run_complex_test, "test")
  File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/utils/model_utils.py", line 67, in init
    f(args)
  File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/finetune.py", line 340, in run_complex_test
    gen_train, gen_val, x_test, y_test = data_loader.get_dataset(i, percentage)
  File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/test_data_backend.py", line 196, in get_dataset
    self.dataset_name, self.batch_size, f_train, f_val, train_split, self.kwargs
  File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/test_data_backend.py", line 153, in get_dataset_train
    batch_size, f_train, f_val, train_split, data_generator=PatchSegmentationGenerator3D, kwargs,
  File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/test_data_backend.py", line 35, in get_dataset_regular_train
    kwargs,
  File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/data/make_data_generator.py", line 149, in get_data_generators
    kwargs)
  File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/data/make_data_generator.py", line 26, in get_data_generators_internal
    train_data_generator = data_generator(data_path, train, **train_data_generator_args)
  File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/data/segmentation_task_loader.py", line 124, in __init__
    super(PatchSegmentationGenerator3D, self).__init__(file_list, batch_size, shuffle, pre_proc_func)
  File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/data/generator_base.py", line 35, in __init__
    assert len(file_list) > 0, "received no files"
AssertionError: received no files

(some context)

In the finetune rotation_3d config file, I wrote the directory paths as shown here:

"algorithm": "rotation",
"data_dir_train": "/hpctmp/e0310071/BRATS_train_5%",
"data_dir_test": "/hpctmp/e0310071/BRATS_test_data",

For the data directory, I store the numpy files as shown here. The numpy files of the BRATS images are stored in a folder called Images, while the numpy files of the BRATS labels are stored in a folder called Masks. The train directory is organized the same way.

+BRATS_test_data
---+Images
------+ BRATS_437.npy
------+ ...
---+Masks
------+ BRATS_437.npy
------+ ...

Is there anything that I did wrong for the finetuning stage? How do I debug this error?
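One way to start debugging "received no files" is to count the .npy files that sit directly inside the configured directory. The sketch below assumes the loader only looks at files at the top level of data_dir and does not descend into subfolders such as Images or Masks; that is an inference from this thread, not something checked against the loader code. The function name count_direct_npy is made up for illustration.

```python
from pathlib import Path


def count_direct_npy(data_dir):
    """Count .npy files directly inside data_dir (subfolders are not searched)."""
    root = Path(data_dir)
    if not root.is_dir():
        # A missing or misspelled directory also yields zero files.
        return 0
    return sum(1 for p in root.iterdir() if p.is_file() and p.suffix == ".npy")


if __name__ == "__main__":
    # Paths taken from the config quoted above.
    for path in ("/hpctmp/e0310071/BRATS_train_5%", "/hpctmp/e0310071/BRATS_test_data"):
        print(path, "->", count_direct_npy(path), "npy files at top level")
```

If this prints 0 for a directory, the files are most likely nested one level deeper than the path given in the config.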

aihamtaleb commented 2 years ago

Hi @lokeycookie, please make sure that your training and test datasets have the following format:

+BRATS_train
--+ BRATS_437.npy
--+ ...
+BRATS_train_labels
--+ BRATS_437_label.npy
--+ ...

and similarly the test set.

These are the lines that expect this format: https://github.com/HealthML/self-supervised-3d-tasks/blob/master/self_supervised_3d_tasks/data/segmentation_task_loader.py#L22
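For anyone starting from an Images/Masks layout like the one described earlier in this thread, a small script can copy the files into the expected format. This is only a sketch: convert_layout and its parameters are hypothetical names, and it assumes each mask has the same file name as its image and that the label file should be named with a _label suffix, as in the layout above.

```python
import shutil
from pathlib import Path


def convert_layout(src_root, dst_root, name, images_dir="Images", masks_dir="Masks"):
    """Copy <src>/Images/X.npy to <dst>/<name>/X.npy and
    <src>/Masks/X.npy to <dst>/<name>_labels/X_label.npy."""
    src = Path(src_root)
    dst = Path(dst_root)
    image_out = dst / name
    label_out = dst / f"{name}_labels"
    image_out.mkdir(parents=True, exist_ok=True)
    label_out.mkdir(parents=True, exist_ok=True)

    copied = 0
    for image in sorted((src / images_dir).glob("*.npy")):
        mask = src / masks_dir / image.name
        if not mask.is_file():
            continue  # skip volumes that have no matching mask
        shutil.copy2(image, image_out / image.name)
        shutil.copy2(mask, label_out / f"{image.stem}_label.npy")
        copied += 1
    return image_out, copied
```

The function copies rather than moves, so the original folders stay untouched until the new layout has been verified.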

lokeycookie commented 2 years ago

Hi @aihamtaleb ,

Sorry for troubling you again, but I got the same error as before. I have changed the format of my training and test datasets as shown here:

+test
--+BRATS_train
----+BRATS_437.npy
----+...
----+BRATS_484.npy
--+BRATS_train_labels
----+BRATS_437_label.npy
----+...
----+BRATS_484_label.npy

+train_5%
--+BRATS_train
----+BRATS_001.npy
----+...
----+BRATS_022.npy
--+BRATS_train_labels
----+BRATS_001_label.npy
----+...
----+BRATS_022_label.npy

My full config file for finetune rotation_3d.json is as follows:

{
"algorithm": "rotation",
"data_dir_train": "/hpctmp/e0310071/train_5%",
"data_dir_test": "/hpctmp/e0310071/test",
"model_checkpoint": "/hpctmp/e0310071/saved_model/rotation_brats/weights-300.hdf5",
"dataset_name": "brats",
"train_data_generator_args": {"label_stem": "", "shuffle": true},
"val_data_generator_args": {"label_stem": ""},
"test_data_generator_args": {"label_stem": ""},

"data_is_3D": true,
"val_split": 0.05,

"enc_filters": 16,
"data_dim": 128,

"loss": "weighted_dice_loss",
"scores": ["dice", "jaccard", "brats_wt", "brats_tc", "brats_et"],
"metrics": ["accuracy", "weighted_dice_coefficient", "brats_metrics"],

"top_architecture": "big_fully",
"prediction_architecture": "unet_3d_upconv",
"pooling": "max",
"number_channels": 4,
"batch_size": 2,

"exp_splits": [50,25],
"lr": 1e-3,
"epochs_initialized": 400,
"epochs_frozen": 0,
"epochs_random": 0,
"epochs_warmup": 25,
"repetitions": 3,

"clipnorm": 1,
"clipvalue": 1
}

Did I do something wrong in this finetuning stage, and how can I resolve this issue?

aihamtaleb commented 2 years ago

Please attempt to change the name of the train directory, and remove % from its name.

In addition, I think you should also remove label_stem

"train_data_generator_args": {"label_stem": "", "shuffle": true},
"val_data_generator_args": {"label_stem": ""},
"test_data_generator_args": {"label_stem": ""},

from your config files. Try replacing with:

"train_data_generator_args": { "shuffle": true},

lokeycookie commented 2 years ago

Hi, I have changed the name of the train directory as shown here, but I still get the same error (AssertionError: received no files).

+test
--+BRATS_train
----+BRATS_437.npy
----+...
----+BRATS_484.npy
--+BRATS_train_labels
----+BRATS_437_label.npy
----+...
----+BRATS_484_label.npy

+train
--+BRATS_train
----+BRATS_001.npy
----+...
----+BRATS_436.npy
--+BRATS_train_labels
----+BRATS_001_label.npy
----+...
----+BRATS_436_label.npy

My config rotation_3d.json is as follows:

{
"algorithm": "rotation",
"data_dir_train": "/hpctmp/e0310071/BRATS_data_128/train",
"data_dir_test": "/hpctmp/e0310071/BRATS_data_128/test",
"model_checkpoint": "/hpctmp/e0310071/saved_model/rotation_brats/weights-300.hdf5",
"dataset_name": "brats",
"train_data_generator_args": {"shuffle": true},
"val_data_generator_args": {"shuffle": false},
"test_data_generator_args": {"shuffle": false},

"data_is_3D": true,
"val_split": 0.05,

"enc_filters": 16,
"data_dim": 128,

"loss": "weighted_dice_loss",
"scores": ["dice", "jaccard", "brats_wt", "brats_tc", "brats_et"],
"metrics": ["accuracy", "weighted_dice_coefficient", "brats_metrics"],

"top_architecture": "big_fully",
"prediction_architecture": "unet_3d_upconv",
"pooling": "max",
"number_channels": 4,
"batch_size": 2,

"exp_splits": [100,80,60,40,20,10,5],
"lr": 1e-3,
"epochs_initialized": 400,
"epochs_frozen": 0,
"epochs_random": 0,
"epochs_warmup": 25,
"repetitions": 1,

"clipnorm": 1,
"clipvalue": 1
}

I received the same error as previously. How do I solve this error?

aihamtaleb commented 2 years ago

Hi @lokeycookie

Please try changing the config file to the following:

{
"algorithm": "rotation",
"data_dir_train": "/hpctmp/e0310071/BRATS_data_128/train/BRATS_train",
"data_dir_test": "/hpctmp/e0310071/BRATS_data_128/test/BRATS_test",
....
}

I also think that the contents of the test directory should be changed to:

+test
--+BRATS_test
----+BRATS_437.npy
----+...
----+BRATS_484.npy
--+BRATS_test_labels
----+BRATS_437_label.npy
----+...
----+BRATS_484_label.npy
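Before rerunning finetune.py, the layout above can be sanity-checked with a few lines of Python. This is a rough sketch under the conventions discussed in this thread: each split folder holds one image folder plus a sibling folder with _labels appended, and each image X.npy has a label X_label.npy. The helper name check_split is hypothetical and is not part of the repository.

```python
from pathlib import Path


def check_split(split_root):
    """Return (image_dir, unpaired_image_names) for a split folder such as .../test.

    image_dir is the path that would go into data_dir_train or data_dir_test.
    """
    root = Path(split_root)
    # An image folder is any subfolder that has a sibling "<name>_labels" folder.
    image_dirs = [
        d for d in sorted(root.iterdir())
        if d.is_dir() and (root / f"{d.name}_labels").is_dir()
    ]
    if len(image_dirs) != 1:
        raise ValueError(
            f"expected exactly one image/label folder pair in {root}, found {len(image_dirs)}"
        )
    image_dir = image_dirs[0]
    label_dir = root / f"{image_dir.name}_labels"
    # Collect images whose label file is missing.
    unpaired = [
        p.name for p in sorted(image_dir.glob("*.npy"))
        if not (label_dir / f"{p.stem}_label.npy").is_file()
    ]
    return image_dir, unpaired
```

For example, check_split("/hpctmp/e0310071/BRATS_data_128/test") should return the BRATS_test folder and an empty list once every image has a matching label.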

lokeycookie commented 2 years ago

Hi @aihamtaleb ,

Thank you for your help! I have tried your suggestions and the code no longer raises this assertion error. However, I have run into another error, and I will create a separate issue for it.

Thank you once again for your time and effort!