Hi @lokeycookie, please provide more context. What is your config file? What loss function are you using? It might be related to this issue.
Hi @aihamtaleb, this is my config file for finetune.py:
{ "algorithm": "rotation", "data_dir_train": "/hpctmp/e0310071/BRATS_data_128/train/BRATS_train", "data_dir_test": "/hpctmp/e0310071/BRATS_data_128/test/BRATS_test", "model_checkpoint": "/hpctmp/e0310071/saved_model/rotation_brats/weights-300.hdf5", "dataset_name": "brats", "train_data_generator_args": {"shuffle": true}, "val_data_generator_args": {"shuffle": false}, "test_data_generator_args": {"shuffle": false},
"data_is_3D": true, "val_split": 0.05,
"enc_filters": 16, "data_dim": 128,
"loss": "weighted_dice_loss", "scores": ["dice", "jaccard", "brats_wt", "brats_tc", "brats_et"], "metrics": ["accuracy", "weighted_dice_coefficient", "brats_metrics"],
"top_architecture": "big_fully", "prediction_architecture": "unet_3d_upconv", "pooling": "max", "number_channels": 4, "batch_size": 2,
"exp_splits": [5], "lr": 1e-3, "epochs_initialized": 300, "epochs_frozen": 0, "epochs_random": 0, "epochs_warmup": 25, "repetitions": 1,
"clipnorm": 1, "clipvalue": 1 }
The loss function is weighted_dice_loss. The following text file is the output of the code up to the point where the model stopped training on the above error: stdout_4688047.txt
As for the dimensions of the BRATS dataset: each BRATS image (.npy) has shape (128,128,128,4), with 4 channels for T1, T1-Gd, T2, and FLAIR. Each BRATS label (.npy) also has shape (128,128,128,4), with 4 classes for background, edema, non-enhancing tumor, and enhancing tumor.
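For reference, a quick sanity check of those shapes (the file paths here are just placeholders):

```python
import numpy as np

image = np.load("BRATS_train/sample_image.npy")  # placeholder path
label = np.load("BRATS_train/sample_label.npy")  # placeholder path

print(image.shape)  # (128, 128, 128, 4): T1, T1-Gd, T2, FLAIR channels
print(label.shape)  # (128, 128, 128, 4): one-hot over 4 classes
```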
How should I debug this issue?
I'm really not sure; the only thing I'd suggest is trying the labels not in one-hot form. Please try that. This means the labels will have the shape (128,128,128).
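For example, a minimal sketch of that conversion, assuming your one-hot labels load as arrays of shape (128,128,128,4) (the path is a placeholder):

```python
import numpy as np

# Load a one-hot label volume of shape (128, 128, 128, 4) -- placeholder path
label_onehot = np.load("BRATS_train/sample_label.npy")

# Collapse the class axis to integer class indices -> shape (128, 128, 128)
label_indices = np.argmax(label_onehot, axis=-1).astype(np.uint8)

np.save("BRATS_train/sample_label.npy", label_indices)
```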
Hi @aihamtaleb,
I have tried using the labels not in one-hot. After that, I made the following two changes.
However, after changing the labels to be only (128,128,128), I received the following error.
```
Traceback (most recent call last):
  File "finetune.py", line 4, in <module>
    ...

Function call stack: distributed_function
```
What else can I try to solve this issue?
Hi @aihamtaleb,
Recently, I tried using the labels in one-hot again, so the labels have a dimension of (128,128,128,4). I realised that in the file https://github.com/HealthML/self-supervised-3d-tasks/blob/master/self_supervised_3d_tasks/data/segmentation_task_loader.py, from line 187 onwards, there are these few lines of code:
```python
n_classes = np.max(data_y) + 1
data_y = np.eye(n_classes)[data_y]
if data_y.shape[-2] == 1:
    data_y = np.squeeze(data_y, axis=-2)  # remove second last axis, which is still 1
return data_x, data_y
```
Thus, because my labels are already one-hot, the loader one-hot encodes them a second time, turning the label array into a 5-D array. When I comment out line 188 (`data_y = np.eye(n_classes)[data_y]`), the class PatchSegmentationGenerator3D returns a 4-D array, and I no longer get the above error of `tensorflow.python.framework.errors_impl.InvalidArgumentError: Can not squeeze dim[5], expected a dimension of 1, got 2`.
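Here is a small shape-only sketch of what I believe happens (my own reconstruction, not the repo's actual loader):

```python
import numpy as np

# Labels that are already one-hot encoded: shape (128, 128, 128, 4), values 0/1
idx = np.random.randint(0, 4, size=(128, 128, 128))
data_y = np.eye(4, dtype=np.int64)[idx]

# What segmentation_task_loader.py then does:
n_classes = np.max(data_y) + 1      # 2, because the entries are only 0 and 1
data_y = np.eye(n_classes)[data_y]  # one-hot encodes the one-hot labels again
print(data_y.shape)                 # (128, 128, 128, 4, 2) -- a 5-D array
```

With the batch axis added during training, that trailing size-2 axis becomes dim[5], which matches the "Can not squeeze dim[5] ... got 2" message.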
However, I then obtained the following error:
```
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 468, in _apply_op_helper
    preferred_dtype=default_dtype)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1290, in convert_to_tensor
    (dtype.name, value.dtype.name, value))
ValueError: Tensor conversion requested dtype int64 for Tensor with dtype float32: <tf.Tensor 'model_5/model_4/model_3/conv3d_18/truediv:0' shape=(None, 128, 128, 128, 4) dtype=float32>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "finetune.py", line 4, in <module>
    ...
```
I have also attached a text file of the full error here just in case: HealthML_TypeError.txt
Currently, I am using labels of shape (128,128,128,4) in one-hot encoding. Any ideas on how to resolve this error? Please feel free to ask me for more context if required.
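My own guess (an assumption on my side, not something I found in the repo): the loss seems to be comparing my integer one-hot labels against the model's float32 output, so I am considering casting the labels in the generator, roughly like this:

```python
import numpy as np

# Hypothetical: data_y is the one-hot label batch produced by the generator
data_y = np.eye(4)[np.random.randint(0, 4, size=(2, 128, 128, 128))]

# Cast the one-hot labels to float32 to match the model's float32 predictions
data_y = data_y.astype(np.float32)
```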
Hi @aihamtaleb,
So recently, I tried the labels not in one-hot, meaning the label dimensions are (128,128,128). Afterwards, I made the following changes.
The patch_size is still (128,128,128) and patches_per_scan is 3.
After these changes, I was able to finish running the finetuning code for all 5 self-supervised algorithms. I am not exactly sure why these changes work in my case, but nonetheless, thank you for your help!
I will respond to this on the other open issue.
Hello,
I recently tried to run the finetuning stage of the rotation_3d algorithm on my BRATS dataset. However, I encountered this error:
```
Traceback (most recent call last):
  File "finetune.py", line 4, in <module>
    main()
  File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/finetune.py", line 418, in main
    init(run_complex_test, "test")
  File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/utils/model_utils.py", line 67, in init
    f(args)
  File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/finetune.py", line 354, in run_complex_test
    lambda: run_single_test(algorithm_def, gen_train, gen_val, True, False, x_test, y_test, lr,
  File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/finetune.py", line 247, in try_until_no_nan
    return func()
  File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/finetune.py", line 357, in <lambda>
    logging_b_path, kwargs, clipnorm=clipnorm, clipvalue=clipvalue))
  File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/finetune.py", line 174, in run_single_test
    callbacks=w_callbacks,
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 819, in fit
    use_multiprocessing=use_multiprocessing)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 342, in fit
    total_epochs=epochs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 128, in run_one_epoch
    batch_outs = execution_function(iterator)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 98, in execution_function
    distributed_function(input_fn))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 568, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 632, in _call
    return self._stateless_fn(*args, **kwds)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 2363, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1611, in _filtered_call
    self.captured_inputs)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1692, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 545, in call
    ctx=ctx)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Can not squeeze dim[5], expected a dimension of 1, got 2
  [[node loss/model_6_loss/remove_squeezable_dimensions/Squeeze (defined at /home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/finetune.py:174) ]] [Op:__inference_distributed_function_14995]

Function call stack: distributed_function
```
(Some context) I am not sure if this is relevant, but I will mention it anyway: my BRATS images have shape (128,128,128,4), with 4 channels for T1, T1-Gd, T2, and FLAIR. My BRATS labels have shape (128,128,128,4), with 4 classes for background, edema, non-enhancing tumor, and enhancing tumor.
After this error occurs, the model stops training.
I am unsure about this error. Does anyone know what the main issue is and how to debug it? Really sorry for the trouble, but I would appreciate any help!