HealthML / self-supervised-3d-tasks


(Finetuning Stage) tensorflow.python.framework.errors_impl.InvalidArgumentError: Can not squeeze dim[5], expected a dimension of 1, got 2 #24

Closed: lokeycookie closed this issue 2 years ago

lokeycookie commented 2 years ago

Hello,

I recently tried to run the finetuning stage of the rotation_3d algorithm on my BRATS dataset. However, I encountered this error:

    Traceback (most recent call last):
      File "finetune.py", line 4, in <module>
        main()
      File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/finetune.py", line 418, in main
        init(run_complex_test, "test")
      File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/utils/model_utils.py", line 67, in init
        f(args)
      File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/finetune.py", line 354, in run_complex_test
        lambda: run_single_test(algorithm_def, gen_train, gen_val, True, False, x_test, y_test, lr,
      File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/finetune.py", line 247, in try_until_no_nan
        return func()
      File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/finetune.py", line 357, in <lambda>
        logging_b_path, kwargs, clipnorm=clipnorm, clipvalue=clipvalue))
      File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/finetune.py", line 174, in run_single_test
        callbacks=w_callbacks,
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 819, in fit
        use_multiprocessing=use_multiprocessing)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 342, in fit
        total_epochs=epochs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 128, in run_one_epoch
        batch_outs = execution_function(iterator)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 98, in execution_function
        distributed_function(input_fn))
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 568, in __call__
        result = self._call(*args, **kwds)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 632, in _call
        return self._stateless_fn(*args, **kwds)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 2363, in __call__
        return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1611, in _filtered_call
        self.captured_inputs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1692, in _call_flat
        ctx, args, cancellation_manager=cancellation_manager))
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 545, in call
        ctx=ctx)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
        six.raise_from(core._status_to_exception(e.code, message), None)
      File "<string>", line 3, in raise_from
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Can not squeeze dim[5], expected a dimension of 1, got 2
      [[node loss/model_6_loss/remove_squeezable_dimensions/Squeeze (defined at /home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/finetune.py:174) ]] [Op:__inference_distributed_function_14995]

Function call stack: distributed_function

(Some context) I am not sure if this is relevant, but I will mention it anyway. My BRATS images have shape (128,128,128,4), with 4 channels for T1, T1-Gd, T2, and FLAIR. My BRATS labels have shape (128,128,128,4), with 4 classes for background, edema, non-enhancing tumor, and enhancing tumor.

After this error occurs, the model cannot be trained.

I am unsure what is causing this error. Does anyone know what the main issue is and how to debug it? Really sorry for the trouble, but I would appreciate it if anyone can help!

aihamtaleb commented 2 years ago

Hi @lokeycookie, please provide more context. What is your config file? What loss function are you using? It might be related to this issue.

lokeycookie commented 2 years ago

Hi @aihamtaleb , This is my config file for finetune.py

{ "algorithm": "rotation", "data_dir_train": "/hpctmp/e0310071/BRATS_data_128/train/BRATS_train", "data_dir_test": "/hpctmp/e0310071/BRATS_data_128/test/BRATS_test", "model_checkpoint": "/hpctmp/e0310071/saved_model/rotation_brats/weights-300.hdf5", "dataset_name": "brats", "train_data_generator_args": {"shuffle": true}, "val_data_generator_args": {"shuffle": false}, "test_data_generator_args": {"shuffle": false},

"data_is_3D": true, "val_split": 0.05,

"enc_filters": 16, "data_dim": 128,

"loss": "weighted_dice_loss", "scores": ["dice", "jaccard", "brats_wt", "brats_tc", "brats_et"], "metrics": ["accuracy", "weighted_dice_coefficient", "brats_metrics"],

"top_architecture": "big_fully", "prediction_architecture": "unet_3d_upconv", "pooling": "max", "number_channels": 4, "batch_size": 2,

"exp_splits": [5], "lr": 1e-3, "epochs_initialized": 300, "epochs_frozen": 0, "epochs_random": 0, "epochs_warmup": 25, "repetitions": 1,

"clipnorm": 1, "clipvalue": 1 }

The loss function is weighted_dice_loss. The attached text file contains the output of the code before the model stops training due to the above error: stdout_4688047.txt

As for the dimensions of the BRATS dataset: each BRATS image (.npy) has shape (128,128,128,4), with 4 channels for T1, T1-Gd, T2, and FLAIR. Each BRATS label (.npy) has shape (128,128,128,4), with 4 classes for background, edema, non-enhancing tumor, and enhancing tumor.
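
For reference, the shapes can be confirmed directly from the .npy files (the sample file paths below are hypothetical; the real ones follow data_dir_train above):

    import numpy as np

    # Hypothetical sample files; adjust the paths to your BRATS .npy data.
    x = np.load("/hpctmp/e0310071/BRATS_data_128/train/BRATS_train/images/scan_000.npy")
    y = np.load("/hpctmp/e0310071/BRATS_data_128/train/BRATS_train/labels/scan_000.npy")
    print(x.shape, x.dtype)  # expected: (128, 128, 128, 4) for T1, T1-Gd, T2, FLAIR
    print(y.shape, y.dtype)  # expected: (128, 128, 128, 4) for one-hot labels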

How should I debug this issue?

aihamtaleb commented 2 years ago

I'm really unsure; the only thing I can suggest is to try using the labels not in one-hot encoding. Please try that. This means the labels will have the shape (128,128,128).
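
A minimal sketch of that conversion (np.argmax recovers class indices from one-hot labels; the file names here are hypothetical):

    import numpy as np

    one_hot = np.load("BRATS_label.npy")    # shape (128, 128, 128, 4), one-hot
    labels = np.argmax(one_hot, axis=-1)    # shape (128, 128, 128), class ids 0..3
    np.save("BRATS_label_sparse.npy", labels.astype(np.uint8))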

lokeycookie commented 2 years ago

Hi @aihamtaleb ,

I have tried using the labels not in one-hot encoding. After that, I made the following two changes.

  1. In line 107 of the file https://github.com/HealthML/self-supervised-3d-tasks/blob/master/self_supervised_3d_tasks/data/segmentation_task_loader.py, I changed the patch size to (32,32,32) and patches_per_scan to 64 (a toy sketch of this patch sampling follows this list).
  2. In line 181 of the same file, I changed n_classes to 4, as I initially received a ValueError.
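
For reference, a toy sketch of what that patch sampling produces (my own example with uniformly random origins, not the repo's exact loader code):

    import numpy as np

    volume = np.zeros((128, 128, 128, 4), dtype=np.float32)  # one BRATS scan
    patch_size = (32, 32, 32)
    patches_per_scan = 64

    # Sample random origins so each (32,32,32) patch fits inside the volume.
    max_origin = [volume.shape[i] - patch_size[i] + 1 for i in range(3)]
    origins = np.stack([np.random.randint(0, m, patches_per_scan) for m in max_origin], axis=1)
    patches = np.stack([
        volume[r:r + patch_size[0], c:c + patch_size[1], d:d + patch_size[2]]
        for r, c, d in origins
    ])
    print(patches.shape)  # (64, 32, 32, 32, 4)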

However, after changing the labels to shape (128,128,128), I received the following error.

    Traceback (most recent call last):
      File "finetune.py", line 4, in <module>
        main()
      File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/finetune.py", line 418, in main
        init(run_complex_test, "test")
      File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/utils/model_utils.py", line 67, in init
        f(args)
      File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/finetune.py", line 354, in run_complex_test
        lambda: run_single_test(algorithm_def, gen_train, gen_val, True, False, x_test, y_test, lr,
      File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/finetune.py", line 247, in try_until_no_nan
        return func()
      File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/finetune.py", line 357, in <lambda>
        logging_b_path, kwargs, clipnorm=clipnorm, clipvalue=clipvalue))
      File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/finetune.py", line 174, in run_single_test
        callbacks=w_callbacks,
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 819, in fit
        use_multiprocessing=use_multiprocessing)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 342, in fit
        total_epochs=epochs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 128, in run_one_epoch
        batch_outs = execution_function(iterator)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 98, in execution_function
        distributed_function(input_fn))
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 568, in __call__
        result = self._call(*args, **kwds)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 632, in _call
        return self._stateless_fn(*args, **kwds)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 2363, in __call__
        return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1611, in _filtered_call
        self.captured_inputs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1692, in _call_flat
        ctx, args, cancellation_manager=cancellation_manager))
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 545, in call
        ctx=ctx)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
        six.raise_from(core._status_to_exception(e.code, message), None)
      File "<string>", line 3, in raise_from
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Number of ways to split should evenly divide the split dimension, but got split_dim 1 (size = 1) and num_split 4
      [[node model_5/model_4/up_sampling3d/split (defined at /home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/finetune.py:174) ]] [Op:__inference_distributed_function_11489]

Function call stack: distributed_function

What else can I try to solve this issue?

lokeycookie commented 2 years ago

Hi @aihamtaleb ,

Recently, I tried using the labels in one-hot again, so the labels have shape (128,128,128,4). I realised that in the file https://github.com/HealthML/self-supervised-3d-tasks/blob/master/self_supervised_3d_tasks/data/segmentation_task_loader.py, from line 187 onwards, there are these few lines of code:

    n_classes = np.max(data_y) + 1
    data_y = np.eye(n_classes)[data_y]
    if data_y.shape[-2] == 1:
        data_y = np.squeeze(data_y, axis=-2)  # remove second last axis, which is still 1

    return data_x, data_y

Thus, the label array gets one-hot encoded twice, making it a 5D array. When I comment out line 188 (data_y = np.eye(n_classes)[data_y]), the PatchSegmentationGenerator3D class returns a 4D array, and I no longer get the above error of tensorflow.python.framework.errors_impl.InvalidArgumentError: Can not squeeze dim[5], expected a dimension of 1, got 2.
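
To illustrate with a toy example (my own sketch, not the repo's code): since a one-hot array only contains 0s and 1s, np.max(data_y) + 1 evaluates to 2, and np.eye(2)[data_y] then appends a second one-hot axis of size 2, which is exactly the size-2 dim[5] that the squeeze complains about once the batch dimension is added:

    import numpy as np

    labels = np.random.randint(0, 4, size=(4, 4, 4))   # toy volume of class ids
    data_y = np.eye(4)[labels]                         # one-hot, shape (4, 4, 4, 4)

    n_classes = int(np.max(data_y)) + 1                # max of a one-hot array is 1, so n_classes == 2
    data_y2 = np.eye(n_classes)[data_y.astype(int)]    # one-hot of a one-hot
    print(data_y.shape, data_y2.shape)                 # (4, 4, 4, 4) (4, 4, 4, 4, 2)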

However, I obtained the following error.

    Traceback (most recent call last):
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 468, in _apply_op_helper
        preferred_dtype=default_dtype)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py", line 1290, in convert_to_tensor
        (dtype.name, value.dtype.name, value))
    ValueError: Tensor conversion requested dtype int64 for Tensor with dtype float32: <tf.Tensor 'model_5/model_4/model_3/conv3d_18/truediv:0' shape=(None, 128, 128, 128, 4) dtype=float32>

During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "finetune.py", line 4, in <module>
        main()
      File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/finetune.py", line 418, in main
        init(run_complex_test, "test")
      File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/utils/model_utils.py", line 67, in init
        f(args)
      File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/finetune.py", line 354, in run_complex_test
        lambda: run_single_test(algorithm_def, gen_train, gen_val, True, False, x_test, y_test, lr,
      File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/finetune.py", line 247, in try_until_no_nan
        return func()
      File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/finetune.py", line 357, in <lambda>
        logging_b_path, kwargs, clipnorm=clipnorm, clipvalue=clipvalue))
      File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/finetune.py", line 174, in run_single_test
        callbacks=w_callbacks,
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training.py", line 819, in fit
        use_multiprocessing=use_multiprocessing)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 342, in fit
        total_epochs=epochs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2.py", line 128, in run_one_epoch
        batch_outs = execution_function(iterator)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 98, in execution_function
        distributed_function(input_fn))
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 568, in __call__
        result = self._call(*args, **kwds)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 615, in _call
        self._initialize(args, kwds, add_initializers_to=initializers)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 497, in _initialize
        *args, **kwds))
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 2389, in _get_concrete_function_internal_garbage_collected
        graph_function, _, _ = self._maybe_define_function(args, kwargs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 2703, in _maybe_define_function
        graph_function = self._create_graph_function(args, kwargs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 2593, in _create_graph_function
        capture_by_value=self._capture_by_value),
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/func_graph.py", line 978, in func_graph_from_py_func
        func_outputs = python_func(*func_args, **func_kwargs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/def_function.py", line 439, in wrapped_fn
        return weak_wrapped_fn().__wrapped__(*args, **kwds)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 85, in distributed_function
        per_replica_function, args=args)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/distribute_lib.py", line 763, in experimental_run_v2
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/distribute_lib.py", line 1819, in call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/distribute_lib.py", line 2164, in _call_for_each_replica
        return fn(*args, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/api.py", line 292, in wrapper
        return func(*args, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_v2_utils.py", line 433, in train_on_batch
        output_loss_metrics=model._output_loss_metrics)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_eager.py", line 312, in train_on_batch
        output_loss_metrics=output_loss_metrics))
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_eager.py", line 253, in _process_single_batch
        training=training))
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/training_eager.py", line 167, in _model_loss
        per_sample_losses = loss_fn.call(targets[i], outs[i])
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/losses.py", line 221, in call
        return self.fn(y_true, y_pred, **self._fn_kwargs)
      File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/utils/metrics.py", line 68, in weighted_dice_coefficient_loss
        return -weighted_dice_coefficient(y_true, y_pred)
      File "/home/svu/e0310071/self-supervised-3d-tasks/self_supervised_3d_tasks/utils/metrics.py", line 62, in weighted_dice_coefficient
        axis=axis) + smooth / 2) / (K.sum(y_true,
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/math_ops.py", line 902, in binary_op_wrapper
        return func(x, y, name=name)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/math_ops.py", line 1201, in _mul_dispatch
        return gen_math_ops.mul(x, y, name=name)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_math_ops.py", line 6125, in mul
        "Mul", x=x, y=y, name=name)
      File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py", line 504, in _apply_op_helper
        inferred_from[input_arg.type_attr]))
    TypeError: Input 'y' of 'Mul' Op has type float32 that does not match type int64 of argument 'x'.

I have also attached a text file of the error here just in case: HealthML_TypeError.txt

How do I resolve the above error? Currently, I am using labels of shape (128,128,128,4) in one-hot encoding. Please feel free to ask me for more context if required.

lokeycookie commented 2 years ago

Hi @aihamtaleb ,

Recently, I tried again with the labels not in one-hot, meaning the label dimensions are (128,128,128). Afterwards, I made the following changes.

  1. In line 170 of https://github.com/HealthML/self-supervised-3d-tasks/blob/master/self_supervised_3d_tasks/data/segmentation_task_loader.py, I changed the lines to:

         origin_row = np.random.randint(0, 1, self.patches_per_scan)
         origin_col = np.random.randint(0, 1, self.patches_per_scan)
         origin_dep = np.random.randint(0, 1, self.patches_per_scan)

The patch_size is still at (128,128,128) and patches_per_scan is 3.

  2. In line 187 of https://github.com/HealthML/self-supervised-3d-tasks/blob/master/self_supervised_3d_tasks/data/segmentation_task_loader.py, I changed the line to n_classes = 4, hardcoding the number of classes to 4 since my BRATS dataset labels all have 4 classes.
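
For clarity, a toy sketch of how these two edits behave (my own reasoning, not the repo's code):

    import numpy as np

    patches_per_scan = 3

    # np.random.randint(0, 1, n) can only return 0, so with patch_size equal
    # to the full (128,128,128) volume, every "patch" starts at the origin
    # and covers the whole scan.
    origin_row = np.random.randint(0, 1, patches_per_scan)
    print(origin_row)  # [0 0 0]

    # Hardcoding n_classes avoids inferring it from np.max(data_y) + 1, which
    # can under-count when a scan happens to lack one of the 4 classes.
    n_classes = 4
    labels = np.random.randint(0, n_classes, size=(2, 2, 2))
    one_hot = np.eye(n_classes)[labels]
    print(one_hot.shape)  # (2, 2, 2, 4)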

After these changes, I was able to run the finetuning code to completion for all 5 self-supervised algorithms. I am not exactly sure why these changes work for my case. Nonetheless, thank you for your help!

aihamtaleb commented 2 years ago

I will respond to this on the other open issue.