MS-PINPOINT / mindGlide

Brain segmentation with MONAI and DynUNet (nnU-Net)
MIT License

Debugging (Patients 245 baseline and 684 w024) - RuntimeError: Given groups=1, weight of size [32, 1, 3, 3, 3], expected input[4, 60, 128, 128, 64] to have 1 channels, but got 60 channels instead #12

Closed: phiphi0815 closed this issue 1 year ago

phiphi0815 commented 1 year ago

HPC Cluster script /SAN/medic/mspinpoint/clusterscript_mindglide_ORATORIO_10245.sh:

    #$ -S /bin/bash
    #$ -cwd
    #$ -l h_rt=06:00:00
    #$ -N protocoll_complete_single_245_Pat_mindGlide_ORATORI
    #$ -V
    #$ -l tmem=4G
    #$ -l tscratch=4G
    #$ -l gpu=true

    # print current working directory and hostname
    pwd; hostname
    date

    # read the directory to process from a file
    dir="ORATORIO/ORATORIO_10245"
    bn="$(basename "$dir")"

    # iterate over all the files in the directory
    for file in "$dir"/*
    do
        # create tmp directory for current file
        tmp_dir="tmp/${JOB_ID}_${SGE_TASK_ID}_$(basename "$file")"
        mkdir -p "$tmp_dir"
        echo "Created tmp directory: $tmp_dir"
        cp "$file" "$tmp_dir"
        echo "Processing singularity container for file: $file"
        current_file=$(basename "$file")
        # run the singularity container with the file as an argument
        singularity run --nv --bind "$tmp_dir":/mnt /SAN/medic/mspinpoint/mindGlide/container/mind-glide_latest.sif "$current_file"

        # create output directory including name of the input directory and the input file
        output_dir="${bn}_${current_file}"
        echo "Creating output directory: $output_dir in ORATORIO/output/"
        mkdir -p ORATORIO/output/ORATORIO_10245/"$output_dir"

        # copy results to output directory
        cp -r "$tmp_dir"/* ORATORIO/output/ORATORIO_10245/"$output_dir"
        echo "Successfully copied results to output directory: $output_dir"
        rm -rf "$tmp_dir"
    done
    date

Output file:

  GPU Prolog Script v1.14
  This is a GPU node.
  Enough GPUs available.
  Allocating card 0
  /SAN/medic/mspinpoint
  dip-207-1.local
  Tue  2 May 10:54:41 BST 2023
  Created tmp directory: tmp/9892259_undefined_20110211-baseline_flair.nii.gz
  Processing singularity container for file: ORATORIO/ORATORIO_10245/20110211-baseline_flair.nii.gz
  python /opt/mindGlide/mindGlide/run_inference.py --model_file_paths /opt/mindGlide/models/model_0_net_key_metric=0.7627.pt /opt/mindGlide/models/model_2_net_key_metric=0.7541.pt /opt/mindGlide/models/model_2_net_key_metric=0.7579.pt /opt/mindGlide/models/model_3_net_key_metric=0.7713.pt /opt/mindGlide/models/model_3_net_key_metric=0.7717.pt /opt/mindGlide/models/model_4_net_key_metric=0.7645.pt /opt/mindGlide/models/model_5_net_key_metric=0.7866.pt /opt/mindGlide/models/model_6_net_key_metric=0.7723.pt /opt/mindGlide/models/model_7_net_key_metric=0.7634.pt /opt/mindGlide/models/model_8_net_key_metric=0.7489.pt /opt/mindGlide/models/model_9_net_key_metric=0.7738.pt --scan_path 20110211-baseline_flair.nii.gz
  /mnt folder content:  ['20110211-baseline_flair.nii.gz']
  model_file_paths: ['/opt/mindGlide/models/model_0_net_key_metric=0.7627.pt', '/opt/mindGlide/models/model_2_net_key_metric=0.7541.pt', '/opt/mindGlide/models/model_2_net_key_metric=0.7579.pt', '/opt/mindGlide/models/model_3_net_key_metric=0.7713.pt', '/opt/mindGlide/models/model_3_net_key_metric=0.7717.pt', '/opt/mindGlide/models/model_4_net_key_metric=0.7645.pt', '/opt/mindGlide/models/model_5_net_key_metric=0.7866.pt', '/opt/mindGlide/models/model_6_net_key_metric=0.7723.pt', '/opt/mindGlide/models/model_7_net_key_metric=0.7634.pt', '/opt/mindGlide/models/model_8_net_key_metric=0.7489.pt', '/opt/mindGlide/models/model_9_net_key_metric=0.7738.pt']
  model_paths:  ['/opt/mindGlide/models/model_0_net_key_metric=0.7627.pt', '/opt/mindGlide/models/model_2_net_key_metric=0.7541.pt', '/opt/mindGlide/models/model_2_net_key_metric=0.7579.pt', '/opt/mindGlide/models/model_3_net_key_metric=0.7713.pt', '/opt/mindGlide/models/model_3_net_key_metric=0.7717.pt', '/opt/mindGlide/models/model_4_net_key_metric=0.7645.pt', '/opt/mindGlide/models/model_5_net_key_metric=0.7866.pt', '/opt/mindGlide/models/model_6_net_key_metric=0.7723.pt', '/opt/mindGlide/models/model_7_net_key_metric=0.7634.pt', '/opt/mindGlide/models/model_8_net_key_metric=0.7489.pt', '/opt/mindGlide/models/model_9_net_key_metric=0.7738.pt']
  scan to segment:  /mnt/20110211-baseline_flair.nii.gz
  ensemble inference with  11  models
  python  /opt/monai-tutorials/modules/dynunet_pipeline//inference.py -fold 0 -expr_name _mindglide -task_id 12 -tta_val False --root_dir /mnt/tmpMINDGLIDEIQvGvsEBba --datalist_path /mnt/tmpMINDGLIDEIQvGvsEBba --checkpoint /opt/mindGlide/models/model_0_net_key_metric=0.7627.pt
  Output: pretrained checkpoint: /opt/mindGlide/models/model_0_net_key_metric=0.7627.pt loaded
  2023-05-02 10:55:12,127 - Engine run resuming from iteration 0, epoch 0 until 1 epochs
  2023-05-02 10:55:17,225 - Current run is terminating due to exception: Given groups=1, weight of size [32, 1, 3, 3, 3], expected input[4, 60, 128, 128, 64] to have 1 channels, but got 60 channels instead
  2023-05-02 10:55:17,267 - Engine run is terminating due to exception: Given groups=1, weight of size [32, 1, 3, 3, 3], expected input[4, 60, 128, 128, 64] to have 1 channels, but got 60 channels instead

  Error:
  Loading dataset:   0%|          | 0/1 [00:00<?, ?it/s]/opt/monai/monai/data/utils.py:771: UserWarning: Modifying image pixdim from [ 0.9766  0.9766  3.     10.    ] to [  0.97659999   0.97659997   2.99999999 181.51818068]
    warnings.warn(f"Modifying image pixdim from {pixdim} to {norm}")

  Loading dataset: 100%|██████████| 1/1 [00:00<00:00,  1.09it/s]
  Loading dataset: 100%|██████████| 1/1 [00:00<00:00,  1.09it/s]
  Traceback (most recent call last):
    File "/opt/monai-tutorials/modules/dynunet_pipeline//inference.py", line 204, in <module>
      inference(args)
    File "/opt/monai-tutorials/modules/dynunet_pipeline//inference.py", line 76, in inference
      inferrer.run()
    File "/opt/monai/monai/engines/evaluator.py", line 148, in run
      super().run()
    File "/opt/monai/monai/engines/workflow.py", line 278, in run
      super().run(data=self.data_loader, max_epochs=self.state.max_epochs)
    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 892, in run
      return self._internal_run()
    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 935, in _internal_run
      return next(self._internal_run_generator)
    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 993, in _internal_run_as_gen
      self._handle_exception(e)
    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 638, in _handle_exception
      raise e
    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 959, in _internal_run_as_gen
      epoch_time_taken += yield from self._run_once_on_dataset_as_gen()
    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 1087, in _run_once_on_dataset_as_gen
      self._handle_exception(e)
    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 638, in _handle_exception
      raise e
    File "/opt/conda/lib/python3.8/site-packages/ignite/engine/engine.py", line 1068, in _run_once_on_dataset_as_gen
      self.state.output = self._process_function(self, self.state.batch)
    File "/opt/monai-tutorials/modules/dynunet_pipeline/inferrer.py", line 127, in _iteration
      predictions = _compute_pred()
    File "/opt/monai-tutorials/modules/dynunet_pipeline/inferrer.py", line 106, in _compute_pred
      pred = self.inferer(inputs, self.network, *args, **kwargs).cpu()
    File "/opt/monai/monai/inferers/inferer.py", line 192, in __call__
      return sliding_window_inference(
    File "/opt/monai/monai/inferers/utils.py", line 180, in sliding_window_inference
      seg_prob_out = predictor(window_data, *args, **kwargs)  # batched patch segmentation
    File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
      return forward_call(*input, **kwargs)
    File "/opt/monai/monai/networks/nets/dynunet.py", line 268, in forward
      out = self.skip_layers(x)
    File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
      return forward_call(*input, **kwargs)
    File "/opt/monai/monai/networks/nets/dynunet.py", line 46, in forward
      downout = self.downsample(x)
    File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
      return forward_call(*input, **kwargs)
    File "/opt/monai/monai/networks/blocks/dynunet_block.py", line 169, in forward
      out = self.conv1(inp)
    File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
      return forward_call(*input, **kwargs)
    File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/container.py", line 141, in forward
      input = module(input)
    File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
      return forward_call(*input, **kwargs)
    File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 592, in forward
      return self._conv_forward(input, self.weight, self.bias)
    File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 587, in _conv_forward
      return F.conv3d(
  RuntimeError: Given groups=1, weight of size [32, 1, 3, 3, 3], expected input[4, 60, 128, 128, 64] to have 1 channels, but got 60 channels instead

  Creating output directory: ORATORIO_10245_20110211-baseline_flair.nii.gz in ORATORIO/output/
  Successfully copied results to output directory: ORATORIO_10245_20110211-baseline_flair.nii.gz
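
For context on the RuntimeError: the weight shape [32, 1, 3, 3, 3] means the network's first 3D convolution has 32 output channels, 1 input channel, and a 3x3x3 kernel, so the model expects a single-channel volume. The input it receives, [4, 60, 128, 128, 64], presents 60 channels instead, which usually means one of the image axes (here, 60) is being read as the channel dimension. A minimal sketch in plain PyTorch (not the mindGlide code) that reproduces the same message:

    # Minimal sketch (plain PyTorch, not the mindGlide pipeline) reproducing the error.
    import torch
    import torch.nn as nn

    # First conv of a single-modality model: weight shape [out=32, in=1, 3, 3, 3]
    conv = nn.Conv3d(in_channels=1, out_channels=32, kernel_size=3, padding=1)

    ok = torch.randn(4, 1, 128, 128, 64)    # [batch, channel, D, H, W] -> works
    bad = torch.randn(4, 60, 128, 128, 64)  # 60 "channels" instead of 1

    conv(ok)   # fine
    conv(bad)  # RuntimeError: ... expected input[4, 60, 128, 128, 64] to have 1 channels

The call on the second tensor raises exactly the message in the log above, which points at the geometry of the input scan rather than at the model weights.
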
armaneshaghi commented 1 year ago

@phiphi0815 please provide the full command you used to get this error, including the container address, so I can find the version / commit

phiphi0815 commented 1 year ago

Full command and script address provided. Formatted with Markdown.

armaneshaghi commented 1 year ago

@phiphi0815 have you looked at the data?

[attached image]

armaneshaghi commented 1 year ago

Closing this issue because the problem is in the data: the scan lacks enough slices in all directions and appears to have been corrupted during reconstruction.
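
For reference, a quick pre-flight check of the scan geometry can catch this kind of problem before a cluster job is submitted. A sketch assuming nibabel is available; the filename is the one from the failing job above:

    # Sketch: inspect the NIfTI geometry before running mindGlide on it.
    import nibabel as nib

    img = nib.load("20110211-baseline_flair.nii.gz")  # path from the failing job
    print("shape:", img.shape)                        # expect a 3D volume, e.g. (X, Y, Z)
    print("zooms:", img.header.get_zooms())           # voxel sizes in mm
    print("ndim :", len(img.shape))

    # A 4th dimension or an implausibly small slice count suggests a corrupted
    # or mis-reconstructed scan rather than a problem in the pipeline.
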