TractSeg bedpostX out of memory

dyhan316 commented 2 years ago

In a bedpostX output folder, I ran TractSeg -i dyads1.nii.gz.

However, it failed with the following error:

BedpostX dyads detected. Will automatically combine dyads1+2[+3].
Loading weights from: /home/connectome/dyhan316/.tractseg/pretrained_weights_tract_segmentation_v3.npz
Traceback (most recent call last):
  File "/home/connectome/dyhan316/.conda/envs/TRACTSEG/bin/TractSeg", line 417, in <module>
    main()
  File "/home/connectome/dyhan316/.conda/envs/TRACTSEG/bin/TractSeg", line 334, in main
    unit_test=args.test)
  File "/home/connectome/dyhan316/.conda/envs/TRACTSEG/lib/python3.7/site-packages/tractseg/python_api.py", line 157, in run_tractseg
    model = BaseModel(Config, inference=True)
  File "/home/connectome/dyhan316/.conda/envs/TRACTSEG/lib/python3.7/site-packages/tractseg/models/base_model.py", line 78, in __init__
    net = self.net.to(self.device)
  File "/home/connectome/dyhan316/.conda/envs/TRACTSEG/lib/python3.7/site-packages/torch/nn/modules/module.py", line 907, in to
    return self._apply(convert)
  File "/home/connectome/dyhan316/.conda/envs/TRACTSEG/lib/python3.7/site-packages/torch/nn/modules/module.py", line 578, in _apply
    module._apply(fn)
  File "/home/connectome/dyhan316/.conda/envs/TRACTSEG/lib/python3.7/site-packages/torch/nn/modules/module.py", line 578, in _apply
    module._apply(fn)
  File "/home/connectome/dyhan316/.conda/envs/TRACTSEG/lib/python3.7/site-packages/torch/nn/modules/module.py", line 601, in _apply
    param_applied = fn(param)
  File "/home/connectome/dyhan316/.conda/envs/TRACTSEG/lib/python3.7/site-packages/torch/nn/modules/module.py", line 905, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

I checked nvidia-smi to see whether this really is a memory issue, but, as the attached screenshot shows, there are plenty of idle GPUs. Could anyone explain how I can resolve this? Thank you!
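One way to double-check what nvidia-smi reports, and to make sure PyTorch inside TractSeg only sees a GPU that really is idle, is to query per-GPU memory and set CUDA_VISIBLE_DEVICES before launching TractSeg. The sketch below assumes the CSV printed by `nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits`; the helper names (pick_idle_gpu, query_nvidia_smi, run_tractseg_on_gpu) are hypothetical and not part of TractSeg.

```python
import os
import subprocess


def pick_idle_gpu(csv_text):
    """Return the index (as a string) of the GPU with the most free memory.

    Expects one "index, memory.used, memory.total" line per GPU, in MiB,
    as printed by:
    nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits
    """
    best_index, best_free = None, -1
    for line in csv_text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        free = int(total) - int(used)
        if free > best_free:
            best_index, best_free = index, free
    if best_index is None:
        raise ValueError("no GPUs listed in nvidia-smi output")
    return best_index


def query_nvidia_smi():
    """Run nvidia-smi and return its CSV output (needs an NVIDIA driver)."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True)
    return result.stdout


def run_tractseg_on_gpu(gpu_index, tractseg_args):
    """Run the TractSeg CLI with only one GPU visible to PyTorch."""
    env = dict(os.environ,
               # Number GPUs in PCI bus order, the same order nvidia-smi uses.
               CUDA_DEVICE_ORDER="PCI_BUS_ID",
               CUDA_VISIBLE_DEVICES=gpu_index,
               # Synchronous kernel launches give accurate tracebacks (slower).
               CUDA_LAUNCH_BLOCKING="1")
    subprocess.run(["TractSeg"] + list(tractseg_args), env=env, check=True)
```

Usage would be something like run_tractseg_on_gpu(pick_idle_gpu(query_nvidia_smi()), ["-i", "dyads1.nii.gz"]). If the out-of-memory error persists even when a fully idle GPU is the only visible device, memory is probably not the real cause and the PyTorch/CUDA setup is worth checking instead.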

wasserth commented 2 years ago

Does TractSeg work without memory errors on normal input (i.e. without bedpostX)?

dyhan316 commented 2 years ago

Thank you for the reply! When I run it on just the preprocessed DWI data, I get:

Creating brain mask...
Warning: An input intended to be a single 3D volume has multiple timepoints. Input will be truncated to first volume, but this functionality is deprecated and will be removed in a future release.
Creating peaks (1 of 3)...
sh: 1: dwi2response: not found
Creating peaks (2 of 3)...
sh: 1: dwi2fod: not found
Creating peaks (3 of 3)...
sh: 1: sh2peaks: not found
Traceback (most recent call last):
  File "/home/connectome/dyhan316/.conda/envs/TRACTSEG/lib/python3.7/site-packages/nibabel/loadsave.py", line 42, in load
    stat_result = os.stat(filename)
FileNotFoundError: [Errno 2] No such file or directory: 'tractseg_output/peaks.nii.gz'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/connectome/dyhan316/.conda/envs/TRACTSEG/bin/TractSeg", line 417, in <module>
    main()
  File "/home/connectome/dyhan316/.conda/envs/TRACTSEG/bin/TractSeg", line 255, in main
    data_img = nib.load(peak_path)
  File "/home/connectome/dyhan316/.conda/envs/TRACTSEG/lib/python3.7/site-packages/nibabel/loadsave.py", line 44, in load
    raise FileNotFoundError(f"No such file or no access: '{filename}'")
FileNotFoundError: No such file or no access: 'tractseg_output/peaks.nii.gz'

dyhan316 commented 2 years ago

(The contents of the resulting output folder were attached as a screenshot.)

wasserth commented 2 years ago

You have to install MRtrix for this to work (you need, for example, the command dwi2response). MRtrix is used to calculate the peaks.
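Because TractSeg calls MRtrix commands through the shell (the log above shows "sh: 1: dwi2response: not found"), a quick pre-flight check that those commands are on PATH can catch the problem before the peaks step fails with a confusing FileNotFoundError. A minimal sketch; the command list is taken from the log above, and the helper names are hypothetical.

```python
import shutil

# MRtrix commands that appear in the TractSeg log during peak creation.
MRTRIX_COMMANDS = ("dwi2response", "dwi2fod", "sh2peaks")


def missing_commands(commands=MRTRIX_COMMANDS):
    """Return the commands that cannot be found on the current PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


def check_mrtrix(commands=MRTRIX_COMMANDS):
    """Raise a readable error if any required MRtrix command is missing."""
    missing = missing_commands(commands)
    if missing:
        raise RuntimeError(
            "MRtrix not found on PATH (missing: " + ", ".join(missing) + "); "
            "install MRtrix and make sure its bin directory is on PATH.")
```

Calling check_mrtrix() at the start of a processing script (in the same environment that runs TractSeg) fails fast with the list of missing commands.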

dyhan316 commented 2 years ago

Thank you for the response!

After installing MRtrix and running it again, it does create the peaks.nii.gz file. However, the following error arises (again CUDA-related, I suppose?):

Creating brain mask...
Warning: An input intended to be a single 3D volume has multiple timepoints. Input will be truncated to first volume, but this functionality is deprecated and will be removed in a future release.
Creating peaks (1 of 3)...
Creating peaks (2 of 3)...
Creating peaks (3 of 3)...
Reorienting data...
Loading weights from: /home/connectome/dyhan316/.tractseg/pretrained_weights_tract_segmentation_v3.npz
Processing direction (1 of 3)
  0%|                                                                                                                                                    | 0/144 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/connectome/dyhan316/.conda/envs/TRACTSEG/bin/TractSeg", line 417, in <module>
    main()
  File "/home/connectome/dyhan316/.conda/envs/TRACTSEG/bin/TractSeg", line 334, in main
    unit_test=args.test)
  File "/home/connectome/dyhan316/.conda/envs/TRACTSEG/lib/python3.7/site-packages/tractseg/python_api.py", line 172, in run_tractseg
    batch_size=inference_batch_size)
  File "/home/connectome/dyhan316/.conda/envs/TRACTSEG/lib/python3.7/site-packages/tractseg/libs/direction_merger.py", line 30, in get_seg_single_img_3_directions
    batch_size=batch_size)    # (x, y, z, nr_classes)
  File "/home/connectome/dyhan316/.conda/envs/TRACTSEG/lib/python3.7/site-packages/tractseg/libs/trainer.py", line 298, in predict_img
    layer_probs = model.predict(x)  # (bs, x, y, nr_classes)
  File "/home/connectome/dyhan316/.conda/envs/TRACTSEG/lib/python3.7/site-packages/tractseg/models/base_model.py", line 239, in predict
    outputs = self.net(X)  # forward
  File "/home/connectome/dyhan316/.conda/envs/TRACTSEG/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/connectome/dyhan316/.conda/envs/TRACTSEG/lib/python3.7/site-packages/tractseg/models/unet_pytorch_deepsup.py", line 73, in forward
    contr_1_1 = self.contr_1_1(inpt)
  File "/home/connectome/dyhan316/.conda/envs/TRACTSEG/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/connectome/dyhan316/.conda/envs/TRACTSEG/lib/python3.7/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/home/connectome/dyhan316/.conda/envs/TRACTSEG/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/connectome/dyhan316/.conda/envs/TRACTSEG/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 447, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/connectome/dyhan316/.conda/envs/TRACTSEG/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 444, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
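"no kernel image is available for execution on the device" commonly indicates that the installed PyTorch binary was not compiled for the GPU's compute capability. A sketch for checking this, assuming PyTorch's torch.cuda.get_device_capability() and torch.cuda.get_arch_list(); kernel_available and check_current_gpu are hypothetical helpers that apply the CUDA compatibility rules (binary sm_XY code runs on the same major version with an equal or higher minor version; PTX compute_XY code can be JIT-compiled for any equal or newer capability).

```python
import re

# Matches plain entries such as "sm_86" or "compute_70" from torch.cuda.get_arch_list().
# Suffixed entries such as "sm_90a" do not match and are skipped.
_ARCH_RE = re.compile(r"(sm|compute)_(\d+)(\d)")


def kernel_available(capability, arch_list):
    """Return True if a build compiled for arch_list can run on a GPU
    with the given (major, minor) compute capability."""
    major, minor = capability
    for arch in arch_list:
        match = _ARCH_RE.fullmatch(arch)
        if match is None:
            continue  # skip entries this sketch does not understand
        kind = match.group(1)
        arch_major, arch_minor = int(match.group(2)), int(match.group(3))
        # Binary (SASS) code: same major version, minor version <= the GPU's.
        if kind == "sm" and arch_major == major and arch_minor <= minor:
            return True
        # PTX code: can be JIT-compiled for any equal or newer capability.
        if kind == "compute" and (arch_major, arch_minor) <= (major, minor):
            return True
    return False


def check_current_gpu(device=0):
    """Return the GPU capability, the build's arch list, and whether they match."""
    import torch  # imported here so kernel_available() works without PyTorch

    capability = torch.cuda.get_device_capability(device)
    arch_list = torch.cuda.get_arch_list()
    return capability, arch_list, kernel_available(capability, arch_list)
```

If check_current_gpu() reports False, installing a PyTorch build whose CUDA version supports that GPU is the usual direction to look in.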

MIC-DKFZ / TractSeg, issue #184