Closed frxderic closed 3 months ago
Hi @valosekj,
We encountered a similar issue with our nnUNetv2 inference package (https://github.com/ENHANCE-PET/MOOSE). Please refer to the installation guide for Mac M-series chips.
For context, we initially used a custom fork of PyTorch (https://github.com/pytorch/pytorch.git@3c61c525694eca0f895bb01fc67c16793226051a) with conv3d and transposed conv3d implementations. However, this stopped working after the NumPy 2.0 release. With assistance from GPT, we resolved the NumPy 2.0 issue and created a separate fork, available here: https://github.com/LalithShiyam/pytorch-mps.git. Once you install PyTorch from this fork, you should be able to run nnUNetv2 inference without issues:
pip install git+https://github.com/LalithShiyam/pytorch-mps.git
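After installation, a quick sanity check along these lines (a minimal sketch; the layer sizes are arbitrary) can confirm that the build exposes the MPS backend and that ConvTranspose3d actually runs on it:

```python
import torch

# Minimal smoke test (illustrative sizes): on a stock PyTorch build,
# the ConvTranspose3d call below is what fails on MPS; on the patched
# fork it should run and return a tensor of the expected shape.
device = "mps" if torch.backends.mps.is_available() else "cpu"

layer = torch.nn.ConvTranspose3d(in_channels=2, out_channels=4,
                                 kernel_size=2, stride=2).to(device)
x = torch.randn(1, 2, 8, 8, 8, device=device)
y = layer(x)

# Output spatial size: (in - 1) * stride - 2 * padding + kernel = 16
print(device, tuple(y.shape))
```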
Cheers,
Lalith
Hey, thank you @LalithShiyam for the very quick response! Very impressive that you got your own PyTorch fork working with MPS! I immediately tried to get it to run, but when I try to install it, I get:
error: Failed to prepare distributions
Caused by: Failed to fetch wheel: torch @ git+https://github.com/LalithShiyam/pytorch-mps.git@ffda73cfecffd15b50a85745e98d0641a5583b58
Caused by: Failed to build: `torch @ git+https://github.com/LalithShiyam/pytorch-mps.git@ffda73cfecffd15b50a85745e98d0641a5583b58`
Caused by: Build backend failed to build wheel through `build_wheel()` with exit status: 1
--- stdout:
Building wheel torch-2.3.0a0+gitffda73c
-- Building version 2.3.0a0+gitffda73c
cmake --build . --target install --config Release
--- stderr:
no such file or directory
CMake Error: Generator: execution of make failed. Make command was: ~/Library/Caches/uv/environments-v0/.tmpELsuQU/bin/ninja install
---
I am using a pip virtual environment with Python 3.10, running on an up-to-date MacBook Air with an M2 chip. Any tips on how to get your fork to install properly?
Hi @frxderic,
Many thanks! Just to be clear, though: we didn't implement the MPS support ourselves. That came from the original fork mentioned above; we only fixed the bug caused by the NumPy 2.0 update. I am doing a fresh install on my Mac (M1 Ultra). It seems to be building fine so far. Will keep you posted.
Also, do you have cmake and ninja installed?
[update] The build succeeded in my case. I think you need to check whether cmake and ninja are installed; if not, install them with brew.
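For reference, the presence of both build tools can be checked from Python before kicking off the build (a small sketch using only the standard library; install any missing tool with `brew install cmake ninja`):

```python
import shutil

# Check that the build tools a from-source PyTorch build relies on
# are actually on PATH; shutil.which returns None for missing tools.
for tool in ("cmake", "ninja"):
    path = shutil.which(tool)
    print(f"{tool}: {path if path else 'MISSING - try: brew install ' + tool}")
```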
I managed to get it to build by using a newer cmake version outside of my venv! Thank you for your help. I hope future PyTorch versions will support 3D ConvTranspose natively so this workaround becomes unnecessary...
@frxderic Fantastic! Did you use the PyTorch-MPS git fork after the cmake fix?
@LalithShiyam yes exactly - just as you described in your comment. It now runs a prediction within 38 seconds using all 5 folds of the trained model!
@frxderic phew! Wonderful! Just wanted to confirm that the fork is functional 😃! Enjoy nnunet on MPS! It's pretty good!
Many thanks to the community for addressing this issue!
I will mark this issue as closed now.
Best regards,
Carsten
Hi everyone,
I have successfully trained a version of nnUNet on the BraTS dataset on a V100 GPU. After training, I was hoping to use the model for inference on a MacBook with an M2 chip. The model loads fine, but I get the error: RuntimeError: ConvTranspose 3D is not supported on MPS. After some research, this appears to be a known problem, with official ConvTranspose3D support for the MPS backend still pending. However, I wanted to ask whether there is some sort of workaround available to get the model running on MPS. I tried setting os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1" before importing torch, but I still got the error.
Apparently there is a PyTorch PR pending, but I haven't been able to install it from source.
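For completeness, this is the exact shape of the workaround I tried (the fallback variable is read when torch initializes, so it can only take effect if set before the first `import torch` in a fresh process or kernel):

```python
import os

# Must be set before torch is first imported; torch reads this
# environment variable during initialization, so re-running this in an
# already-started kernel where torch is loaded has no effect.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch  # noqa: E402  (import deliberately placed after the env var)
```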
Any suggestions are greatly appreciated!
I have attached the complete error message below:
RuntimeError                              Traceback (most recent call last)
Cell In[15], line 1
----> 1 predictor.predict_from_files(input_folder, output_folder, save_probabilities=False,
      2                              overwrite=True,
      3                              num_processes_preprocessing=3,
      4                              num_processes_segmentation_export=3,
      5                              folder_with_segs_from_prev_stage=None,
      6                              num_parts=1, part_id=0)

File ~/Projects/segmentation/nnunetv2/inference/predict_from_raw_data.py:258, in nnUNetPredictor.predict_from_files(self, list_of_lists_or_source_folder, output_folder_or_list_of_truncated_output_files, save_probabilities, overwrite, num_processes_preprocessing, num_processes_segmentation_export, folder_with_segs_from_prev_stage, num_parts, part_id)
    251     return
    253 data_iterator = self._internal_get_data_iterator_from_lists_of_filenames(list_of_lists_or_source_folder,
    254                                                                          seg_from_prev_stage_files,
    255                                                                          output_filename_truncated,
    256                                                                          num_processes_preprocessing)
--> 258 return self.predict_from_data_iterator(data_iterator, save_probabilities, num_processes_segmentation_export)

File ~/Projects/segmentation/nnunetv2/inference/predict_from_raw_data.py:375, in nnUNetPredictor.predict_from_data_iterator(self, data_iterator, save_probabilities, num_processes_segmentation_export)
    372     sleep(0.1)
    373 proceed = not check_workers_alive_and_busy(export_pool, worker_list, r, allowed_num_queued=2)
--> 375 prediction = self.predict_logits_from_preprocessed_data(data).cpu()
    377 if ofile is not None:
    378     # this needs to go into background processes
    379     # export_prediction_from_logits(prediction, properties, self.configuration_manager, self.plans_manager,
    380     #                               self.dataset_json, ofile, save_probabilities)
    381     print('sending off prediction to background worker for resampling and export')

File ~/Projects/segmentation/nnunetv2/inference/predict_from_raw_data.py:492, in nnUNetPredictor.predict_logits_from_preprocessed_data(self, data)
    488 # why not leave prediction on device if perform_everything_on_device? Because this may cause the
    489 # second iteration to crash due to OOM. Grabbing that with try except cause way more bloated code than
    490 # this actually saves computation time
    491 if prediction is None:
--> 492     prediction = self.predict_sliding_window_return_logits(data).to('cpu')
    493 else:
    494     prediction += self.predict_sliding_window_return_logits(data).to('cpu')

File ~/Projects/segmentation/nnunetv2/inference/predict_from_raw_data.py:653, in nnUNetPredictor.predict_sliding_window_return_logits(self, input_image)
    651     predicted_logits = self._internal_predict_sliding_window_return_logits(data, slicers, False)
    652 else:
--> 653     predicted_logits = self._internal_predict_sliding_window_return_logits(data, slicers,
    654                                                                           self.perform_everything_on_device)
    656 empty_cache(self.device)
    657 # revert padding

File ~/Projects/segmentation/nnunetv2/inference/predict_from_raw_data.py:609, in nnUNetPredictor._internal_predict_sliding_window_return_logits(self, data, slicers, do_on_device)
    607     empty_cache(self.device)
    608     empty_cache(results_device)
--> 609     raise e
    610 return predicted_logits

File ~/Projects/segmentation/nnunetv2/inference/predict_from_raw_data.py:592, in nnUNetPredictor._internal_predict_sliding_window_return_logits(self, data, slicers, do_on_device)
    589 workon = data[sl][None]
    590 workon = workon.to(self.device)
--> 592 prediction = self._internal_maybe_mirror_and_predict(workon)[0].to(results_device)
    594 if self.use_gaussian:
    595     prediction *= gaussian

File ~/Projects/segmentation/nnunetv2/inference/predict_from_raw_data.py:539, in nnUNetPredictor._internal_maybe_mirror_and_predict(self, x)
    537 def _internal_maybe_mirror_and_predict(self, x: torch.Tensor) -> torch.Tensor:
    538     mirror_axes = self.allowed_mirroring_axes if self.use_mirroring else None
--> 539     prediction = self.network(x)
    541     if mirror_axes is not None:
    542         # check for invalid numbers in mirror_axes
    543         # x should be 5d for 3d images and 4d for 2d. so the max value of mirror_axes cannot exceed len(x.shape) - 3
    544         assert max(mirror_axes) <= x.ndim - 3, 'mirror_axes does not match the dimension of the input!'

File ~/Projects/segmentation/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1736, in Module._wrapped_call_impl(self, *args, **kwargs)
   1734     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1735 else:
-> 1736     return self._call_impl(*args, **kwargs)

File ~/Projects/segmentation/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1747, in Module._call_impl(self, *args, **kwargs)
   1742 # If we don't have any hooks, we want to skip the rest of the logic in
   1743 # this function, and just call forward.
   1744 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1745         or _global_backward_pre_hooks or _global_backward_hooks
   1746         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1747     return forward_call(*args, **kwargs)
   1749 try:
   1750     result = None

File ~/Projects/segmentation/.venv/lib/python3.11/site-packages/dynamic_network_architectures/architectures/unet.py:62, in PlainConvUNet.forward(self, x)
    60 def forward(self, x):
    61     skips = self.encoder(x)
---> 62     return self.decoder(skips)

File ~/Projects/segmentation/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1736, in Module._wrapped_call_impl(self, *args, **kwargs)
   1734     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1735 else:
-> 1736     return self._call_impl(*args, **kwargs)

File ~/Projects/segmentation/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1747, in Module._call_impl(self, *args, **kwargs)
   1742 # If we don't have any hooks, we want to skip the rest of the logic in
   1743 # this function, and just call forward.
   1744 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1745         or _global_backward_pre_hooks or _global_backward_hooks
   1746         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1747     return forward_call(*args, **kwargs)
   1749 try:
   1750     result = None

File ~/Projects/segmentation/.venv/lib/python3.11/site-packages/dynamic_network_architectures/building_blocks/unet_decoder.py:109, in UNetDecoder.forward(self, skips)
    107 seg_outputs = []
    108 for s in range(len(self.stages)):
--> 109     x = self.transpconvs[s](lres_input)
    110     x = torch.cat((x, skips[-(s+2)]), 1)
    111     x = self.stages[s](x)

File ~/Projects/segmentation/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1736, in Module._wrapped_call_impl(self, *args, **kwargs)
   1734     return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1735 else:
-> 1736     return self._call_impl(*args, **kwargs)

File ~/Projects/segmentation/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py:1747, in Module._call_impl(self, *args, **kwargs)
   1742 # If we don't have any hooks, we want to skip the rest of the logic in
   1743 # this function, and just call forward.
   1744 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1745         or _global_backward_pre_hooks or _global_backward_hooks
   1746         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1747     return forward_call(*args, **kwargs)
   1749 try:
   1750     result = None

File ~/Projects/segmentation/.venv/lib/python3.11/site-packages/torch/nn/modules/conv.py:1333, in ConvTranspose3d.forward(self, input, output_size)
   1322 num_spatial_dims = 3
   1323 output_padding = self._output_padding(
   1324     input,
   1325     output_size,
   (...)
   1330     self.dilation,  # type: ignore[arg-type]
   1331 )
-> 1333 return F.conv_transpose3d(
   1334     input,
   1335     self.weight,
   1336     self.bias,
   1337     self.stride,
   1338     self.padding,
   1339     output_padding,
   1340     self.groups,
   1341     self.dilation,
   1342 )
RuntimeError: ConvTranspose 3D is not supported on MPS
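For anyone who cannot build the patched fork, one possible stopgap (an untested sketch, not part of nnUNet; the class and helper names below are made up for illustration) is to run only the unsupported transposed convolutions on the CPU while keeping the rest of the network on MPS, at some cost in speed:

```python
import torch
import torch.nn as nn


class CPUFallbackConvTranspose3d(nn.Module):
    """Hypothetical wrapper: run an unsupported ConvTranspose3d on the CPU
    while the rest of the network stays on the original device (e.g. MPS)."""

    def __init__(self, conv: nn.ConvTranspose3d):
        super().__init__()
        self.conv = conv.to("cpu")

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Hop to CPU for the unsupported op, then back to the input's device.
        return self.conv(x.to("cpu")).to(x.device)


def patch_transpconvs(model: nn.Module) -> nn.Module:
    """Recursively replace every ConvTranspose3d submodule with the wrapper."""
    for name, child in model.named_children():
        if isinstance(child, nn.ConvTranspose3d):
            setattr(model, name, CPUFallbackConvTranspose3d(child))
        else:
            patch_transpconvs(child)
    return model
```

Applied to a loaded network via `patch_transpconvs(predictor.network)` (again, an assumption about where the network lives), this trades speed for not needing a custom PyTorch build.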