Closed: silvandeleemput closed this issue 11 months ago
Thank you @silvandeleemput for creating the issue.
We get the following error (if the number of GPUs on the host machine is greater than 1):
```
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.8/logging/__init__.py", line 1085, in emit
    msg = self.format(record)
  File "/usr/lib/python3.8/logging/__init__.py", line 929, in format
    return fmt.format(record)
  File "/usr/lib/python3.8/logging/__init__.py", line 668, in format
    record.message = record.getMessage()
  File "/usr/lib/python3.8/logging/__init__.py", line 373, in getMessage
    msg = msg % self.args
TypeError: not all arguments converted during string formatting
Call stack:
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.8/dist-packages/mhubio/run.py", line 424, in <module>
    run(config_file)
  File "/usr/local/lib/python3.8/dist-packages/mhubio/run.py", line 365, in run
    module(
  File "/usr/local/lib/python3.8/dist-packages/mhubio/core/Module.py", line 77, in execute
    self.task()
  File "/usr/local/lib/python3.8/dist-packages/mhubio/core/IO.py", line 186, in wrapper
    func(self, instance, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/mhubio/core/IO.py", line 213, in wrapper
    func(self, instance, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/mhubio/core/IO.py", line 300, in wrapper
    func(self, instance, *args, **kwargs)
  File "/app/models/gc_lunglobes/utils/LobeSegmentationRunner.py", line 50, in task
    handle = segment_lobe_init()
  File "/app/src/test.py", line 1830, in segment_lobe_init
    lobe_seg_instance = LobeSegmentationTSTestCOVID(settings)
  File "/app/src/test.py", line 1524, in __init__
    self.init()
  File "/app/src/test.py", line 700, in init
    self.logger.info("Let's use", torch.cuda.device_count(), "GPUs!")
Message: "Let's use"
Arguments: (2, 'GPUs!')
```
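The logging error above is the classic print-style misuse of `logger.info`: extra positional arguments are treated as %-format arguments for the message string, which here contains no placeholders. A minimal reproduction and fix (using a plain stdlib logger in place of the model's `self.logger`, and a constant in place of `torch.cuda.device_count()`):

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("demo")

n_gpus = 2  # stands in for torch.cuda.device_count()

# Buggy, print-style call: logging treats the extra arguments as
# %-format args for the message, but "Let's use" has no placeholders,
# which triggers "TypeError: not all arguments converted ...":
# logger.info("Let's use", n_gpus, "GPUs!")

# Correct: use logging's lazy %-formatting ...
logger.info("Let's use %d GPUs!", n_gpus)
# ... or pre-format the message before handing it to the logger.
logger.info("Let's use %d GPUs!" % n_gpus)
```

Note that the lazy form (`"%d"` plus arguments) is generally preferred, since the string is only interpolated if the log level is enabled.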
The error can be reproduced using the stable MHub release v1:

```shell
docker run --rm -it --gpus all -v /absolute/path/to/dicom/data/:/app/data/input_data:ro mhubai/gc_lunglobes:v1 --workflow default --print
```

Note that, for demonstration purposes, we only need to map an input directory into the container. To review the generated output, an output directory can be specified by adding `-v /absolute/path/to/output/folder:/app/data/output_data` before the image specification (`mhubai/gc_lunglobes:v1`).
Subsequently, if you remove the logger line, you get the following error:
```
ERROR: LobeSegmentationRunner failed processing instance <I:/app/data/sorted_data/1.3.6.1.4.1.14519.5.2.1.6279.6001.179049373636438705059720603192>: 'DataParallel' object has no attribute 'scan_level_inference' in Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/mhubio/core/IO.py", line 186, in wrapper
    func(self, instance, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/mhubio/core/IO.py", line 213, in wrapper
    func(self, instance, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/mhubio/core/IO.py", line 300, in wrapper
    func(self, instance, *args, **kwargs)
  File "/app/models/gc_lunglobes/utils/LobeSegmentationRunner.py", line 51, in task
    seg_result_np = segment_lobe(handle, img_np, meta_dict)
  File "/app/src/test.py", line 1861, in segment_lobe
    pred = handle.run(transformed_data_dict)
  File "/app/src/test.py", line 1572, in run
    scan_level_inf = self.model.scan_level_inference(pad_scan).cpu().squeeze(0)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'DataParallel' object has no attribute 'scan_level_inference'
```
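The mechanism behind this `AttributeError` can be illustrated without torch. `torch.nn.DataParallel` wraps a model and replicates only the `forward()` pass; custom methods are not delegated to the wrapped model, which is reachable only via the wrapper's `.module` attribute. `DataParallelLike` below is a simplified stand-in for demonstration, not the real torch class:

```python
class Model:
    """Toy model with a custom inference method, like scan_level_inference."""
    def scan_level_inference(self, x):
        return [v * 2 for v in x]

class DataParallelLike:
    """Sketch of DataParallel's behaviour: only forward() is forwarded."""
    def __init__(self, module):
        self.module = module  # wrapped model, as in torch's DataParallel

    def __call__(self, *args):
        # Only the forward pass is replicated across GPUs and forwarded;
        # arbitrary attribute lookups are NOT delegated to self.module.
        return self.module.scan_level_inference(*args)

wrapped = DataParallelLike(Model())

# wrapped.scan_level_inference([1, 2, 3])  # AttributeError, as in the trace
# The common workaround is to reach through the wrapper explicitly:
result = wrapped.module.scan_level_inference([1, 2, 3])
```

With the real `DataParallel`, however, calling a custom method via `.module` bypasses the replication logic, so the call only runs on one device, which is why this is not a complete multi-GPU fix.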
@LennyN95 The bug has been resolved in the original repository under a new release, shall I add this update under #42 or shall I make a new PR?
Is it an entirely new model or an updated version of the original one? Do the ModelCard details still apply to the new release (training, testing, evaluation, ..) and does the model meet our general requirements (Licence, maintenance, ..)? If yes, updating here is fine.
It is just a bug fix for the multi-GPU support. The model hasn't changed. Everything should be the same. So I'll update it under #42.
`--gpus all` option.

Originally posted by @LennyN95 in https://github.com/MHubAI/models/issues/42#issuecomment-1824579306
## The issue

When running the MHub docker container with the lobe segmentation code from this repository with the `--gpus all` flag enabled on a machine with 2 or more GPUs, we run into the logging error shown above. This error appears to be caused by an incorrectly formatted `logger.info` call. Furthermore, if the logging line is removed, we get the `AttributeError` shown above.

Upon inspecting the latter issue, it appears that getting the multi-GPU feature to work properly isn't as simple as wrapping the model with `torch.nn.DataParallel`, because the wrapped model uses custom methods (i.e. `scan_level_inference`) for inference, which are not picked up by the DataParallel mechanism of PyTorch.

## Suggested fix
As fixing the multi-GPU feature properly would be quite some work, the broken multi-GPU feature could be disabled entirely by removing the following lines:
https://github.com/DIAGNijmegen/bodyct-pulmonary-lobe-segmentation/blob/5a64b70504d46c042c30851a69cec370f1202e67/test.py#L699-L702
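To make the shape of the suggested fix concrete, here is a hedged sketch; the exact content of `test.py` lines 699-702 in the upstream repository may differ, and `init_model`, `device_count`, and `logger` are illustrative names rather than the real API:

```python
def init_model(model, device_count, logger=None):
    """Prepare the model for inference with multi-GPU support disabled."""
    # Removed broken multi-GPU path (roughly what the linked lines did):
    #   if device_count > 1:
    #       logger.info("Let's use %d GPUs!", device_count)
    #       model = torch.nn.DataParallel(model)
    # Returning the bare model keeps custom methods such as
    # scan_level_inference() directly reachable, at the cost of
    # running inference on a single GPU.
    return model
```

This trades multi-GPU throughput for correctness: single-GPU inference works in all environments, while a proper multi-GPU path would require restructuring the custom inference methods around `forward()`.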