Project-MONAI / MONAI

AI Toolkit for Healthcare Imaging
https://monai.io/
Apache License 2.0

Auto3DSeg: How to Set up Multiple Phase Data in Datalist JSON File? #5592

Closed moonforsun closed 1 year ago

moonforsun commented 1 year ago

Describe the bug

I was able to run Auto3DSeg based on the Task04_Hippocampus example. However, I encountered the following bug on a customized dataset for which I want to use multiple phase images as the network input. Any suggestions on how to configure multiple phase data in the datalist JSON file?

Environment

================================ Printing MONAI config...

MONAI version: 1.0.1
Numpy version: 1.23.5
Pytorch version: 1.13.0+cu117
MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
MONAI rev id: 8271a193229fe4437026185e218d5b06f7c8ce69
MONAI __file__: /home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/monai/__init__.py

Optional dependencies:
Pytorch Ignite version: 0.4.10
Nibabel version: 3.2.0
scikit-image version: 0.19.3
Pillow version: 9.3.0
Tensorboard version: 2.11.0
gdown version: 4.5.3
TorchVision version: 0.14.0+cu117
tqdm version: 4.64.1
lmdb version: 1.3.0
psutil version: 5.9.4
pandas version: 1.5.1
einops version: 0.6.0
transformers version: 4.21.3
mlflow version: 2.0.1
pynrrd version: 1.0.0

For details about installing the optional dependencies, please visit: https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies

================================ Printing system config...

System: Linux
Linux version: Ubuntu 18.04.6 LTS
Platform: Linux-5.4.0-1069-aws-x86_64-with-glibc2.27
Processor: x86_64
Machine: x86_64
Python version: 3.8.2
Process name: python
Command: ['/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/bin/python', '-c', 'import monai; monai.config.print_debug_info()']
Open files: []
Num physical CPUs: 24
Num logical CPUs: 48
Num usable CPUs: 48
CPU usage (%): [3.3, 3.0, 3.0, 2.6, 3.3, 3.0, 3.3, 3.0, 3.3, 2.6, 3.3, 3.0, 3.3, 3.0, 3.3, 3.3, 3.0, 3.0, 3.0, 3.0, 3.3, 3.0, 3.0, 3.0, 3.0, 3.3, 3.3, 3.0, 3.3, 3.0, 3.0, 3.0, 3.6, 3.0, 3.0, 3.3, 3.0, 3.0, 3.3, 3.0, 3.3, 3.0, 3.0, 3.3, 3.3, 3.3, 3.3, 97.7]
CPU freq. (MHz): 1959
Load avg. in last 1, 5, 15 mins (%): [0.4, 4.9, 8.0]
Disk usage (%): 38.7
Avg. sensor temp. (Celsius): UNKNOWN for given OS
Total physical memory (GB): 186.7
Available memory (GB): 184.2
Used memory (GB): 1.0

================================ Printing GPU config...

Num GPUs: 4
Has CUDA: True
CUDA version: 11.7
cuDNN enabled: True
cuDNN version: 8500
Current device: 0
Library compiled for CUDA architectures: ['sm_37', 'sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80', 'sm_86']
GPU 0 Name: NVIDIA A10G
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 80
GPU 0 Total memory (GB): 22.2
GPU 0 CUDA capability (maj.min): 8.6
GPU 1 Name: NVIDIA A10G
GPU 1 Is integrated: False
GPU 1 Is multi GPU board: False
GPU 1 Multi processor count: 80
GPU 1 Total memory (GB): 22.2
GPU 1 CUDA capability (maj.min): 8.6
GPU 2 Name: NVIDIA A10G
GPU 2 Is integrated: False
GPU 2 Is multi GPU board: False
GPU 2 Multi processor count: 80
GPU 2 Total memory (GB): 22.2
GPU 2 CUDA capability (maj.min): 8.6
GPU 3 Name: NVIDIA A10G
GPU 3 Is integrated: False
GPU 3 Is multi GPU board: False
GPU 3 Multi processor count: 80
GPU 3 Total memory (GB): 22.2
GPU 3 CUDA capability (maj.min): 8.6

Additional context

Full trace

poetry run python -m monai.apps.auto3dseg AutoRunner run --input='./input.yaml' --work_dir='./LiverCrop' 2022-11-28 02:19:56,661 - INFO - ./LiverCrop does not exists. Creating... 2022-11-28 02:19:56,661 - INFO - ./LiverCrop created to save all results 2022-11-28 02:19:56,661 - INFO - Loading ./input.yaml for AutoRunner and making a copy in /home/ubuntu/Code/autoseg3d/LiverCrop/input.yaml 2022-11-28 02:19:56,663 - INFO - The output_dir is not specified. /home/ubuntu/Code/autoseg3d/LiverCrop/ensemble_output will be used to save ensemble predictions 2022-11-28 02:19:56,663 - INFO - Directory /home/ubuntu/Code/autoseg3d/LiverCrop/ensemble_output is created to save ensemble predictions 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 320/320 [06:45<00:00, 1.27s/it] algo_templates.tar.gz: 296kB [00:00, 712kB/s]
2022-11-28 02:26:47,886 - INFO - Downloaded: /tmp/tmpndcigyw9/algo_templates.tar.gz 2022-11-28 02:26:47,886 - INFO - Expected md5 is None, skip md5 check for file /tmp/tmpndcigyw9/algo_templates.tar.gz. 2022-11-28 02:26:47,886 - INFO - Writing into directory: /home/ubuntu/Code/autoseg3d/LiverCrop. 2022-11-28 02:26:55,159 - INFO - /home/ubuntu/Code/autoseg3d/LiverCrop/segresnet2d_0 2022-11-28 02:27:02,219 - INFO - /home/ubuntu/Code/autoseg3d/LiverCrop/segresnet2d_1 2022-11-28 02:27:09,540 - INFO - /home/ubuntu/Code/autoseg3d/LiverCrop/dints_0 2022-11-28 02:27:16,466 - INFO - /home/ubuntu/Code/autoseg3d/LiverCrop/dints_1 2022-11-28 02:27:23,697 - INFO - /home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0 2022-11-28 02:27:30,749 - INFO - /home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_1 2022-11-28 02:27:38,115 - INFO - /home/ubuntu/Code/autoseg3d/LiverCrop/segresnet_0 2022-11-28 02:27:45,072 - INFO - /home/ubuntu/Code/autoseg3d/LiverCrop/segresnet_1 2022-11-28 02:27:45,075 - INFO - Launching: torchrun --nnodes=1 --nproc_per_node=4 /home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py run --config_file='/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/network.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/transforms_infer.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/transforms_train.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/transforms_validate.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/hyper_parameters.yaml' ['torchrun', '--nnodes=1', '--nproc_per_node=4', '/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py', 'run', 
"--config_file='/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/network.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/transforms_infer.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/transforms_train.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/transforms_validate.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/hyper_parameters.yaml'"] Traceback (most recent call last): File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/monai/apps/auto3dseg/bundle_gen.py", line 186, in _run_cmd normal_out = subprocess.run(cmd.split(), env=ps_environ, check=True, capture_output=True) File "/usr/local/lib/python3.8/subprocess.py", line 512, in run raise CalledProcessError(retcode, process.args, subprocess.CalledProcessError: Command '['torchrun', '--nnodes=1', '--nproc_per_node=4', '/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py', 'run', "--config_file='/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/network.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/transforms_infer.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/transforms_train.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/transforms_validate.yaml','/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/configs/hyper_parameters.yaml'"]' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last): File "/usr/local/lib/python3.8/runpy.py", line 193, in _run_module_as_main return _run_code(code, main_globals, None, File "/usr/local/lib/python3.8/runpy.py", line 86, in _run_code exec(code, run_globals) File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/monai/apps/auto3dseg/main.py", line 22, in fire.Fire( File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire component, remaining_args = _CallAndUpdateTrace( File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/monai/apps/auto3dseg/auto_runner.py", line 586, in run self._train_algo_in_sequence(history) File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/monai/apps/auto3dseg/auto_runner.py", line 488, in _train_algo_in_sequence algo.train(self.train_params) File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/monai/apps/auto3dseg/bundle_gen.py", line 203, in train return self._run_cmd(cmd, devices_info) File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/monai/apps/auto3dseg/bundle_gen.py", line 191, in _run_cmd raise RuntimeError(f"subprocess call error {e.returncode}: {errors}, {output}") from e RuntimeError: subprocess call error 1: b'WARNING:torch.distributed.run:


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


Traceback (most recent call last): File "/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py", line 409, in fire.Fire() File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire component, remaining_args = _CallAndUpdateTrace( File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace component = fn(*varargs, kwargs) File "/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py", line 95, in run str_img = os.path.join(data_file_base_dir, list_train[_i]["image"]) File "/usr/local/lib/python3.8/posixpath.py", line 90, in join genericpath._check_arg_types(\'join\', a, p) File "/usr/local/lib/python3.8/genericpath.py", line 152, in _check_arg_types raise TypeError(f\'{funcname}() argument must be str, bytes, or \' TypeError: join() argument must be str, bytes, or os.PathLike object, not \'list\' Traceback (most recent call last): File "/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py", line 409, in fire.Fire() File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire Traceback (most recent call last): File "/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py", line 409, in component, remaining_args = _CallAndUpdateTrace( File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 
681, in _CallAndUpdateTrace fire.Fire() File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 141, in Fire component = fn(varargs, kwargs) File "/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py", line 95, in run component_trace = _Fire(component, args, parsed_flag_args, context, name) str_img = os.path.join(data_file_base_dir, list_train[_i]["image"]) File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire

File "/usr/local/lib/python3.8/posixpath.py", line 90, in join genericpath._check_arg_types(\'join\', a, p) File "/usr/local/lib/python3.8/genericpath.py", line 152, in _check_arg_types raise TypeError(f\'{funcname}() argument must be str, bytes, or \' TypeError: join() argument must be str, bytes, or os.PathLike object, not \'list\' component, remaining_args = _CallAndUpdateTrace( File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace component = fn(varargs, kwargs) File "/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py", line 95, in run str_img = os.path.join(data_file_base_dir, list_train[_i]["image"]) File "/usr/local/lib/python3.8/posixpath.py", line 90, in join genericpath._check_arg_types(\'join\', a, p) File "/usr/local/lib/python3.8/genericpath.py", line 152, in _check_arg_types raise TypeError(f\'{funcname}() argument must be str, bytes, or \' TypeError: join() argument must be str, bytes, or os.PathLike object, not \'list\' Traceback (most recent call last): File "/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py", line 409, in fire.Fire() File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 466, in _Fire component, remaining_args = _CallAndUpdateTrace( File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace component = fn(varargs, kwargs) File "/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py", line 95, in run str_img = os.path.join(data_file_base_dir, list_train[_i]["image"]) File "/usr/local/lib/python3.8/posixpath.py", line 90, 
in join genericpath._check_arg_types(\'join\', a, p) File "/usr/local/lib/python3.8/genericpath.py", line 152, in _check_arg_types raise TypeError(f\'{funcname}() argument must be str, bytes, or \' TypeError: join() argument must be str, bytes, or os.PathLike object, not \'list\' ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 34440) of binary: /home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/bin/python Traceback (most recent call last): File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/bin/torchrun", line 8, in sys.exit(main()) File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/init.py", line 346, in wrapper return f(args, **kwargs) File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/torch/distributed/run.py", line 762, in main run(args) File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run elastic_launch( File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in call return launch_agent(self._config, self._entrypoint, list(args)) File "/home/ubuntu/.cache/pypoetry/virtualenvs/qbdl-autoseg3d-a3-JfBFH-py3.8/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent raise ChildFailedError( torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

/home/ubuntu/Code/autoseg3d/LiverCrop/swinunetr_0/scripts/train.py FAILED

Failures: [1]: time : 2022-11-28_02:27:51 host : ip-172-20-253-208.us-west-2.compute.internal rank : 1 (local_rank: 1) exitcode : 1 (pid: 34441) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html [2]: time : 2022-11-28_02:27:51 host : ip-172-20-253-208.us-west-2.compute.internal rank : 2 (local_rank: 2) exitcode : 1 (pid: 34442) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html [3]: time : 2022-11-28_02:27:51 host : ip-172-20-253-208.us-west-2.compute.internal rank : 3 (local_rank: 3) exitcode : 1 (pid: 34443) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

Root Cause (first observed failure): [0]: time : 2022-11-28_02:27:51 host : ip-172-20-253-208.us-west-2.compute.internal rank : 0 (local_rank: 0) exitcode : 1 (pid: 34440) error_file: <N/A> traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html

', b'[info] number of GPUs: 4 [info] number of GPUs: 4 [info] number of GPUs: 4 2022-11-28 02:27:50,522 - Added key: store_based_barrier_key:1 to store for rank: 2 [info] number of GPUs: 4 2022-11-28 02:27:50,564 - Added key: store_based_barrier_key:1 to store for rank: 1 2022-11-28 02:27:50,570 - Added key: store_based_barrier_key:1 to store for rank: 0 2022-11-28 02:27:50,572 - Added key: store_based_barrier_key:1 to store for rank: 3 2022-11-28 02:27:50,572 - Rank 3: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. [info] world_size: 4 2022-11-28 02:27:50,574 - Rank 2: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. [info] world_size: 4 2022-11-28 02:27:50,574 - Rank 1: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. [info] world_size: 4 2022-11-28 02:27:50,581 - Rank 0: Completed store-based barrier for key:store_based_barrier_key:1 with 4 nodes. [info] world_size: 4 '

myron commented 1 year ago

Unfortunately, this is not supported at the moment: the "image" key should be a string (a single image file). But we should be improving it soon.
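A minimal sketch of why the trace above ends in a TypeError, and how a datalist could be checked before launching a run. The helper name `find_non_string_images`, the file names, and the "training" section key are assumptions made for illustration; only the "image" key and the `os.path.join` call come from the trace.

```python
import os


def find_non_string_images(datalist, section="training"):
    """Return indices of entries whose "image" value is not a single path string."""
    return [
        i
        for i, entry in enumerate(datalist.get(section, []))
        if not isinstance(entry.get("image"), str)
    ]


# Hypothetical datalist: the second entry lists one file per phase.
datalist = {
    "training": [
        {"image": "imagesTr/case_001.nii.gz", "label": "labelsTr/case_001.nii.gz"},
        {
            "image": ["imagesTr/case_002_art.nii.gz", "imagesTr/case_002_pv.nii.gz"],
            "label": "labelsTr/case_002.nii.gz",
        },
    ]
}

bad = find_non_string_images(datalist)
print("entries with a non-string image value:", bad)

# The generated train.py joins the base directory with the "image" value,
# which fails as soon as that value is a list rather than a string.
try:
    os.path.join("/data", datalist["training"][1]["image"])
except TypeError as err:
    print("os.path.join failed:", err)
```

Running a check like this before AutoRunner surfaces the problem in seconds instead of after data analysis and template download.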

dongyang0122 commented 1 year ago

Hi @moonforsun, thank you for raising the issue! We do not currently support a list of image filenames under the "image" key, but it is in our development plan. You can still try Auto3DSeg with multi-phase or multi-modality images, although the data format needs some modification. Auto3DSeg supports datasets in the MSD (Medical Segmentation Decathlon) format, so you can check how the multi-phase data is stored in Task01 and Task05. Normally the multi-phase images are saved as a single 4D NIfTI file whose array has dimensions (dim_x, dim_y, dim_z, c), where c is the number of phases. Once the data is converted, you can run Auto3DSeg without data-loading issues.
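The conversion described above could look roughly like the following sketch, which uses nibabel and NumPy. It assumes the phase volumes are already co-registered and resampled onto the same grid, so that they share one shape and one affine; the function names and file names are hypothetical and are not part of Auto3DSeg.

```python
import numpy as np


def stack_phases(volumes):
    """Stack 3D arrays of identical shape into one (x, y, z, c) array."""
    shapes = {v.shape for v in volumes}
    if len(shapes) != 1:
        raise ValueError(f"phase volumes must share one shape, got {sorted(shapes)}")
    return np.stack(volumes, axis=-1)


def merge_phase_files(phase_paths, out_path):
    """Write several co-registered 3D NIfTI files as one 4D NIfTI file."""
    import nibabel as nib  # imported lazily; only needed for file I/O

    images = [nib.load(p) for p in phase_paths]
    data = stack_phases([img.get_fdata(dtype=np.float32) for img in images])
    # Assumption: every phase lies on the first file's grid, so its affine is reused.
    nib.save(nib.Nifti1Image(data, images[0].affine), out_path)


# Hypothetical usage, one call per case:
# merge_phase_files(
#     ["case_001_arterial.nii.gz", "case_001_portal.nii.gz"],
#     "imagesTr/case_001.nii.gz",
# )
```

The datalist then points "image" at the single merged file, which keeps the value a plain string.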

moonforsun commented 1 year ago

@dongyang0122 Hi Dong, the 4D NIfTI file approach was helpful, and the run completed successfully. Thank you very much!