Project-MONAI / tutorials

MONAI Tutorials
https://monai.io/started.html
Apache License 2.0

Cannot run algo.train() #1224

Closed GowthamE7 closed 1 year ago

GowthamE7 commented 1 year ago

I tried the Auto3DSeg example, but I couldn't start the training in Colab with the GPU enabled.

image

KumoLiu commented 1 year ago

Hi @GowthamE7, could you please refer to this discussion and try to use the command shared there? Thanks!

GowthamE7 commented 1 year ago

I am getting an error related to folds: image

KumoLiu commented 1 year ago

Hi @GowthamE7, did you use the data json under this folder which contains "fold"?

GowthamE7 commented 1 year ago

Hi @KumoLiu, I tried using that file but ended up with this error: image So I tried the base JSON file that comes with the downloaded dataset: image

KumoLiu commented 1 year ago

Hi @GowthamE7, I can't see the cause of the problem from the screenshot you shared. You should at least use the JSON file from the tutorial repo. Please also share the whole error message that you get when you use that JSON file, so I can take a deeper look into it.

GowthamE7 commented 1 year ago

@KumoLiu, I tried the JSON file from the tutorials, but it throws an error. image JSON snapshot from tutorials: image But if I use the JSON file that is downloaded together with the dataset, I can run the code without an error. image JSON snapshot from dataset: image

KumoLiu commented 1 year ago

Hi @GowthamE7, I noticed that the file paths here use a different format; that may be the reason.

Screen Shot 2023-02-20 at 22 42 23
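A path-format mismatch like the one suspected above can be checked before training. The following is a minimal sketch, assuming the datalist JSON has a "training" list whose entries carry "image", "label", and "fold" keys, with paths relative to the data root. The helper name `check_datalist` is hypothetical and not part of MONAI.

```python
import json
import os


def check_datalist(datalist_file, dataroot, section="training"):
    """Return a list of human-readable problems found in one datalist section.

    Assumption: each entry is a dict with "image", "label", and "fold" keys,
    and the paths are relative to `dataroot`.
    """
    with open(datalist_file) as f:
        datalist = json.load(f)

    problems = []
    for i, entry in enumerate(datalist.get(section, [])):
        if not isinstance(entry, dict):
            problems.append(f"{section}[{i}]: expected a dict, got {type(entry).__name__}")
            continue
        if "fold" not in entry:
            problems.append(f"{section}[{i}]: no 'fold' key")
        for key in ("image", "label"):
            rel_path = entry.get(key)
            if rel_path is None:
                problems.append(f"{section}[{i}]: no '{key}' key")
                continue
            # Join with the data root the same way a loader would resolve it.
            full_path = os.path.normpath(os.path.join(dataroot, rel_path))
            if not os.path.isfile(full_path):
                problems.append(f"{section}[{i}]: {key} not found at {full_path}")
    return problems


# Example call (paths are placeholders):
# for problem in check_datalist("datalist.json", "/path/to/Task04_Hippocampus"):
#     print(problem)
```

An empty result means every entry has a fold assignment and both files exist under the data root; anything else points at the entries to fix.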
GowthamE7 commented 1 year ago

Hi @KumoLiu, thanks for the answer, I can now run the tutorial. I tried the same procedure with my custom dataset, but I cannot run the training. image DataLoader: image Error: image

KumoLiu commented 1 year ago

Hi @GowthamE7, could you please share the whole error message? Thanks!

GowthamE7 commented 1 year ago

@KumoLiu, this is the error message:

/usr/local/lib/python3.8/dist-packages/monai/utils/deprecate_utils.py:321: FutureWarning: monai.transforms.io.dictionary LoadImaged.__init__:image_only: Current default value of argument image_only=False has been deprecated since version 1.1. It will be changed to image_only=True in version 1.3.
  warn_deprecated(argname, msg, warning_category)
[info] number of GPUs: 1
[info] world_size: 1
train_files_w: 103
train_files_a: 104
val_files: 53
/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py:554: UserWarning: This DataLoader will create 6 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
2023-02-20 15:27:51.872644 - Length of input patch is recommended to be a multiple of 32.
num_epochs 1000
num_epochs_warmup 500
num_epochs_per_validation 20
[info] amp enabled
2023-02-20 15:27:55.253999: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-02-20 15:27:56.182970: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-20 15:27:56.183134: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/lib64-nvidia
2023-02-20 15:27:56.183157: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.

epoch 1/1000
learning rate is set to 0.025
[2023-02-20 15:28:02] 1/52, train_loss: 1.0072
Process Process-4:
Process Process-6:
Process Process-2:
Process Process-5:
Process Process-3:
Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7f2bb0a24d30>
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1466, in __del__
    self._shutdown_workers()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1430, in _shutdown_workers
    w.join(timeout=_utils.MP_STATUS_CHECK_INTERVAL)
  File "/usr/lib/python3.8/multiprocessing/process.py", line 149, in join
    res = self._popen.wait(timeout)
  File "/usr/lib/python3.8/multiprocessing/popen_fork.py", line 44, in wait
    if not wait([self.sentinel], timeout):
  File "/usr/lib/python3.8/multiprocessing/connection.py", line 931, in wait
    ready = selector.select(timeout)
  File "/usr/lib/python3.8/selectors.py", line 415, in select
    fd_event_list = self._selector.poll(timeout)
KeyboardInterrupt:
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/content/ref_api_work_dir/dints_0/scripts/search.py", line 653, in <module>
    fire.Fire()
  File "/usr/local/lib/python3.8/dist-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.8/dist-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/usr/local/lib/python3.8/dist-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/content/ref_api_work_dir/dints_0/scripts/search.py", line 331, in run
    outputs = model(inputs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/monai/networks/nets/dints.py", line 504, in forward
    outputs = self.dints_space(inputs)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/monai/networks/nets/dints.py", line 1046, in forward
    self.cell_tree[str((blk_idx, res_idx))](inputs[self.arch_code2in[res_idx]], weight=_w)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/monai/networks/nets/dints.py", line 315, in forward
    x = self.op(x, weight)
^C

KumoLiu commented 1 year ago

Hi @GowthamE7, I see a KeyboardInterrupt in your error message, so the process may have been interrupted unintentionally. Thanks!

GowthamE7 commented 1 year ago

@KumoLiu, I didn't press Ctrl+C. The code stops by itself. I checked this by running the code again without touching my laptop, and I get the same error.

/usr/local/lib/python3.8/dist-packages/monai/utils/deprecate_utils.py:321: FutureWarning: monai.transforms.io.dictionary LoadImaged.__init__:image_only: Current default value of argument image_only=False has been deprecated since version 1.1. It will be changed to image_only=True in version 1.3.
  warn_deprecated(argname, msg, warning_category)
[info] number of GPUs: 1
[info] world_size: 1
train_files_w: 3
train_files_a: 4
val_files: 7
^C

dongyang0122 commented 1 year ago

Hi @GowthamE7, in your latest run the log does not show the first iteration (even with the smaller amount of data), but your previous run did. Can you confirm that the error messages are the same?

KumoLiu commented 1 year ago

Hi @GowthamE7

/usr/local/lib/python3.8/dist-packages/monai/utils/deprecate_utils.py:321: FutureWarning: monai.transforms.io.dictionary LoadImaged.__init__:image_only: Current default value of argument image_only=False has been deprecated since version 1.1. It will be changed to image_only=True in version 1.3.
  warn_deprecated(argname, msg, warning_category)

This message is just a warning. You can silence it by passing image_only=True to LoadImage, but it is not the cause of the issue and won't stop your process.
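If the transform is created inside generated scripts that are awkward to edit, the warning can also be filtered with Python's standard `warnings` module instead of changing the argument. The sketch below uses only the standard library; `noisy_loader` is a hypothetical stand-in for the call that emits the FutureWarning, and the message pattern is an assumption based on the log text above.

```python
import warnings


def noisy_loader():
    """Hypothetical stand-in for a call that emits the deprecation warning."""
    warnings.warn(
        "Current default value of argument image_only=False has been deprecated",
        FutureWarning,
    )
    return "loaded"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # start from a known filter state
    # Ignore only FutureWarnings whose message mentions image_only.
    warnings.filterwarnings("ignore", message=".*image_only.*", category=FutureWarning)
    result = noisy_loader()

print(result, "- warnings recorded:", len(caught))
```

The filter only hides the message in the process where it is installed; it does not change how the image is loaded.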

[info] number of GPUs: 1
[info] world_size: 1
train_files_w: 3
train_files_a: 4
val_files: 7

This is just logging information.

^C

And this is the Ctrl+C, I think.

GowthamE7 commented 1 year ago

Hi @dongyang0122, I tried running it many times, but the code keeps stopping with "^C", even though I didn't press Ctrl+C. Thanks!

GowthamE7 commented 1 year ago

Hi @KumoLiu, thanks for the explanation.

GowthamE7 commented 1 year ago

Hi @KumoLiu, I cannot load the dataset, even though I am using the JSON file you mentioned.

RuntimeError                              Traceback (most recent call last)
Input In [109], in <cell line: 7>()
      5 datastats_file = os.path.join(work_dir, "data_stats.yaml")
      6 analyser = DataAnalyzer(datalist_file, dataroot, output_path=datastats_file)
----> 7 datastat = analyser.get_all_case_stats()
      9 print("datalist file: ", os.path.abspath(datalist_file))
     10 print("dataroot path: ", os.path.abspath(dataroot))

File ~/conda/lib/python3.8/site-packages/monai/apps/auto3dseg/data_analyzer.py:235, in DataAnalyzer.get_all_case_stats(self, key, transform_list)
    232 if not has_tqdm:
    233 warnings.warn("tqdm is not installed. not displaying the caching progress.")
--> 235 for batch_data in tqdm(dataloader) if has_tqdm else dataloader:
    237 batch_data = batch_data[0]
    238 batch_data[self.image_key] = batch_data[self.image_key].to(self.device)

File ~/conda/lib/python3.8/site-packages/tqdm/std.py:1195, in tqdm.__iter__(self)
   1192 time = self._time
   1194 try:
-> 1195 for obj in iterable:
   1196 yield obj
   1197 # Update and possibly print the progressbar.
   1198 # Note: does not call self.update(1) for speed optimisation.

File ~/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py:652, in _BaseDataLoaderIter.__next__(self)
    649 if self._sampler_iter is None:
    650 # TODO(https://github.com/pytorch/pytorch/issues/76750)
    651 self._reset() # type: ignore[call-arg]
--> 652 data = self._next_data()
    653 self._num_yielded += 1
    654 if self._dataset_kind == _DatasetKind.Iterable and \
    655 self._IterableDataset_len_called is not None and \
    656 self._num_yielded > self._IterableDataset_len_called:

File ~/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py:1347, in _MultiProcessingDataLoaderIter._next_data(self)
   1345 else:
   1346 del self._task_info[idx]
-> 1347 return self._process_data(data)

File ~/conda/lib/python3.8/site-packages/torch/utils/data/dataloader.py:1373, in _MultiProcessingDataLoaderIter._process_data(self, data)
   1371 self._try_put_index()
   1372 if isinstance(data, ExceptionWrapper):
-> 1373 data.reraise()
   1374 return data

File ~/conda/lib/python3.8/site-packages/torch/_utils.py:461, in ExceptionWrapper.reraise(self)
    457 except TypeError:
    458 # If the exception takes multiple arguments, don't try to
    459 # instantiate since we don't know how to
    460 raise RuntimeError(msg) from None
--> 461 raise exception

RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/jovyan/conda/lib/python3.8/site-packages/monai/transforms/transform.py", line 102, in apply_transform
  File "/home/jovyan/conda/lib/python3.8/site-packages/monai/transforms/transform.py", line 66, in _apply_transform
  File "/home/jovyan/conda/lib/python3.8/site-packages/monai/transforms/io/dictionary.py", line 154, in __call__
  File "/home/jovyan/conda/lib/python3.8/site-packages/monai/transforms/io/array.py", line 266, in __call__
RuntimeError: LoadImage cannot find a suitable reader for file: /home/jovyan/Task04_Hippocampus/imagesTr/hippocampus_367.nii.gz.
Please install the reader libraries, see also the installation instructions:
https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies.
The current registered: [<monai.data.image_reader.NumpyReader object at 0x7f3424368fa0>, <monai.data.image_reader.PILReader object at 0x7f34243689d0>].

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/jovyan/conda/lib/python3.8/site-packages/monai/transforms/transform.py", line 102, in apply_transform
  File "/home/jovyan/conda/lib/python3.8/site-packages/monai/transforms/transform.py", line 66, in _apply_transform
  File "/home/jovyan/conda/lib/python3.8/site-packages/monai/transforms/compose.py", line 174, in __call__
  File "/home/jovyan/conda/lib/python3.8/site-packages/monai/transforms/transform.py", line 129, in apply_transform
RuntimeError: applying transform <monai.transforms.io.dictionary.LoadImaged object at 0x7f34243686a0>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/jovyan/conda/lib/python3.8/site-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/jovyan/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/jovyan/conda/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/jovyan/conda/lib/python3.8/site-packages/monai/data/dataset.py", line 107, in __getitem__
  File "/home/jovyan/conda/lib/python3.8/site-packages/monai/data/dataset.py", line 93, in _transform
  File "/home/jovyan/conda/lib/python3.8/site-packages/monai/transforms/transform.py", line 129, in apply_transform
RuntimeError: applying transform <monai.transforms.compose.Compose object at 0x7f342435a400>

KumoLiu commented 1 year ago

Hi @GowthamE7, the error message says that LoadImage cannot find a suitable reader for the file, so this may be an environment issue.

RuntimeError: LoadImage cannot find a suitable reader for file: /home/jovyan/Task04_Hippocampus/imagesTr/hippocampus_367.nii.gz.
Please install the reader libraries, see also the installation instructions:
https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies.
The current registered: [<monai.data.image_reader.NumpyReader object at 0x7f3424368fa0>, <monai.data.image_reader.PILReader object at 0x7f34243689d0>].

Could you please check your environment and install all the requirements? You can find some help here. Hope it helps, thanks!
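The error above lists only NumpyReader and PILReader as registered, which suggests that the optional packages used for NIfTI files are not importable in that interpreter. A minimal standard-library sketch for checking this follows; the helper name `reader_backends` is hypothetical, and the choice of `nibabel` and `itk` as the packages to probe is an assumption about which readers handle `.nii.gz` files.

```python
import importlib.util
import sys


def reader_backends(packages=("nibabel", "itk")):
    """Map each package name to True if this interpreter can import it."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


# Show which interpreter is being checked, then the availability of each package.
print("interpreter:", sys.executable)
for name, available in reader_backends().items():
    print(f"{name}: {'found' if available else 'NOT found'}")
```

Running this in the same notebook or script that raises the error matters, because a package installed for one interpreter is invisible to another.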

GowthamE7 commented 1 year ago

Hi @KumoLiu, I have installed all the dependencies, but I am still getting the error. Thanks!

KumoLiu commented 1 year ago

Hi @GowthamE7, could you please try pip show nibabel in the terminal and show me the output?

GowthamE7 commented 1 year ago

Hi @KumoLiu, image

KumoLiu commented 1 year ago

Oh, I forgot that you are training in Jupyter. Could you please try the same command in a Jupyter cell? If it gives the same result, then it's weird. If not, the problem may be due to a different environment.
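One way to make sure the check runs against the notebook's own environment is to invoke pip through the interpreter that the kernel is using. This is a sketch using only the standard library; the helper name `pip_show` is hypothetical.

```python
import subprocess
import sys


def pip_show(package):
    """Run `pip show` with the interpreter that executes this code.

    Returns pip's output, or None if pip exits with an error
    (for example because the package is not installed).
    """
    proc = subprocess.run(
        [sys.executable, "-m", "pip", "show", package],
        capture_output=True,
        text=True,
    )
    return proc.stdout if proc.returncode == 0 else None


print("kernel interpreter:", sys.executable)
print(pip_show("nibabel") or "nibabel is not installed for this interpreter")
```

If the interpreter path printed here differs from the one the terminal uses, the two `pip show` results describe different environments.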

GowthamE7 commented 1 year ago

@KumoLiu, image

KumoLiu commented 1 year ago

Hi @GowthamE7, then it may be due to a wrong path. Could you please try this code in a cell?

from monai.transforms import LoadImage

data_path = "./imagesTs/hippocampus_267.nii.gz"  # any data path in the json file you used
test = LoadImage(image_only=True)(data_path)
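Because the path in the snippet above is relative, it is resolved against the notebook's current working directory, which may not be the dataset folder. A small standard-library sketch for seeing where the path actually points follows; the helper name `describe_path` is hypothetical.

```python
import os


def describe_path(path):
    """Resolve a possibly relative path against the current working directory."""
    full_path = os.path.abspath(path)
    return {"resolved": full_path, "exists": os.path.isfile(full_path)}


# Path copied from the snippet above; replace it with one from your own JSON file.
info = describe_path("./imagesTs/hippocampus_267.nii.gz")
print("working directory:", os.getcwd())
print(info)
```

If "exists" is False, the file is not where the relative path resolves to, and the data root or the working directory needs to change before LoadImage can read it.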