Project-MONAI / MONAILabel

MONAI Label is an intelligent open source image labeling and learning tool.
https://docs.monai.io/projects/label
Apache License 2.0

torch.cuda.OutOfMemoryError & RuntimeError: applying transform #1351

Open keyurradia opened 1 year ago

keyurradia commented 1 year ago

Describe the bug
Before I updated my cudatoolkit to 11.8, MONAI Label did not recognize cudatoolkit 11.3 and CUDA was disabled, so training ran in CPU mode and worked fine. After updating to cudatoolkit 11.8, CUDA is no longer disabled, but training now fails with `RuntimeError: applying transform <monai.transforms.compose.Compose object at 0x0000014188C0B160>`.
Looking further into the logs, I found the underlying error:

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.69 GiB (GPU 0; 8.00 GiB total capacity; 6.14 GiB already allocated; 0 bytes free; 6.22 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
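The error message itself points at `PYTORCH_CUDA_ALLOC_CONF`. A first mitigation worth trying is configuring the allocator before PyTorch initializes CUDA; a minimal sketch (the `128` value is illustrative, not a recommendation from this issue):

```python
import os

# Must be set before the first CUDA allocation, so the safest place is
# before importing torch (e.g. at the top of the launching script).
# The split size below is an illustrative starting point, not a tuned value.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

This only helps when the failure is due to fragmentation (reserved memory far exceeding allocated memory); it cannot create capacity the 8 GiB card does not have.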

Server logs

[2023-03-22 11:57:12,660] [7720] [MainThread] [INFO] (monailabel.endpoints.datastore:68) - Image: 23.01.03.18; File: <starlette.datastructures.UploadFile object at 0x000001C073531D90>; params: {"client_id": "user-xyz"}
[2023-03-22 11:57:12,746] [7720] [MainThread] [INFO] (monailabel.datastore.local:439) - Adding Image: 23.01.03.18 => C:\Users\keyur\AppData\Local\Temp\tmp1n82rfbc.nii.gz
[2023-03-22 11:57:13,325] [7720] [MainThread] [INFO] (monailabel.endpoints.datastore:101) - Saving Label for 23.01.03.18 for tag: final by admin
[2023-03-22 11:57:13,331] [7720] [MainThread] [INFO] (monailabel.endpoints.datastore:112) - Save Label params: {"label_info": [{"name": "liver", "idx": 1}, {"name": "venaporta", "idx": 2}, {"name": "livervein", "idx": 3}, {"name": "venacava", "idx": 4}, {"name": "lesions", "idx": 5}], "client_id": "user-xyz"}
[2023-03-22 11:57:13,332] [7720] [MainThread] [INFO] (monailabel.datastore.local:486) - Saving Label for Image: 23.01.03.18; Tag: final; Info: {'label_info': [{'name': 'liver', 'idx': 1}, {'name': 'venaporta', 'idx': 2}, {'name': 'livervein', 'idx': 3}, {'name': 'venacava', 'idx': 4}, {'name': 'lesions', 'idx': 5}], 'client_id': 'user-xyz'}
[2023-03-22 11:57:13,333] [7720] [MainThread] [INFO] (monailabel.datastore.local:494) - Adding Label: 23.01.03.18 => final => C:\Users\keyur\AppData\Local\Temp\tmpm17x2cn6.nii.gz
[2023-03-22 11:57:13,338] [7720] [MainThread] [INFO] (monailabel.datastore.local:510) - Label Info: {'label_info': [{'name': 'liver', 'idx': 1}, {'name': 'venaporta', 'idx': 2}, {'name': 'livervein', 'idx': 3}, {'name': 'venacava', 'idx': 4}, {'name': 'lesions', 'idx': 5}], 'client_id': 'user-xyz', 'ts': 1679482633, 'name': '23.01.03.18.nii.gz'}
[2023-03-22 11:57:13,344] [7720] [MainThread] [INFO] (monailabel.interfaces.app:492) - New label saved for: 23.01.03.18 => 23.01.03.18
[2023-03-22 11:57:16,062] [7720] [MainThread] [INFO] (monailabel.utils.async_tasks.task:41) - Train request: {'model': 'segmentation', 'name': 'train_01', 'pretrained': True, 'device': 'cuda', 'max_epochs': 50, 'early_stop_patience': -1, 'val_split': 0.2, 'train_batch_size': 1, 'val_batch_size': 1, 'multi_gpu': True, 'gpus': 'all', 'dataset': 'SmartCacheDataset', 'dataloader': 'ThreadDataLoader', 'tracking': 'mlflow', 'tracking_uri': '', 'tracking_experiment_name': '', 'client_id': 'user-xyz'}
[2023-03-22 11:57:16,063] [7720] [ThreadPoolExecutor-2_0] [INFO] (monailabel.utils.async_tasks.utils:49) - Before:: C:\Users\keyur\MONAILabel;
[2023-03-22 11:57:16,064] [7720] [ThreadPoolExecutor-2_0] [INFO] (monailabel.utils.async_tasks.utils:53) - After:: C:\Users\keyur\MONAILabel;
[2023-03-22 11:57:16,065] [7720] [ThreadPoolExecutor-2_0] [INFO] (monailabel.utils.async_tasks.utils:65) - COMMAND:: C:\Users\keyur.conda\envs\monai\python.exe -m monailabel.interfaces.utils.app -m train -r {"model":"segmentation","name":"train_01","pretrained":true,"device":"cuda","max_epochs":50,"early_stop_patience":-1,"val_split":0.2,"train_batch_size":1,"val_batch_size":1,"multi_gpu":true,"gpus":"all","dataset":"SmartCacheDataset","dataloader":"ThreadDataLoader","tracking":"mlflow","tracking_uri":"","tracking_experiment_name":"","client_id":"user-xyz"}
[2023-03-22 11:57:17,250] [32928] [MainThread] [INFO] (main:37) - Initializing App from: C:\Users\keyur\MONAILabel\monailabel\scripts\apps\radiology; studies: C:\Users\keyur\MONAILabel\monailabel\scripts\datasets\training; conf: {'models': 'segmentation'}
[2023-03-22 11:57:22,938] [32928] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for MONAILabelApp Found: <class 'main.MyApp'>
[2023-03-22 11:57:22,947] [32928] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.deepedit.DeepEdit'>
[2023-03-22 11:57:22,948] [32928] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.deepgrow_2d.Deepgrow2D'>
[2023-03-22 11:57:22,948] [32928] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.deepgrow_3d.Deepgrow3D'>
[2023-03-22 11:57:22,949] [32928] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.localization_spine.LocalizationSpine'>
[2023-03-22 11:57:22,949] [32928] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.localization_vertebra.LocalizationVertebra'>
[2023-03-22 11:57:22,950] [32928] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.segmentation.Segmentation'>
[2023-03-22 11:57:22,950] [32928] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.segmentation_spleen.SegmentationSpleen'>
[2023-03-22 11:57:22,951] [32928] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.segmentation_vertebra.SegmentationVertebra'>
[2023-03-22 11:57:22,951] [32928] [MainThread] [INFO] (main:93) - +++ Adding Model: segmentation => lib.configs.segmentation.Segmentation
[2023-03-22 11:57:22,974] [32928] [MainThread] [INFO] (main:96) - +++ Using Models: ['segmentation']
[2023-03-22 11:57:22,974] [32928] [MainThread] [INFO] (monailabel.interfaces.app:134) - Init Datastore for: C:\Users\keyur\MONAILabel\monailabel\scripts\datasets\training
[2023-03-22 11:57:22,975] [32928] [MainThread] [INFO] (monailabel.datastore.local:130) - Auto Reload: False; Extensions: ['.nii.gz', '.nii', '.nrrd', '.jpg', '.png', '.tif', '.svs', '.xml']
[2023-03-22 11:57:22,986] [32928] [MainThread] [INFO] (monailabel.datastore.local:577) - Invalidate count: 0
[2023-03-22 11:57:22,987] [32928] [MainThread] [INFO] (main:126) - +++ Adding Inferer:: segmentation => <lib.infers.segmentation.Segmentation object at 0x00000141821A56D0>
[2023-03-22 11:57:22,987] [32928] [MainThread] [INFO] (main:191) - {'segmentation': <lib.infers.segmentation.Segmentation object at 0x00000141821A56D0>, 'Histogram+GraphCut': <monailabel.scribbles.infer.HistogramBasedGraphCut object at 0x000001418AA7F370>, 'GMM+GraphCut': <monailabel.scribbles.infer.GMMBasedGraphCut object at 0x000001418AA7F340>}
[2023-03-22 11:57:22,987] [32928] [MainThread] [INFO] (main:206) - +++ Adding Trainer:: segmentation => <lib.trainers.segmentation.Segmentation object at 0x000001418AA7F3A0>
[2023-03-22 11:57:22,987] [32928] [MainThread] [INFO] (monailabel.utils.sessions:51) - Session Path: C:\Users\keyur.cache\monailabel\sessions
[2023-03-22 11:57:22,987] [32928] [MainThread] [INFO] (monailabel.utils.sessions:52) - Session Expiry (max): 3600
[2023-03-22 11:57:22,987] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:432) - Train Request (input): {'model': 'segmentation', 'name': 'train_01', 'pretrained': True, 'device': 'cuda', 'max_epochs': 50, 'early_stop_patience': -1, 'val_split': 0.2, 'train_batch_size': 1, 'val_batch_size': 1, 'multi_gpu': True, 'gpus': 'all', 'dataset': 'SmartCacheDataset', 'dataloader': 'ThreadDataLoader', 'tracking': 'mlflow', 'tracking_uri': '', 'tracking_experiment_name': '', 'client_id': 'user-xyz', 'local_rank': 0}
[2023-03-22 11:57:22,987] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:442) - CUDA_VISIBLE_DEVICES: None
[2023-03-22 11:57:22,989] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:447) - Distributed/Multi GPU is limited
[2023-03-22 11:57:22,989] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:462) - Distributed Training = FALSE
[2023-03-22 11:57:22,989] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:489) - 0 - Train Request (final): {'name': 'train_01', 'pretrained': True, 'device': 'cuda', 'max_epochs': 50, 'early_stop_patience': -1, 'val_split': 0.2, 'train_batch_size': 1, 'val_batch_size': 1, 'multi_gpu': False, 'gpus': 'all', 'dataset': 'SmartCacheDataset', 'dataloader': 'ThreadDataLoader', 'tracking': 'mlflow', 'tracking_uri': '', 'tracking_experiment_name': '', 'model': 'segmentation', 'client_id': 'user-xyz', 'local_rank': 0, 'run_id': '20230322_115722'}
[2023-03-22 11:57:22,989] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:622) - 0 - Using Device: cuda; IDX: None
[2023-03-22 11:57:22,989] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:515) - Run/Output Path: C:\Users\keyur\MONAILabel\monailabel\scripts\apps\radiology\model\segmentation\train_01
[2023-03-22 11:57:22,989] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:531) - Tracking: mlflow
[2023-03-22 11:57:22,989] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:532) - Tracking URI: file:///C:/Users/keyur/MONAILabel/monailabel/scripts/apps/radiology/model/segmentation/train_01/mlruns;
[2023-03-22 11:57:22,989] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:533) - Tracking Experiment Name: segmentation; Run Name: run_20230322_115722
[2023-03-22 11:57:22,989] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:410) - Total Records for Training: 6
[2023-03-22 11:57:22,989] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:411) - Total Records for Validation: 2
Loading dataset: 0%| | 0/2 [00:00<?, ?it/s]
Loading dataset: 50%|##### | 1/2 [00:11<00:11, 11.37s/it]
Loading dataset: 100%|##########| 2/2 [00:21<00:00, 10.42s/it]
Loading dataset: 100%|##########| 2/2 [00:21<00:00, 10.57s/it]
cache_num is greater or equal than dataset length, fall back to regular monai.data.CacheDataset.
[2023-03-22 11:57:44,226] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:328) - 0 - Records for Validation: 2
[2023-03-22 11:57:44,237] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:318) - 0 - Adding Validation to run every '1' interval
[2023-03-22 11:57:44,240] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:710) - 0 - Load Path C:\Users\keyur\MONAILabel\monailabel\scripts\apps\radiology\model\segmentation\train_01\model.pt
Loading dataset: 0%| | 0/6 [00:00<?, ?it/s]
Loading dataset: 17%|#6 | 1/6 [00:10<00:52, 10.54s/it]
Loading dataset: 33%|###3 | 2/6 [00:15<00:28, 7.00s/it]
Loading dataset: 50%|##### | 3/6 [00:26<00:27, 9.16s/it]
Loading dataset: 67%|######6 | 4/6 [00:35<00:17, 8.80s/it]
Loading dataset: 83%|########3 | 5/6 [00:47<00:10, 10.14s/it]
Loading dataset: 100%|##########| 6/6 [01:02<00:00, 11.67s/it]
Loading dataset: 100%|##########| 6/6 [01:02<00:00, 10.37s/it]
[2023-03-22 11:58:46,454] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:264) - 0 - Records for Training: 6
[2023-03-22 11:58:46,458] [32928] [MainThread] [INFO] (ignite.engine.engine.SupervisedTrainer:876) - Engine run resuming from iteration 0, epoch 0 until 50 epochs
[2023-03-22 11:58:46,617] [32928] [MainThread] [INFO] (ignite.engine.engine.SupervisedTrainer:138) - Restored all variables from C:\Users\keyur\MONAILabel\monailabel\scripts\apps\radiology\model\segmentation\train_01\model.pt
[2023-03-22 11:58:51,634] [32928] [MainThread] [INFO] (ignite.engine.engine.SupervisedTrainer:272) - Epoch: 1/50, Iter: 1/6 -- train_loss: 0.9931
[2023-03-22 11:58:52,005] [32928] [MainThread] [INFO] (ignite.engine.engine.SupervisedTrainer:272) - Epoch: 1/50, Iter: 2/6 -- train_loss: 0.9202
[2023-03-22 11:58:52,382] [32928] [MainThread] [INFO] (ignite.engine.engine.SupervisedTrainer:272) - Epoch: 1/50, Iter: 3/6 -- train_loss: 0.8346
[2023-03-22 11:58:52,736] [32928] [MainThread] [INFO] (ignite.engine.engine.SupervisedTrainer:272) - Epoch: 1/50, Iter: 4/6 -- train_loss: 0.8939
[2023-03-22 11:58:53,133] [32928] [MainThread] [INFO] (ignite.engine.engine.SupervisedTrainer:272) - Epoch: 1/50, Iter: 5/6 -- train_loss: 0.9609
[2023-03-22 11:58:53,435] [32928] [MainThread] [INFO] (ignite.engine.engine.SupervisedTrainer:272) - Epoch: 1/50, Iter: 6/6 -- train_loss: 0.8174
[2023-03-22 11:58:53,442] [32928] [MainThread] [INFO] (ignite.engine.engine.SupervisedTrainer:257) - Got new best metric of train_mean_dice: 0.2552022635936737
[2023-03-22 11:58:53,442] [32928] [MainThread] [INFO] (ignite.engine.engine.SupervisedTrainer:201) - Epoch[1] Metrics -- train_lesions_mean_dice: 0.0045 train_liver_mean_dice: 0.6504 train_livervein_mean_dice: 0.3602 train_mean_dice: 0.2552 train_venacava_mean_dice: 0.0001 train_venaporta_mean_dice: 0.2535
[2023-03-22 11:58:53,442] [32928] [MainThread] [INFO] (ignite.engine.engine.SupervisedTrainer:212) - Key metric: train_mean_dice best value: 0.2552022635936737 at epoch: 1
[2023-03-22 11:58:53,448] [32928] [MainThread] [INFO] (ignite.engine.engine.SupervisedEvaluator:876) - Engine run resuming from iteration 0, epoch 0 until 1 epochs

[2023-03-22 11:58:57,944] [32928] [MainThread] [ERROR] (ignite.engine.engine.SupervisedEvaluator:1086) - Current run is terminating due to exception: applying transform <monai.transforms.compose.Compose object at 0x0000014188C0B160>

[2023-03-22 11:58:57,945] [32928] [MainThread] [ERROR] (ignite.engine.engine.SupervisedEvaluator:180) - Exception: applying transform <monai.transforms.compose.Compose object at 0x0000014188C0B160>
Traceback (most recent call last):
  File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 102, in apply_transform
    return _apply_transform(transform, data, unpack_items)
  File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 66, in _apply_transform
    return transform(parameters)
  File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\post\dictionary.py", line 202, in __call__
    d[key] = self.converter(d[key], argmax, to_onehot, threshold, rounding)
  File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\post\array.py", line 220, in __call__
    img_t = one_hot(
  File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\networks\utils.py", line 158, in one_hot
    o = torch.zeros(size=sh, dtype=dtype, device=labels.device)

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.69 GiB (GPU 0; 8.00 GiB total capacity; 6.14 GiB already allocated; 0 bytes free; 6.22 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
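The 1.69 GiB request is consistent with `one_hot` materializing one float32 channel per class for a full validation volume on the GPU. A back-of-the-envelope check, using hypothetical dimensions (6 one-hot channels for the 5 labels plus background, and a 512×512×290 CT volume — the actual volume size is not shown in the logs):

```python
# Hypothetical dimensions: 6 one-hot channels (5 labels + background),
# a 512 x 512 x 290 volume, float32 (4 bytes per voxel).
classes = 6
voxels = 512 * 512 * 290
gib = classes * voxels * 4 / 2**30
print(f"{gib:.2f} GiB")  # close to the 1.69 GiB the allocator reports
```

Whatever the exact volume size, the point stands: the `AsDiscreted` postprocessing multiplies the label volume's footprint by the class count, which is what pushes an already nearly full 8 GiB card over the edge during validation.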

The above exception was the direct cause of the following exception: Traceback (most recent call last): File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 102, in apply_transform return _apply_transform(transform, data, unpack_items) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 66, in _applytransform return transform(parameters) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\compose.py", line 174, in call input = apply_transform(transform, input, self.map_items, self.unpack_items, self.log_stats) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 129, in apply_transform raise RuntimeError(f"applying transform {transform}") from e RuntimeError: applying transform <monai.transforms.post.dictionary.AsDiscreted object at 0x0000014188BFE5B0> The above exception was the direct cause of the following exception: Traceback (most recent call last): File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 1068, in _run_once_on_dataset_as_gen self.state.output = self._process_function(self, self.state.batch) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\evaluator.py", line 308, in _iteration engine.fire_event(IterationEvents.MODEL_COMPLETED) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 449, in fire_event return self._fire_event(event_name) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event func(first, (event_args + others), kwargs) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\workflow.py", line 224, in _run_postprocessing engine.state.batch[i], engine.state.output[i] = engine_apply_transform(b, o, posttrans) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\utils.py", line 258, in engine_apply_transform transformed_data = 
apply_transform(transform, data) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 129, in apply_transform raise RuntimeError(f"applying transform {transform}") from e RuntimeError: applying transform <monai.transforms.compose.Compose object at 0x0000014188C0B160> [2023-03-22 11:58:58,024] [32928] [MainThread] [ERROR] (ignite.engine.engine.SupervisedEvaluator:992) - Engine run is terminating due to exception: applying transform <monai.transforms.compose.Compose object at 0x0000014188C0B160> [2023-03-22 11:58:58,024] [32928] [MainThread] [ERROR] (ignite.engine.engine.SupervisedEvaluator:180) - Exception: applying transform <monai.transforms.compose.Compose object at 0x0000014188C0B160> Traceback (most recent call last): File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 102, in apply_transform return _apply_transform(transform, data, unpack_items) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 66, in _apply_transform return transform(parameters) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\post\dictionary.py", line 202, in call d[key] = self.converter(d[key], argmax, to_onehot, threshold, rounding) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\post\array.py", line 220, in call img_t = one_hot( File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\networks\utils.py", line 158, in one_hot o = torch.zeros(size=sh, dtype=dtype, device=labels.device) torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.69 GiB (GPU 0; 8.00 GiB total capacity; 6.14 GiB already allocated; 0 bytes free; 6.22 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. 
See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF The above exception was the direct cause of the following exception: Traceback (most recent call last): File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 102, in apply_transform return _apply_transform(transform, data, unpack_items) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 66, in _applytransform return transform(parameters) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\compose.py", line 174, in call input = apply_transform(transform, input, self.map_items, self.unpack_items, self.log_stats) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 129, in apply_transform raise RuntimeError(f"applying transform {transform}") from e RuntimeError: applying transform <monai.transforms.post.dictionary.AsDiscreted object at 0x0000014188BFE5B0> The above exception was the direct cause of the following exception: Traceback (most recent call last): File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 959, in _internal_run_as_gen epoch_time_taken += yield from self._run_once_on_dataset_as_gen() File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 1087, in _run_once_on_dataset_as_gen self._handle_exception(e) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 636, in _handle_exception self._fire_event(Events.EXCEPTION_RAISED, e) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event func(first, (event_args + others), kwargs) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\handlers\stats_handler.py", line 181, in exception_raised raise e File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 1068, in _run_once_on_dataset_as_gen self.state.output = 
self._process_function(self, self.state.batch) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\evaluator.py", line 308, in _iteration engine.fire_event(IterationEvents.MODEL_COMPLETED) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 449, in fire_event return self._fire_event(event_name) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event func(first, (event_args + others), kwargs) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\workflow.py", line 224, in _run_postprocessing engine.state.batch[i], engine.state.output[i] = engine_apply_transform(b, o, posttrans) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\utils.py", line 258, in engine_apply_transform transformed_data = apply_transform(transform, data) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 129, in apply_transform raise RuntimeError(f"applying transform {transform}") from e RuntimeError: applying transform <monai.transforms.compose.Compose object at 0x0000014188C0B160> [2023-03-22 11:58:58,027] [32928] [MainThread] [ERROR] (ignite.engine.engine.SupervisedTrainer:992) - Engine run is terminating due to exception: applying transform <monai.transforms.compose.Compose object at 0x0000014188C0B160> [2023-03-22 11:58:58,027] [32928] [MainThread] [ERROR] (ignite.engine.engine.SupervisedTrainer:180) - Exception: applying transform <monai.transforms.compose.Compose object at 0x0000014188C0B160> Traceback (most recent call last): File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 102, in apply_transform return _apply_transform(transform, data, unpack_items) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 66, in _apply_transform return transform(parameters) File 
"C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\post\dictionary.py", line 202, in call d[key] = self.converter(d[key], argmax, to_onehot, threshold, rounding) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\post\array.py", line 220, in call img_t = one_hot( File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\networks\utils.py", line 158, in one_hot o = torch.zeros(size=sh, dtype=dtype, device=labels.device) torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.69 GiB (GPU 0; 8.00 GiB total capacity; 6.14 GiB already allocated; 0 bytes free; 6.22 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF The above exception was the direct cause of the following exception: Traceback (most recent call last): File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 102, in apply_transform return _apply_transform(transform, data, unpack_items) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 66, in _applytransform return transform(parameters) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\compose.py", line 174, in call input = apply_transform(transform, input, self.map_items, self.unpack_items, self.log_stats) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 129, in apply_transform raise RuntimeError(f"applying transform {transform}") from e RuntimeError: applying transform <monai.transforms.post.dictionary.AsDiscreted object at 0x0000014188BFE5B0> The above exception was the direct cause of the following exception: Traceback (most recent call last): File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 965, in _internal_run_as_gen self._fire_event(Events.EPOCH_COMPLETED) File 
"C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event func(first, (event_args + others), kwargs) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\handlers\validation_handler.py", line 76, in call self.validator.run(engine.state.epoch) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\evaluator.py", line 148, in run super().run() File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\workflow.py", line 281, in run super().run(data=self.data_loader, max_epochs=self.state.max_epochs) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 892, in run return self._internal_run() File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 935, in _internal_run return next(self._internal_run_generator) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 993, in _internal_run_as_gen self._handle_exception(e) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 636, in _handle_exception self._fire_event(Events.EXCEPTION_RAISED, e) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event func(first, (event_args + others), kwargs) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\handlers\stats_handler.py", line 181, in exception_raised raise e File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 959, in _internal_run_as_gen epoch_time_taken += yield from self._run_once_on_dataset_as_gen() File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 1087, in _run_once_on_dataset_as_gen self._handle_exception(e) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 636, in _handle_exception self._fire_event(Events.EXCEPTION_RAISED, e) File 
"C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event func(first, (event_args + others), kwargs) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\handlers\stats_handler.py", line 181, in exception_raised raise e File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 1068, in _run_once_on_dataset_as_gen self.state.output = self._process_function(self, self.state.batch) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\evaluator.py", line 308, in _iteration engine.fire_event(IterationEvents.MODEL_COMPLETED) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 449, in fire_event return self._fire_event(event_name) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event func(first, (event_args + others), kwargs) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\workflow.py", line 224, in _run_postprocessing engine.state.batch[i], engine.state.output[i] = engine_apply_transform(b, o, posttrans) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\utils.py", line 258, in engine_apply_transform transformed_data = apply_transform(transform, data) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 129, in apply_transform raise RuntimeError(f"applying transform {transform}") from e RuntimeError: applying transform <monai.transforms.compose.Compose object at 0x0000014188C0B160> Traceback (most recent call last): File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 102, in apply_transform return _apply_transform(transform, data, unpack_items) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 66, in _apply_transform return transform(parameters) File 
"C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\post\dictionary.py", line 202, in call d[key] = self.converter(d[key], argmax, to_onehot, threshold, rounding) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\post\array.py", line 220, in call img_t = one_hot( File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\networks\utils.py", line 158, in one_hot o = torch.zeros(size=sh, dtype=dtype, device=labels.device) torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.69 GiB (GPU 0; 8.00 GiB total capacity; 6.14 GiB already allocated; 0 bytes free; 6.22 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF The above exception was the direct cause of the following exception: Traceback (most recent call last): File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 102, in apply_transform return _apply_transform(transform, data, unpack_items) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 66, in _applytransform return transform(parameters) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\compose.py", line 174, in call input = apply_transform(transform, input, self.map_items, self.unpack_items, self.log_stats) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 129, in apply_transform raise RuntimeError(f"applying transform {transform}") from e RuntimeError: applying transform <monai.transforms.post.dictionary.AsDiscreted object at 0x0000014188BFE5B0> The above exception was the direct cause of the following exception: Traceback (most recent call last): File "C:\Users\keyur.conda\envs\monai\lib\runpy.py", line 197, in _run_module_as_main return _run_code(code, main_globals, None, File 
"C:\Users\keyur.conda\envs\monai\lib\runpy.py", line 87, in _run_code exec(code, run_globals) File "C:\Users\keyur\MONAILabel\monailabel\interfaces\utils\app.py", line 128, in run_main() File "C:\Users\keyur\MONAILabel\monailabel\interfaces\utils\app.py", line 113, in run_main result = a.train(request) File "C:\Users\keyur\MONAILabel\monailabel\interfaces\app.py", line 422, in train result = task(request, self.datastore()) File "C:\Users\keyur\MONAILabel\monailabel\tasks\train\basic_train.py", line 463, in call res = self.train(0, world_size, req, datalist) File "C:\Users\keyur\MONAILabel\monailabel\tasks\train\basic_train.py", line 552, in train context.trainer.run() File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\trainer.py", line 53, in run super().run() File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\workflow.py", line 281, in run super().run(data=self.data_loader, max_epochs=self.state.max_epochs) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 892, in run return self._internal_run() File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 935, in _internal_run return next(self._internal_run_generator) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 993, in _internal_run_as_gen self._handle_exception(e) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 636, in _handle_exception self._fire_event(Events.EXCEPTION_RAISED, e) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event func(first, (event_args + others), kwargs) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\handlers\stats_handler.py", line 181, in exception_raised raise e File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 965, in _internal_run_as_gen self._fire_event(Events.EPOCH_COMPLETED) File 
"C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event func(first, (event_args + others), kwargs) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\handlers\validation_handler.py", line 76, in call self.validator.run(engine.state.epoch) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\evaluator.py", line 148, in run super().run() File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\workflow.py", line 281, in run super().run(data=self.data_loader, max_epochs=self.state.max_epochs) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 892, in run return self._internal_run() File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 935, in _internal_run return next(self._internal_run_generator) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 993, in _internal_run_as_gen self._handle_exception(e) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 636, in _handle_exception self._fire_event(Events.EXCEPTION_RAISED, e) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event func(first, (event_args + others), kwargs) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\handlers\stats_handler.py", line 181, in exception_raised raise e File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 959, in _internal_run_as_gen epoch_time_taken += yield from self._run_once_on_dataset_as_gen() File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 1087, in _run_once_on_dataset_as_gen self._handle_exception(e) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 636, in _handle_exception self._fire_event(Events.EXCEPTION_RAISED, e) File 
"C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event func(first, (event_args + others), kwargs) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\handlers\stats_handler.py", line 181, in exception_raised raise e File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 1068, in _run_once_on_dataset_as_gen self.state.output = self._process_function(self, self.state.batch) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\evaluator.py", line 308, in _iteration engine.fire_event(IterationEvents.MODEL_COMPLETED) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 449, in fire_event return self._fire_event(event_name) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event func(first, (event_args + others), kwargs) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\workflow.py", line 224, in _run_postprocessing engine.state.batch[i], engine.state.output[i] = engine_apply_transform(b, o, posttrans) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\utils.py", line 258, in engine_apply_transform transformed_data = apply_transform(transform, data) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 129, in apply_transform raise RuntimeError(f"applying transform {transform}") from e RuntimeError: applying transform <monai.transforms.compose.Compose object at 0x0000014188C0B160> [2023-03-22 11:58:59,641] [7720] [ThreadPoolExecutor-2_0] [INFO] (monailabel.utils.async_tasks.utils:83) - Return code: 1

**To Reproduce**
Steps to reproduce the behavior:

  1. Start the MONAI Label server: `monailabel start_server --app apps/radiology --studies datasets/training --conf models segmentation`
  2. Start training from 3D Slicer

**Expected behavior**
Training runs to completion without the CUDA out-of-memory error.


**Environment**

Ensuring you use the relevant python executable, please paste the output of:


`python -c 'import monai; monai.config.print_debug_info()'`

================================
Printing MONAI config...
================================
MONAI version: 1.1.0
Numpy version: 1.23.5
Pytorch version: 2.0.0+cu118
MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
MONAI rev id: a2ec3752f54bfc3b40e7952234fbeb5452ed63e3
MONAI __file__: C:\Users\keyur\.conda\envs\monai\lib\site-packages\monai\__init__.py

Optional dependencies:
Pytorch Ignite version: 0.4.10
Nibabel version: 5.0.1
scikit-image version: 0.20.0
Pillow version: 9.3.0
Tensorboard version: 2.12.0
gdown version: 4.6.4
TorchVision version: 0.15.1+cpu
tqdm version: 4.65.0
lmdb version: 1.4.0
psutil version: 5.9.0
pandas version: 1.5.3
einops version: 0.6.0
transformers version: NOT INSTALLED or UNKNOWN VERSION.
mlflow version: 2.2.2
pynrrd version: 0.4.3

For details about installing the optional dependencies, please visit:
    https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies

================================
Printing system config...
================================
System: Windows
Win32 version: ('10', '10.0.22621', 'SP0', 'Multiprocessor Free')
Win32 edition: Core
Platform: Windows-10-10.0.22621-SP0
Processor: Intel64 Family 6 Model 186 Stepping 2, GenuineIntel
Machine: AMD64
Python version: 3.9.16
Process name: python.exe
Command: ['C:/Users/keyur/.conda/envs/monai\\python.exe', '-m', 'ipykernel_launcher', '-f', 'C:\\Users\\keyur\\AppData\\Roaming\\jupyter\\runtime\\kernel-16ea1610-8b90-4286-bcdb-5d8bc9d19305.json']
Open files: [popenfile(path='C:\\Users\\keyur\\.ipython\\profile_default\\history.sqlite', fd=-1), popenfile(path='C:\\Program Files\\WindowsApps\\Microsoft.LanguageExperiencePacken-GB_22621.13.87.0_neutral__8wekyb3d8bbwe\\Windows\\System32\\en-GB\\2d99171d54bafb1068cad8303bddb437\\tzres.dll.mui', fd=-1), popenfile(path='C:\\Windows\\System32\\en-US\\kernel32.dll.mui', fd=-1), popenfile(path='C:\\Windows\\System32\\DriverStore\\FileRepository\\nvsmui.inf_amd64_1e558733305022a1\\nvcubins.bin', fd=-1), popenfile(path='C:\\Windows\\System32\\en-US\\KernelBase.dll.mui', fd=-1)]
Num physical CPUs: 14
Num logical CPUs: 20
Num usable CPUs: 20
CPU usage (%): [0.7, 0.4, 1.0, 0.1, 5.1, 1.1, 3.7, 1.2, 1.8, 0.4, 0.9, 0.4, 11.4, 14.7, 9.1, 10.1, 10.1, 10.6, 11.7, 12.5]
CPU freq. (MHz): 2600
Load avg. in last 1, 5, 15 mins (%): [13.8, 13.8, 10.0]
Disk usage (%): 44.1
Avg. sensor temp. (Celsius): UNKNOWN for given OS
Total physical memory (GB): 31.6
Available memory (GB): 18.1
Used memory (GB): 13.5

================================
Printing GPU config...
================================
Num GPUs: 1
Has CUDA: True
CUDA version: 11.8
cuDNN enabled: True
cuDNN version: 8700
Current device: 0
Library compiled for CUDA architectures: ['sm_37', 'sm_50', 'sm_60', 'sm_61', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'sm_90', 'compute_37']
GPU 0 Name: NVIDIA GeForce RTX 4070 Laptop GPU
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 36
GPU 0 Total memory (GB): 8.0
GPU 0 CUDA capability (maj.min): 8.9

diazandr3s commented 1 year ago

Hi @keyurradia,

Thanks for the detailed logs. From them, I see you have an 8 GB GPU, which doesn't seem to be enough GPU memory to train the segmentation model. Maybe changing from SmartCacheDataset to the plain Dataset loader could help?

[2023-03-22 11:57:22,989] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:489) - 0 - Train Request (final): {'name': 'train_01', 'pretrained': True, 'device': 'cuda', 'max_epochs': 50, 'early_stop_patience': -1, 'val_split': 0.2, 'train_batch_size': 1, 'val_batch_size': 1, 'multi_gpu': False, 'gpus': 'all', 'dataset': 'SmartCacheDataset', 'dataloader': 'ThreadDataLoader', 'tracking': 'mlflow', 'tracking_uri': '', 'tracking_experiment_name': '', 'model': 'segmentation', 'client_id': 'user-xyz', 'local_rank': 0, 'run_id': '20230322_115722'}

You can change this in the Options tab in 3D Slicer. It is not certain to help (it depends on the size of the CT volumes), but it is worth trying.
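As a rough sketch, the relevant change is the `dataset` field of the train request logged above; `Dataset` reads samples lazily from disk instead of caching them in memory. The other fields here are copied from that logged request:

```python
# Sketch of a MONAI Label train request with the dataset type changed.
# All values besides "dataset" mirror the request logged above.
train_request = {
    "model": "segmentation",
    "max_epochs": 50,
    "train_batch_size": 1,
    "val_batch_size": 1,
    "dataset": "Dataset",  # was "SmartCacheDataset"; plain Dataset avoids caching volumes
    "dataloader": "ThreadDataLoader",
}
```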

Another option is to try a bigger GPU.

Hope that helps,

keyurradia commented 1 year ago

Hi @diazandr3s,

Thank you for your quick answer. I tried changing SmartCacheDataset to Dataset; unfortunately, it did not work.
Before I updated to CUDA 11.8, training did work on the CPU. Do you think it is feasible to train on the CPU?

Thanks Keyur

diazandr3s commented 1 year ago

For a better user experience, we recommend using MONAI Label on a GPU-based PC: https://github.com/Project-MONAI/MONAILabel#installation

PathSally commented 1 year ago

I also got this error when running the lung nodule model: `<monai.transforms.compose.Compose object at 0x7f71c8238d00>`.

keyurradia commented 1 year ago

I have a GPU with 8 GB of memory. Are you using a stronger GPU, @PathSally?

PathSally commented 1 year ago


I also have the same 8 GB of memory, but when I reduced the batch_size there were no memory errors.

keyurradia commented 1 year ago

@PathSally Wow, that's great! How did you manage to reduce the batch size? I tried from Slicer (in the Options tab) but it did not allow me to. Is there anywhere else I can reduce the batch size?

PathSally commented 1 year ago


I modified batch_size in the MONAI model file (train.json). I also just tried a GPU with 24 GB of memory and still got the RuntimeError.
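For anyone looking for where to make that change, here is a minimal sketch of editing a bundle-style train.json with Python. The key name `train_batch_size` is taken from the train request logged earlier in this thread; your bundle's config may name or nest it differently, so check the actual file first:

```python
import json
import os
import tempfile

# Create a stand-in config so the sketch is self-contained;
# in practice cfg_path points at the bundle's actual train.json.
cfg_path = os.path.join(tempfile.mkdtemp(), "train.json")
with open(cfg_path, "w") as f:
    json.dump({"train_batch_size": 2, "val_batch_size": 2}, f)

# Load, lower the batch size, and write the config back.
with open(cfg_path) as f:
    cfg = json.load(f)
cfg["train_batch_size"] = 1  # smaller batches -> lower peak GPU memory
with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```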

SachidanandAlle commented 1 year ago

Try running the training without MONAI Label. If you have downloaded the bundle, you can use it directly to train on your dataset.