Open keyurradia opened 1 year ago
Hi @keyurradia,
Thanks for the detailed logs. From the logs, I see you have an 8 GB GPU. It seems you don't have enough GPU memory to train the Segmentation model. Maybe switching the dataset option from SmartCacheDataset to Dataset could help?
[2023-03-22 11:57:22,989] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:489) - 0 - Train Request (final): {'name': 'train_01', 'pretrained': True, 'device': 'cuda', 'max_epochs': 50, 'early_stop_patience': -1, 'val_split': 0.2, 'train_batch_size': 1, 'val_batch_size': 1, 'multi_gpu': False, 'gpus': 'all', 'dataset': 'SmartCacheDataset', 'dataloader': 'ThreadDataLoader', 'tracking': 'mlflow', 'tracking_uri': '', 'tracking_experiment_name': '', 'model': 'segmentation', 'client_id': 'user-xyz', 'local_rank': 0, 'run_id': '20230322_115722'}
You can change this in the Options tab in 3D Slicer. It is not guaranteed to work (it depends on the size of the CT volumes), but it is worth a try.
Another option is to try a bigger GPU.
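To illustrate why the CT size matters: the traceback shows the failing allocation inside `one_hot`, called from the `AsDiscreted` post-transform during validation. A one-hot tensor needs one channel per class for every voxel. The sketch below is only a rough estimate under assumptions that are not in the logs: a volume of 512 x 512 x 288 voxels, 6 channels (the 5 labels plus background), and 4-byte float32 values.

```python
GIB = 1024 ** 3  # bytes in one GiB


def one_hot_gib(shape, num_classes, bytes_per_value=4):
    """Estimate the size in GiB of a one-hot tensor for a volume of `shape`."""
    voxels = 1
    for dim in shape:
        voxels *= dim
    return voxels * num_classes * bytes_per_value / GIB


# Assumed shape and class count; the real values depend on your data.
estimate = one_hot_gib((512, 512, 288), num_classes=6)
print(f"{estimate:.2f} GiB")
```

Under these assumptions the estimate lands at the 1.69 GiB that PyTorch tried to allocate, so a smaller or cropped volume would shrink this allocation proportionally. Other shapes with the same voxel count would give the same number.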
Hope that helps,
Hi @diazandr3s,
Thank you for your quick answer. I tried changing SmartCacheDataset to Dataset. Unfortunately, it did not work.
Before I updated to CUDA 11.8, training did work on the CPU. Do you think it is feasible to train on the CPU?
Thanks Keyur
For a better user experience, we recommend using MONAI Label on a GPU-based PC: https://github.com/Project-MONAI/MONAILabel#installation
I also got this error when running the lung nodule model: <monai.transforms.compose.Compose object at 0x7f71c8238d00>.
I have a GPU with 8 GB of memory. Are you using a more powerful GPU, @PathSally?
I also have the same 8gb memory. But when I reduced the batch_size, there were no memory errors.
@PathSally Wow! That's great. How were you able to reduce the batch size? I tried from Slicer (in the Options tab), but it did not let me. Is there anywhere else I can reduce the batch size?
I modified batch_size in the MONAI model file (train.json). I also just tried a GPU with 24 GB of memory and still got the RuntimeError.
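For anyone who wants to script that change, here is a minimal sketch of editing a value in a JSON config such as train.json. The helper names, the file path, and the key path used in the usage comment are hypothetical; where the batch size actually lives depends on the bundle, so check your own train.json first.

```python
import json
from pathlib import Path


def set_nested(config, keys, value):
    """Set config[keys[0]][keys[1]]... to value and return the previous value."""
    node = config
    for key in keys[:-1]:
        node = node[key]
    previous = node.get(keys[-1])
    node[keys[-1]] = value
    return previous


def update_batch_size(path, keys, batch_size):
    """Rewrite the JSON file at `path` with a new batch size under `keys`."""
    path = Path(path)
    config = json.loads(path.read_text())
    previous = set_nested(config, keys, batch_size)
    path.write_text(json.dumps(config, indent=2))
    return previous


# Hypothetical usage; the path and key names are assumptions, not from the bundle:
# update_batch_size("configs/train.json", ["train", "dataloader", "batch_size"], 1)
```

The function returns the old value so you can confirm that the key you edited was the one actually in use.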
Try running the training without MONAI Label. If you have downloaded the bundle, you can use it directly to train on your dataset.
Describe the bug Before I updated to cudatoolkit 11.8, MONAI Label did not recognize cudatoolkit 11.3 and CUDA was disabled, so training ran in CPU mode and worked fine. After updating to cudatoolkit 11.8, CUDA is no longer disabled, but I get: RuntimeError: applying transform <monai.transforms.compose.Compose object at 0x0000014188C0B160>.
Looking further into the logs, I found: "torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.69 GiB (GPU 0; 8.00 GiB total capacity; 6.14 GiB already allocated; 0 bytes free; 6.22 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF"
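The hint about max_split_size_mb in that message only applies when reserved memory is much larger than allocated memory. A small sketch to check this from the message itself is below. The `parse_oom` helper is hypothetical and only handles this exact message format, and the value 128 is an arbitrary example, not a recommendation.

```python
import os
import re

OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 1.69 GiB (GPU 0; 8.00 GiB total capacity; "
    "6.14 GiB already allocated; 0 bytes free; 6.22 GiB reserved in total by PyTorch)"
)


def parse_oom(message):
    """Extract the GiB figures from a PyTorch CUDA OOM message of this format."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) GiB",
        "total": r"([\d.]+) GiB total capacity",
        "allocated": r"([\d.]+) GiB already allocated",
        "reserved": r"([\d.]+) GiB reserved",
    }
    return {name: float(re.search(pat, message).group(1)) for name, pat in patterns.items()}


stats = parse_oom(OOM_MESSAGE)
fragmentation_gap = stats["reserved"] - stats["allocated"]
print(f"reserved - allocated = {fragmentation_gap:.2f} GiB")

# Only worth trying when the gap is large. It must be set before CUDA is
# first initialised in the process; 128 is an illustrative value.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")
```

Here the gap is well under 0.1 GiB while the request is 1.69 GiB, so fragmentation is unlikely to be the cause and the allocator setting probably will not help on its own. The GPU is simply close to full.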
Server logs [2023-03-22 11:57:12,660] [7720] [MainThread] [INFO] (monailabel.endpoints.datastore:68) - Image: 23.01.03.18; File: <starlette.datastructures.UploadFile object at 0x000001C073531D90>; params: {"client_id": "user-xyz"} [2023-03-22 11:57:12,746] [7720] [MainThread] [INFO] (monailabel.datastore.local:439) - Adding Image: 23.01.03.18 => C:\Users\keyur\AppData\Local\Temp\tmp1n82rfbc.nii.gz [2023-03-22 11:57:13,325] [7720] [MainThread] [INFO] (monailabel.endpoints.datastore:101) - Saving Label for 23.01.03.18 for tag: final by admin [2023-03-22 11:57:13,331] [7720] [MainThread] [INFO] (monailabel.endpoints.datastore:112) - Save Label params: {"label_info": [{"name": "liver", "idx": 1}, {"name": "venaporta", "idx": 2}, {"name": "livervein", "idx": 3}, {"name": "venacava", "idx": 4}, {"name": "lesions", "idx": 5}], "client_id": "user-xyz"} [2023-03-22 11:57:13,332] [7720] [MainThread] [INFO] (monailabel.datastore.local:486) - Saving Label for Image: 23.01.03.18; Tag: final; Info: {'label_info': [{'name': 'liver', 'idx': 1}, {'name': 'venaporta', 'idx': 2}, {'name': 'livervein', 'idx': 3}, {'name': 'venacava', 'idx': 4}, {'name': 'lesions', 'idx': 5}], 'client_id': 'user-xyz'} [2023-03-22 11:57:13,333] [7720] [MainThread] [INFO] (monailabel.datastore.local:494) - Adding Label: 23.01.03.18 => final => C:\Users\keyur\AppData\Local\Temp\tmpm17x2cn6.nii.gz [2023-03-22 11:57:13,338] [7720] [MainThread] [INFO] (monailabel.datastore.local:510) - Label Info: {'label_info': [{'name': 'liver', 'idx': 1}, {'name': 'venaporta', 'idx': 2}, {'name': 'livervein', 'idx': 3}, {'name': 'venacava', 'idx': 4}, {'name': 'lesions', 'idx': 5}], 'client_id': 'user-xyz', 'ts': 1679482633, 'name': '23.01.03.18.nii.gz'} [2023-03-22 11:57:13,344] [7720] [MainThread] [INFO] (monailabel.interfaces.app:492) - New label saved for: 23.01.03.18 => 23.01.03.18 [2023-03-22 11:57:16,062] [7720] [MainThread] [INFO] (monailabel.utils.async_tasks.task:41) - Train request: {'model': 'segmentation', 
'name': 'train_01', 'pretrained': True, 'device': 'cuda', 'max_epochs': 50, 'early_stop_patience': -1, 'val_split': 0.2, 'train_batch_size': 1, 'val_batch_size': 1, 'multi_gpu': True, 'gpus': 'all', 'dataset': 'SmartCacheDataset', 'dataloader': 'ThreadDataLoader', 'tracking': 'mlflow', 'tracking_uri': '', 'tracking_experiment_name': '', 'client_id': 'user-xyz'} [2023-03-22 11:57:16,063] [7720] [ThreadPoolExecutor-2_0] [INFO] (monailabel.utils.async_tasks.utils:49) - Before:: C:\Users\keyur\MONAILabel; [2023-03-22 11:57:16,064] [7720] [ThreadPoolExecutor-2_0] [INFO] (monailabel.utils.async_tasks.utils:53) - After:: C:\Users\keyur\MONAILabel; [2023-03-22 11:57:16,065] [7720] [ThreadPoolExecutor-2_0] [INFO] (monailabel.utils.async_tasks.utils:65) - COMMAND:: C:\Users\keyur.conda\envs\monai\python.exe -m monailabel.interfaces.utils.app -m train -r {"model":"segmentation","name":"train_01","pretrained":true,"device":"cuda","max_epochs":50,"early_stop_patience":-1,"val_split":0.2,"train_batch_size":1,"val_batch_size":1,"multi_gpu":true,"gpus":"all","dataset":"SmartCacheDataset","dataloader":"ThreadDataLoader","tracking":"mlflow","tracking_uri":"","tracking_experiment_name":"","client_id":"user-xyz"} [2023-03-22 11:57:17,250] [32928] [MainThread] [INFO] (main:37) - Initializing App from: C:\Users\keyur\MONAILabel\monailabel\scripts\apps\radiology; studies: C:\Users\keyur\MONAILabel\monailabel\scripts\datasets\training; conf: {'models': 'segmentation'} [2023-03-22 11:57:22,938] [32928] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for MONAILabelApp Found: <class 'main.MyApp'> [2023-03-22 11:57:22,947] [32928] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.deepedit.DeepEdit'> [2023-03-22 11:57:22,948] [32928] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.deepgrow_2d.Deepgrow2D'> [2023-03-22 11:57:22,948] [32928] 
[MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.deepgrow_3d.Deepgrow3D'> [2023-03-22 11:57:22,949] [32928] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.localization_spine.LocalizationSpine'> [2023-03-22 11:57:22,949] [32928] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.localization_vertebra.LocalizationVertebra'> [2023-03-22 11:57:22,950] [32928] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.segmentation.Segmentation'> [2023-03-22 11:57:22,950] [32928] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.segmentation_spleen.SegmentationSpleen'> [2023-03-22 11:57:22,951] [32928] [MainThread] [INFO] (monailabel.utils.others.class_utils:57) - Subclass for TaskConfig Found: <class 'lib.configs.segmentation_vertebra.SegmentationVertebra'> [2023-03-22 11:57:22,951] [32928] [MainThread] [INFO] (main:93) - +++ Adding Model: segmentation => lib.configs.segmentation.Segmentation [2023-03-22 11:57:22,974] [32928] [MainThread] [INFO] (main:96) - +++ Using Models: ['segmentation'] [2023-03-22 11:57:22,974] [32928] [MainThread] [INFO] (monailabel.interfaces.app:134) - Init Datastore for: C:\Users\keyur\MONAILabel\monailabel\scripts\datasets\training [2023-03-22 11:57:22,975] [32928] [MainThread] [INFO] (monailabel.datastore.local:130) - Auto Reload: False; Extensions: ['.nii.gz', '.nii', '.nrrd', '.jpg', '.png', '.tif', '.svs', '.xml'] [2023-03-22 11:57:22,986] [32928] [MainThread] [INFO] (monailabel.datastore.local:577) - Invalidate count: 0 [2023-03-22 11:57:22,987] [32928] [MainThread] [INFO] (main:126) - +++ Adding Inferer:: segmentation => <lib.infers.segmentation.Segmentation object at 0x00000141821A56D0> [2023-03-22 11:57:22,987] [32928] 
[MainThread] [INFO] (main:191) - {'segmentation': <lib.infers.segmentation.Segmentation object at 0x00000141821A56D0>, 'Histogram+GraphCut': <monailabel.scribbles.infer.HistogramBasedGraphCut object at 0x000001418AA7F370>, 'GMM+GraphCut': <monailabel.scribbles.infer.GMMBasedGraphCut object at 0x000001418AA7F340>} [2023-03-22 11:57:22,987] [32928] [MainThread] [INFO] (main:206) - +++ Adding Trainer:: segmentation => <lib.trainers.segmentation.Segmentation object at 0x000001418AA7F3A0> [2023-03-22 11:57:22,987] [32928] [MainThread] [INFO] (monailabel.utils.sessions:51) - Session Path: C:\Users\keyur.cache\monailabel\sessions [2023-03-22 11:57:22,987] [32928] [MainThread] [INFO] (monailabel.utils.sessions:52) - Session Expiry (max): 3600 [2023-03-22 11:57:22,987] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:432) - Train Request (input): {'model': 'segmentation', 'name': 'train_01', 'pretrained': True, 'device': 'cuda', 'max_epochs': 50, 'early_stop_patience': -1, 'val_split': 0.2, 'train_batch_size': 1, 'val_batch_size': 1, 'multi_gpu': True, 'gpus': 'all', 'dataset': 'SmartCacheDataset', 'dataloader': 'ThreadDataLoader', 'tracking': 'mlflow', 'tracking_uri': '', 'tracking_experiment_name': '', 'client_id': 'user-xyz', 'local_rank': 0} [2023-03-22 11:57:22,987] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:442) - CUDA_VISIBLE_DEVICES: None [2023-03-22 11:57:22,989] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:447) - Distributed/Multi GPU is limited [2023-03-22 11:57:22,989] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:462) - Distributed Training = FALSE [2023-03-22 11:57:22,989] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:489) - 0 - Train Request (final): {'name': 'train_01', 'pretrained': True, 'device': 'cuda', 'max_epochs': 50, 'early_stop_patience': -1, 'val_split': 0.2, 'train_batch_size': 1, 'val_batch_size': 1, 'multi_gpu': False, 'gpus': 'all', 'dataset': 
'SmartCacheDataset', 'dataloader': 'ThreadDataLoader', 'tracking': 'mlflow', 'tracking_uri': '', 'tracking_experiment_name': '', 'model': 'segmentation', 'client_id': 'user-xyz', 'local_rank': 0, 'run_id': '20230322_115722'} [2023-03-22 11:57:22,989] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:622) - 0 - Using Device: cuda; IDX: None [2023-03-22 11:57:22,989] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:515) - Run/Output Path: C:\Users\keyur\MONAILabel\monailabel\scripts\apps\radiology\model\segmentation\train_01 [2023-03-22 11:57:22,989] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:531) - Tracking: mlflow [2023-03-22 11:57:22,989] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:532) - Tracking URI: file:///C:/Users/keyur/MONAILabel/monailabel/scripts/apps/radiology/model/segmentation/train_01/mlruns; [2023-03-22 11:57:22,989] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:533) - Tracking Experiment Name: segmentation; Run Name: run_20230322_115722 [2023-03-22 11:57:22,989] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:410) - Total Records for Training: 6 [2023-03-22 11:57:22,989] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:411) - Total Records for Validation: 2 Loading dataset: 0%| | 0/2 [00:00<?, ?it/s] Loading dataset: 50%|##### | 1/2 [00:11<00:11, 11.37s/it] Loading dataset: 100%|##########| 2/2 [00:21<00:00, 10.42s/it] Loading dataset: 100%|##########| 2/2 [00:21<00:00, 10.57s/it] cache_num is greater or equal than dataset length, fall back to regular monai.data.CacheDataset. 
[2023-03-22 11:57:44,226] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:328) - 0 - Records for Validation: 2 [2023-03-22 11:57:44,237] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:318) - 0 - Adding Validation to run every '1' interval [2023-03-22 11:57:44,240] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:710) - 0 - Load Path C:\Users\keyur\MONAILabel\monailabel\scripts\apps\radiology\model\segmentation\train_01\model.pt Loading dataset: 0%| | 0/6 [00:00<?, ?it/s] Loading dataset: 17%|#6 | 1/6 [00:10<00:52, 10.54s/it] Loading dataset: 33%|###3 | 2/6 [00:15<00:28, 7.00s/it] Loading dataset: 50%|##### | 3/6 [00:26<00:27, 9.16s/it] Loading dataset: 67%|######6 | 4/6 [00:35<00:17, 8.80s/it] Loading dataset: 83%|########3 | 5/6 [00:47<00:10, 10.14s/it] Loading dataset: 100%|##########| 6/6 [01:02<00:00, 11.67s/it] Loading dataset: 100%|##########| 6/6 [01:02<00:00, 10.37s/it] [2023-03-22 11:58:46,454] [32928] [MainThread] [INFO] (monailabel.tasks.train.basic_train:264) - 0 - Records for Training: 6 [2023-03-22 11:58:46,458] [32928] [MainThread] [INFO] (ignite.engine.engine.SupervisedTrainer:876) - Engine run resuming from iteration 0, epoch 0 until 50 epochs [2023-03-22 11:58:46,617] [32928] [MainThread] [INFO] (ignite.engine.engine.SupervisedTrainer:138) - Restored all variables from C:\Users\keyur\MONAILabel\monailabel\scripts\apps\radiology\model\segmentation\train_01\model.pt [2023-03-22 11:58:51,634] [32928] [MainThread] [INFO] (ignite.engine.engine.SupervisedTrainer:272) - Epoch: 1/50, Iter: 1/6 -- train_loss: 0.9931 [2023-03-22 11:58:52,005] [32928] [MainThread] [INFO] (ignite.engine.engine.SupervisedTrainer:272) - Epoch: 1/50, Iter: 2/6 -- train_loss: 0.9202 [2023-03-22 11:58:52,382] [32928] [MainThread] [INFO] (ignite.engine.engine.SupervisedTrainer:272) - Epoch: 1/50, Iter: 3/6 -- train_loss: 0.8346 [2023-03-22 11:58:52,736] [32928] [MainThread] [INFO] (ignite.engine.engine.SupervisedTrainer:272) - Epoch: 
1/50, Iter: 4/6 -- train_loss: 0.8939 [2023-03-22 11:58:53,133] [32928] [MainThread] [INFO] (ignite.engine.engine.SupervisedTrainer:272) - Epoch: 1/50, Iter: 5/6 -- train_loss: 0.9609 [2023-03-22 11:58:53,435] [32928] [MainThread] [INFO] (ignite.engine.engine.SupervisedTrainer:272) - Epoch: 1/50, Iter: 6/6 -- train_loss: 0.8174 [2023-03-22 11:58:53,442] [32928] [MainThread] [INFO] (ignite.engine.engine.SupervisedTrainer:257) - Got new best metric of train_mean_dice: 0.2552022635936737 [2023-03-22 11:58:53,442] [32928] [MainThread] [INFO] (ignite.engine.engine.SupervisedTrainer:201) - Epoch[1] Metrics -- train_lesions_mean_dice: 0.0045 train_liver_mean_dice: 0.6504 train_livervein_mean_dice: 0.3602 train_mean_dice: 0.2552 train_venacava_mean_dice: 0.0001 train_venaporta_mean_dice: 0.2535 [2023-03-22 11:58:53,442] [32928] [MainThread] [INFO] (ignite.engine.engine.SupervisedTrainer:212) - Key metric: train_mean_dice best value: 0.2552022635936737 at epoch: 1 [2023-03-22 11:58:53,448] [32928] [MainThread] [INFO] (ignite.engine.engine.SupervisedEvaluator:876) - Engine run resuming from iteration 0, epoch 0 until 1 epochs
[2023-03-22 11:58:57,944] [32928] [MainThread] [ERROR] (ignite.engine.engine.SupervisedEvaluator:1086) - Current run is terminating due to exception: applying transform <monai.transforms.compose.Compose object at 0x0000014188C0B160>
[2023-03-22 11:58:57,945] [32928] [MainThread] [ERROR] (ignite.engine.engine.SupervisedEvaluator:180) - Exception: applying transform <monai.transforms.compose.Compose object at 0x0000014188C0B160> Traceback (most recent call last): File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 102, in apply_transform return _apply_transform(transform, data, unpack_items) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 66, in _apply_transform return transform(parameters) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\post\dictionary.py", line 202, in call d[key] = self.converter(d[key], argmax, to_onehot, threshold, rounding) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\post\array.py", line 220, in call img_t = one_hot( File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\networks\utils.py", line 158, in one_hot o = torch.zeros(size=sh, dtype=dtype, device=labels.device)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.69 GiB (GPU 0; 8.00 GiB total capacity; 6.14 GiB already allocated; 0 bytes free; 6.22 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
The above exception was the direct cause of the following exception: Traceback (most recent call last): File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 102, in apply_transform return _apply_transform(transform, data, unpack_items) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 66, in _applytransform return transform(parameters) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\compose.py", line 174, in call input = apply_transform(transform, input, self.map_items, self.unpack_items, self.log_stats) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 129, in apply_transform raise RuntimeError(f"applying transform {transform}") from e RuntimeError: applying transform <monai.transforms.post.dictionary.AsDiscreted object at 0x0000014188BFE5B0> The above exception was the direct cause of the following exception: Traceback (most recent call last): File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 1068, in _run_once_on_dataset_as_gen self.state.output = self._process_function(self, self.state.batch) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\evaluator.py", line 308, in _iteration engine.fire_event(IterationEvents.MODEL_COMPLETED) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 449, in fire_event return self._fire_event(event_name) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event func(first, (event_args + others), kwargs) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\workflow.py", line 224, in _run_postprocessing engine.state.batch[i], engine.state.output[i] = engine_apply_transform(b, o, posttrans) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\utils.py", line 258, in engine_apply_transform transformed_data = 
apply_transform(transform, data) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 129, in apply_transform raise RuntimeError(f"applying transform {transform}") from e RuntimeError: applying transform <monai.transforms.compose.Compose object at 0x0000014188C0B160> [2023-03-22 11:58:58,024] [32928] [MainThread] [ERROR] (ignite.engine.engine.SupervisedEvaluator:992) - Engine run is terminating due to exception: applying transform <monai.transforms.compose.Compose object at 0x0000014188C0B160> [2023-03-22 11:58:58,024] [32928] [MainThread] [ERROR] (ignite.engine.engine.SupervisedEvaluator:180) - Exception: applying transform <monai.transforms.compose.Compose object at 0x0000014188C0B160> Traceback (most recent call last): File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 102, in apply_transform return _apply_transform(transform, data, unpack_items) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 66, in _apply_transform return transform(parameters) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\post\dictionary.py", line 202, in call d[key] = self.converter(d[key], argmax, to_onehot, threshold, rounding) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\post\array.py", line 220, in call img_t = one_hot( File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\networks\utils.py", line 158, in one_hot o = torch.zeros(size=sh, dtype=dtype, device=labels.device) torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.69 GiB (GPU 0; 8.00 GiB total capacity; 6.14 GiB already allocated; 0 bytes free; 6.22 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. 
See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF The above exception was the direct cause of the following exception: Traceback (most recent call last): File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 102, in apply_transform return _apply_transform(transform, data, unpack_items) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 66, in _applytransform return transform(parameters) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\compose.py", line 174, in call input = apply_transform(transform, input, self.map_items, self.unpack_items, self.log_stats) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 129, in apply_transform raise RuntimeError(f"applying transform {transform}") from e RuntimeError: applying transform <monai.transforms.post.dictionary.AsDiscreted object at 0x0000014188BFE5B0> The above exception was the direct cause of the following exception: Traceback (most recent call last): File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 959, in _internal_run_as_gen epoch_time_taken += yield from self._run_once_on_dataset_as_gen() File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 1087, in _run_once_on_dataset_as_gen self._handle_exception(e) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 636, in _handle_exception self._fire_event(Events.EXCEPTION_RAISED, e) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event func(first, (event_args + others), kwargs) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\handlers\stats_handler.py", line 181, in exception_raised raise e File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 1068, in _run_once_on_dataset_as_gen self.state.output = 
self._process_function(self, self.state.batch) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\evaluator.py", line 308, in _iteration engine.fire_event(IterationEvents.MODEL_COMPLETED) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 449, in fire_event return self._fire_event(event_name) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event func(first, (event_args + others), kwargs) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\workflow.py", line 224, in _run_postprocessing engine.state.batch[i], engine.state.output[i] = engine_apply_transform(b, o, posttrans) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\utils.py", line 258, in engine_apply_transform transformed_data = apply_transform(transform, data) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 129, in apply_transform raise RuntimeError(f"applying transform {transform}") from e RuntimeError: applying transform <monai.transforms.compose.Compose object at 0x0000014188C0B160> [2023-03-22 11:58:58,027] [32928] [MainThread] [ERROR] (ignite.engine.engine.SupervisedTrainer:992) - Engine run is terminating due to exception: applying transform <monai.transforms.compose.Compose object at 0x0000014188C0B160> [2023-03-22 11:58:58,027] [32928] [MainThread] [ERROR] (ignite.engine.engine.SupervisedTrainer:180) - Exception: applying transform <monai.transforms.compose.Compose object at 0x0000014188C0B160> Traceback (most recent call last): File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 102, in apply_transform return _apply_transform(transform, data, unpack_items) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 66, in _apply_transform return transform(parameters) File 
"C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\post\dictionary.py", line 202, in call d[key] = self.converter(d[key], argmax, to_onehot, threshold, rounding) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\post\array.py", line 220, in call img_t = one_hot( File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\networks\utils.py", line 158, in one_hot o = torch.zeros(size=sh, dtype=dtype, device=labels.device) torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.69 GiB (GPU 0; 8.00 GiB total capacity; 6.14 GiB already allocated; 0 bytes free; 6.22 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF The above exception was the direct cause of the following exception: Traceback (most recent call last): File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 102, in apply_transform return _apply_transform(transform, data, unpack_items) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 66, in _applytransform return transform(parameters) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\compose.py", line 174, in call input = apply_transform(transform, input, self.map_items, self.unpack_items, self.log_stats) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 129, in apply_transform raise RuntimeError(f"applying transform {transform}") from e RuntimeError: applying transform <monai.transforms.post.dictionary.AsDiscreted object at 0x0000014188BFE5B0> The above exception was the direct cause of the following exception: Traceback (most recent call last): File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 965, in _internal_run_as_gen self._fire_event(Events.EPOCH_COMPLETED) File 
"C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event func(first, (event_args + others), kwargs) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\handlers\validation_handler.py", line 76, in call self.validator.run(engine.state.epoch) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\evaluator.py", line 148, in run super().run() File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\workflow.py", line 281, in run super().run(data=self.data_loader, max_epochs=self.state.max_epochs) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 892, in run return self._internal_run() File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 935, in _internal_run return next(self._internal_run_generator) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 993, in _internal_run_as_gen self._handle_exception(e) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 636, in _handle_exception self._fire_event(Events.EXCEPTION_RAISED, e) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event func(first, (event_args + others), kwargs) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\handlers\stats_handler.py", line 181, in exception_raised raise e File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 959, in _internal_run_as_gen epoch_time_taken += yield from self._run_once_on_dataset_as_gen() File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 1087, in _run_once_on_dataset_as_gen self._handle_exception(e) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 636, in _handle_exception self._fire_event(Events.EXCEPTION_RAISED, e) File 
"C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event func(first, (event_args + others), kwargs) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\handlers\stats_handler.py", line 181, in exception_raised raise e File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 1068, in _run_once_on_dataset_as_gen self.state.output = self._process_function(self, self.state.batch) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\evaluator.py", line 308, in _iteration engine.fire_event(IterationEvents.MODEL_COMPLETED) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 449, in fire_event return self._fire_event(event_name) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event func(first, (event_args + others), kwargs) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\workflow.py", line 224, in _run_postprocessing engine.state.batch[i], engine.state.output[i] = engine_apply_transform(b, o, posttrans) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\utils.py", line 258, in engine_apply_transform transformed_data = apply_transform(transform, data) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 129, in apply_transform raise RuntimeError(f"applying transform {transform}") from e RuntimeError: applying transform <monai.transforms.compose.Compose object at 0x0000014188C0B160> Traceback (most recent call last): File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 102, in apply_transform return _apply_transform(transform, data, unpack_items) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 66, in _apply_transform return transform(parameters) File 
"C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\post\dictionary.py", line 202, in call d[key] = self.converter(d[key], argmax, to_onehot, threshold, rounding) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\post\array.py", line 220, in call img_t = one_hot( File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\networks\utils.py", line 158, in one_hot o = torch.zeros(size=sh, dtype=dtype, device=labels.device) torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.69 GiB (GPU 0; 8.00 GiB total capacity; 6.14 GiB already allocated; 0 bytes free; 6.22 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF The above exception was the direct cause of the following exception: Traceback (most recent call last): File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 102, in apply_transform return _apply_transform(transform, data, unpack_items) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 66, in _applytransform return transform(parameters) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\compose.py", line 174, in call input = apply_transform(transform, input, self.map_items, self.unpack_items, self.log_stats) File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 129, in apply_transform raise RuntimeError(f"applying transform {transform}") from e RuntimeError: applying transform <monai.transforms.post.dictionary.AsDiscreted object at 0x0000014188BFE5B0> The above exception was the direct cause of the following exception: Traceback (most recent call last): File "C:\Users\keyur.conda\envs\monai\lib\runpy.py", line 197, in _run_module_as_main return _run_code(code, main_globals, None, File 
"C:\Users\keyur.conda\envs\monai\lib\runpy.py", line 87, in _run_code exec(code, run_globals) File "C:\Users\keyur\MONAILabel\monailabel\interfaces\utils\app.py", line 128, in
run_main()
File "C:\Users\keyur\MONAILabel\monailabel\interfaces\utils\app.py", line 113, in run_main
result = a.train(request)
File "C:\Users\keyur\MONAILabel\monailabel\interfaces\app.py", line 422, in train
result = task(request, self.datastore())
File "C:\Users\keyur\MONAILabel\monailabel\tasks\train\basic_train.py", line 463, in call
res = self.train(0, world_size, req, datalist)
File "C:\Users\keyur\MONAILabel\monailabel\tasks\train\basic_train.py", line 552, in train
context.trainer.run()
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\trainer.py", line 53, in run
super().run()
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\workflow.py", line 281, in run
super().run(data=self.data_loader, max_epochs=self.state.max_epochs)
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 892, in run
return self._internal_run()
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 935, in _internal_run
return next(self._internal_run_generator)
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 993, in _internal_run_as_gen
self._handle_exception(e)
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 636, in _handle_exception
self._fire_event(Events.EXCEPTION_RAISED, e)
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event
func(first, (event_args + others), kwargs)
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\handlers\stats_handler.py", line 181, in exception_raised
raise e
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 965, in _internal_run_as_gen
self._fire_event(Events.EPOCH_COMPLETED)
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event
func(first, (event_args + others), kwargs)
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\handlers\validation_handler.py", line 76, in call
self.validator.run(engine.state.epoch)
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\evaluator.py", line 148, in run
super().run()
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\workflow.py", line 281, in run
super().run(data=self.data_loader, max_epochs=self.state.max_epochs)
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 892, in run
return self._internal_run()
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 935, in _internal_run
return next(self._internal_run_generator)
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 993, in _internal_run_as_gen
self._handle_exception(e)
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 636, in _handle_exception
self._fire_event(Events.EXCEPTION_RAISED, e)
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event
func(first, (event_args + others), kwargs)
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\handlers\stats_handler.py", line 181, in exception_raised
raise e
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 959, in _internal_run_as_gen
epoch_time_taken += yield from self._run_once_on_dataset_as_gen()
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 1087, in _run_once_on_dataset_as_gen
self._handle_exception(e)
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 636, in _handle_exception
self._fire_event(Events.EXCEPTION_RAISED, e)
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event
func(first, (event_args + others), kwargs)
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\handlers\stats_handler.py", line 181, in exception_raised
raise e
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 1068, in _run_once_on_dataset_as_gen
self.state.output = self._process_function(self, self.state.batch)
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\evaluator.py", line 308, in _iteration
engine.fire_event(IterationEvents.MODEL_COMPLETED)
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 449, in fire_event
return self._fire_event(event_name)
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\ignite\engine\engine.py", line 425, in _fire_event
func(first, (event_args + others), kwargs)
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\workflow.py", line 224, in _run_postprocessing
engine.state.batch[i], engine.state.output[i] = engine_apply_transform(b, o, posttrans)
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\engines\utils.py", line 258, in engine_apply_transform
transformed_data = apply_transform(transform, data)
File "C:\Users\keyur.conda\envs\monai\lib\site-packages\monai\transforms\transform.py", line 129, in apply_transform
raise RuntimeError(f"applying transform {transform}") from e
RuntimeError: applying transform <monai.transforms.compose.Compose object at 0x0000014188C0B160>
[2023-03-22 11:58:59,641] [7720] [ThreadPoolExecutor-2_0] [INFO] (monailabel.utils.async_tasks.utils:83) - Return code: 1
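A rough reading of the numbers in the OutOfMemoryError above, as a sketch only. The figures are copied from the error message; the helper `one_hot_bytes` and the example volume size are hypothetical and are not taken from MONAI or from the actual dataset. It compares the memory PyTorch has reserved but not allocated (the only part `max_split_size_mb` could plausibly help with) against the size of the failed request, and shows how quickly a dense one-hot tensor grows with volume size.

```python
import math

GIB = 1024 ** 3


def one_hot_bytes(num_classes, spatial_shape, bytes_per_element=4):
    """Bytes for a dense tensor of shape (num_classes, *spatial_shape).

    Hypothetical helper: assumes one float32 value (4 bytes) per element.
    """
    return num_classes * math.prod(spatial_shape) * bytes_per_element


# Figures copied from the error message in the traceback (in GiB).
allocated_gib = 6.14
reserved_gib = 6.22
requested_gib = 1.69

# Memory reserved by PyTorch but not currently allocated to tensors.
reserved_unused_gib = reserved_gib - allocated_gib

print(f"reserved but unused: {reserved_unused_gib:.2f} GiB")
print(f"failed request:      {requested_gib:.2f} GiB")

# Assumed example only: 2 classes on a 512 x 512 x 300 voxel grid.
example_bytes = one_hot_bytes(2, (512, 512, 300))
print(f"example one-hot tensor: {example_bytes / GIB:.2f} GiB")
```

With these figures the reserved-but-unused memory is far smaller than the failed request, so the fragmentation hint in the message probably does not apply on its own here; the one-hot conversion in the validation post-transform simply needs more memory than is left on the 8 GiB card.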
Environment
Ensuring you use the relevant python executable, please paste the output of: