Looks like it's an issue of inverting the customised transform HecktorCropNeckRegion; perhaps you can remove the transform in the inference preprocessing.
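If it helps, a minimal sketch of one way to do that, assuming the inference preprocessing is a MONAI Compose (the infer_transform pipeline below is a dummy stand-in, not the tutorial's actual one):

from monai.transforms import Compose, EnsureChannelFirstd, LoadImaged

# Stand-in for the tutorial's inference preprocessing; the real pipeline
# also contains the custom HecktorCropNeckRegion transform.
infer_transform = Compose([LoadImaged(keys="image"), EnsureChannelFirstd(keys="image")])

# Drop any transform whose class name matches the custom crop, so its
# inverse is never attempted during post-processing.
infer_transform = Compose(
    [t for t in infer_transform.transforms
     if type(t).__name__ != "HecktorCropNeckRegion"]
)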
Thank you for your response. I am now trying to train on the complete Hecktor dataset (30 GB) using a single 12 GB GPU. However, the training process either terminates because VS Code crashes, or it takes an excessively long time (8-10 hours per epoch). I attempted to address this by adding a 12 GB RAM module, which increased the training speed with num_workers = 4, but it still resulted in crashes of VS Code or the entire system. With num_workers = 0, training progresses further, but it remains very slow.
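For context, this is the standard PyTorch DataLoader setting being discussed; a minimal sketch of the trade-off (the tensor dataset is a dummy stand-in for the CT/PET data):

import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-in for the CT/PET dataset. num_workers=0 loads batches in
# the main process (stable on low-RAM machines but slow), while each
# extra worker is a separate process with its own copy of the loading
# pipeline, which is what can exhaust system RAM and crash the session.
dataset = TensorDataset(torch.zeros(8, 1, 64, 64, 64))

loader = DataLoader(
    dataset,
    batch_size=1,
    num_workers=0,   # raise gradually (1, 2, ...) while watching RAM usage
    pin_memory=True,
)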
Environment:
OS --> Ubuntu 22.04.2
Python version --> 3.10.11
MONAI version --> 1.1.0
PyTorch --> 2.0.0+cu117
GPU models and configuration:
(hecktor22) dlrs@spml1:~$ nvidia-smi
Tue Jun 20 12:19:47 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.116.04 Driver Version: 525.116.04 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:03:00.0 On | N/A |
| 0% 41C P5 23W / 275W | 199MiB / 11264MiB | 1% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1024      G   /usr/lib/xorg/Xorg               90MiB   |
|    0   N/A  N/A      1310      G   /usr/bin/gnome-shell             29MiB   |
|    0   N/A  N/A      1429      G   ...mviewer/tv_bin/TeamViewer     21MiB   |
|    0   N/A  N/A      6276      G   gnome-control-center              2MiB   |
|    0   N/A  N/A      6417      G   ...754442497770707874,262144     52MiB   |
+-----------------------------------------------------------------------------+
CPU Details:
(hecktor22) dlrs@spml1:~$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 40 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU E5530 @ 2.40GHz
CPU family: 6
Model: 26
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Stepping: 5
BogoMIPS: 4788.22
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
       cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht tm
       pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs
       bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf
       pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm
       dca sse4_1 sse4_2 popcnt lahf_lm pti ssbd ibrs ibpb stibp
       dtherm ida flush_l1d
Caches (sum of all):
L1d: 128 KiB (4 instances)
L1i: 128 KiB (4 instances)
L2: 1 MiB (4 instances)
L3: 8 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-3
Vulnerabilities:
Itlb multihit: KVM: Mitigation: VMX unsupported
L1tf: Mitigation; PTE Inversion
Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT disabled
Meltdown: Mitigation; PTI
Mmio stale data: Unknown: No mitigations
Retbleed: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, RSB filling, PBRSB-eIBRS Not affected
Srbds: Not affected
Tsx async abort: Not affected
Sure, are you using this example? https://github.com/Project-MONAI/tutorials/tree/main/auto3dseg/tasks/hecktor22 The latest MONAI v1.2.0 has various optimizations in terms of training speed and memory usage, so please consider upgrading to MONAI 1.2: pip install -U monai
Yes, I'm using the same example. I do have a different conda environment where I have installed MONAI 1.2; see the details below:
(hecktor22a) dlrs@spml1:~$ python -c "import monai; monai.config.print_config()"
MONAI version: 1.2.0rc7+13.gf355d1fc
Numpy version: 1.24.3
Pytorch version: 2.0.1+cu117
MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
MONAI rev id: f355d1fc38c50e83e4f20ffe870c19a5814dc501
MONAI file: /home/dlrs/MONAI/monai/__init__.py
Optional dependencies:
Pytorch Ignite version: 0.4.11
ITK version: 5.3.0
Nibabel version: 5.1.0
scikit-image version: 0.21.0
Pillow version: 9.5.0
Tensorboard version: 2.13.0
gdown version: 4.7.1
TorchVision version: 0.15.2+cu117
tqdm version: 4.65.0
lmdb version: 1.4.1
psutil version: 5.9.5
pandas version: 2.0.2
einops version: 0.6.1
transformers version: 4.21.3
mlflow version: 2.4.0
pynrrd version: 1.0.0
When I switch to this env, the following error appears:
(hecktor22a) dlrs@spml1:~/Desktop/tutorials/auto3dseg/tasks/hecktor22$ python hecktor22.py
2023-06-20 14:52:24,122 - INFO - AutoRunner using work directory ./work_dir
2023-06-20 14:52:24,126 - INFO - Loading input config input.yaml
2023-06-20 14:52:24,141 - INFO - Datalist was copied to work_dir: /home/dlrs/Desktop/tutorials/auto3dseg/tasks/hecktor22/work_dir/hecktor22_folds.json
2023-06-20 14:52:24,144 - INFO - Setting num_fold 5 based on the input datalist /home/dlrs/Desktop/tutorials/auto3dseg/tasks/hecktor22/work_dir/hecktor22_folds.json.
2023-06-20 14:52:24,269 - INFO - Skipping data analysis...
2023-06-20 14:52:24,269 - INFO - Skipping algorithm generation...
Traceback (most recent call last):
File "/home/dlrs/Desktop/tutorials/auto3dseg/tasks/hecktor22/hecktor22.py", line 26, in
@myron could you please help? It seems the tutorial here https://github.com/Project-MONAI/tutorials/tree/main/auto3dseg/tasks/hecktor22 should be updated for v1.2...
@myron do I need to increase RAM or something? Kindly help.
@BSonya hi, hold on for 1-2 days; it's fixed and just needs to be updated on GitHub.
@BSonya so if you want to run it sooner, you can:
1) do a clean start (remove the previously generated folders in /content/tutorials/auto3dseg/tasks/hecktor22/)
2) use the updated tutorial configs folder: https://github.com/Project-MONAI/tutorials/tree/main/auto3dseg/tasks/hecktor22
3) use the updated algo template files: clone/download the "algorithm_templates" folder manually from https://github.com/Project-MONAI/research-contributions/tree/main/auto3dseg/algorithm_templates
and when training, specify a path to it, e.g.:
python -m monai.apps.auto3dseg AutoRunner run --input='./input.yaml' --algos='segresnet' --templates_path_or_url=/path/to/downloaded/algorithm_templates
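Equivalently, the same run can be launched from Python; a sketch assuming MONAI 1.2's AutoRunner accepts the same arguments as the CLI call above:

# Python equivalent of the CLI call above (a sketch, not from the
# tutorial); templates_path_or_url points at the local download.
from monai.apps.auto3dseg import AutoRunner

runner = AutoRunner(
    input="./input.yaml",
    algos="segresnet",
    templates_path_or_url="/path/to/downloaded/algorithm_templates",
)
runner.run()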
Hi. I am trying to use the Hecktor tutorial for head and neck tumor segmentation on the Hecktor dataset. As a beginner, I decided to start with a small part of the dataset, just 5 pairs of CT and PET images (one in each training fold) for practice. I have also specified data for inference in the datalist file. However, after training all 5 folds, I encountered the following error: