Hi @Harsh188, the problem comes from memory overflow in the dataloader workers. I think you should reduce the number of workers.
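(For anyone reading later: this is roughly what lowering the worker count means for a plain PyTorch DataLoader. MonoScene exposes the setting through its num_workers_per_gpu config; the tiny dataset below is just a stand-in, not the real one.)

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset; MonoScene builds its own dataset objects internally.
dummy_dataset = TensorDataset(torch.zeros(8, 3, 32, 32))

# num_workers controls how many extra worker processes the loader spawns.
# Each worker process keeps its own copy of the dataset state, so lowering
# it (or using 0 to load data in the main process) reduces RAM pressure.
loader = DataLoader(dummy_dataset, batch_size=1, num_workers=1, pin_memory=True)

for (batch,) in loader:
    pass  # training / eval step would go here
```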
Hi @anhquancao, thanks for the quick response.
I tried setting num_workers_per_gpu to 1 and I'm still facing the issue. I'm running an RTX 3080 Founders Edition (10 GB VRAM):
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17    Driver Version: 525.105.17    CUDA Version: 12.0   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:2B:00.0  On |                  N/A |
| 30%   47C    P5    77W / 320W |    641MiB / 10240MiB |      4%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
Here's the output for python -c 'import torch;print(torch.__version__);print(torch.version.cuda)':
1.7.1
10.2
How much memory is required to execute the pretrained model?
Hi, I mean the RAM, not the GPU memory. Did you also set batch_size to 1? The model needs 32 GB of GPU memory to train with a batch_size of 1. Inference is probably much cheaper, but I don't have the exact number.
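(If you want a concrete number for inference on your own card, one quick way is to run a single forward pass under torch.no_grad() and read PyTorch's peak-memory counter. A minimal sketch follows; the tiny model and input shape are placeholders, not MonoScene itself.)

```python
import torch
import torch.nn as nn

# Placeholder network; substitute the real MonoScene model here.
model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU()).cuda().eval()

torch.cuda.reset_peak_memory_stats()
with torch.no_grad():  # no activations kept for backprop, so inference is cheaper
    _ = model(torch.zeros(1, 3, 370, 1220, device="cuda"))  # arbitrary input size

peak_gb = torch.cuda.max_memory_allocated() / 1024 ** 3
print(f"peak GPU memory during inference: {peak_gb:.2f} GB")
```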
Hi, what if I only have a single 3090? How can I run the training sequence?
I think you can try the following:
For the RAM problem, you might need to optimize the data types of the variables.
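(For what it's worth, here is a minimal sketch of what "optimizing the data type" can mean in practice; the array names, shapes, and contents are made up, the point is only the dtype casts.)

```python
import numpy as np

# Hypothetical volumes held in RAM during preprocessing.
occupancy = np.random.rand(256, 256, 32)            # float64 by default -> ~16 MiB
labels = np.random.randint(0, 20, (256, 256, 32))   # int64 on most platforms -> ~16 MiB

# Casting to the smallest dtype that still represents the data shrinks each
# array: float64 -> float32 halves it, int64 -> uint8 is an 8x reduction.
occupancy = occupancy.astype(np.float32)             # ~8 MiB
labels = labels.astype(np.uint8)                     # ~2 MiB

print(occupancy.nbytes / 2**20, labels.nbytes / 2**20)  # sizes in MiB
```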
Thanks @anhquancao. Closing this issue as it's mostly a hardware limitation on my end.
Description
Unable to run train_monoscene.py and eval_monoscene.py. I can't seem to debug the issue as I'm unable to see any logging/print statement output on my console.

Terminal Log:

The following log is for train_monoscene.py:

The following log is for eval_monoscene.py: