Hello,
I am trying to train the network myself on the GPU, just to test whether I can recreate everything. However, I have encountered a problem. I have two GPUs in my machine: one with ID 0 and about 8 GB of VRAM, and a good one for computing (ID 1) with about 64 GB of VRAM. The problem is that if I adjust the config file to
device:
  use_gpu: True
  gpu_ids: '1'
  num_workers: 2
I get a message that the VRAM of the corresponding device is full, and the training is aborted. Changing the ID to 0 works, but takes ages (about 3 days for the object detection, another 4-5 days for the mesh generation, and the joint training is still running after 1.5 days at epoch 80/400).
Can someone tell me my mistake and what I can do to actually train on the correct GPU? (As stated above, I already tried setting the ID to 1, but that doesn't work.)
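One workaround I am considering (assuming the training code uses CUDA, which I have not verified): hide GPU 0 from the process entirely via the `CUDA_VISIBLE_DEVICES` environment variable, so the large GPU is the only device the framework can see, and then keep `gpu_ids: '0'` in the config. A minimal sketch:

```python
import os

# Hide physical GPU 0 so that only the large GPU (physical ID 1) is visible.
# This must be set before the CUDA runtime is initialized (i.e. before the
# training framework is imported) for it to take effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# Inside this process, the remaining GPU is re-enumerated as device 0,
# so the config would then be left at gpu_ids: '0'.
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

The same thing can be done without touching any code by exporting the variable in the shell before launching the training script. But I would still like to know why setting `gpu_ids: '1'` directly does not work.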
Thank you very much in advance.
And just to clarify: The pretrained model works absolutely fine.