GAP-LAB-CUHK-SZ / Total3DUnderstanding

Implementation of CVPR'20 Oral: Total3DUnderstanding: Joint Layout, Object Pose and Mesh Reconstruction for Indoor Scenes from a Single Image
MIT License

Training on the GPU does not work correctly? #22

Open LEGoebel opened 3 years ago

LEGoebel commented 3 years ago

Hello, I am trying to train the network myself on the GPU, just to test whether I can reproduce everything. However, I have run into a problem. I have two GPUs in my machine: one with ID 0 and about 8 GB of VRAM, and a better one for computing (ID 1) with about 64 GB of VRAM. The problem is that if I edit the config file to

```yaml
device:
  use_gpu: True
  gpu_ids: '1'
  num_workers: 2
```

I get a message that the VRAM of the corresponding device is full, and the training is aborted. Changing the ID to 0 works, but takes ages (about 3 days for the object detection, another 4-5 days for the mesh generation, and the joint training is still running after 1.5 days at epoch 80/400).

Can someone tell me what my mistake is and how I can actually train on the correct GPU? (As stated above, I already tried setting the ID to 1, but that doesn't work.)
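For reference, one workaround I am considering (this is just a sketch and assumes the training script is plain PyTorch and honors CUDA's device masking): hiding GPU 0 from the process entirely via `CUDA_VISIBLE_DEVICES`, so that the large GPU is the only visible device and shows up as `cuda:0` inside the process.

```python
import os

# Mask out physical GPU 0 BEFORE any CUDA-using framework (e.g. torch)
# is imported. Inside this process, physical GPU 1 is then re-indexed
# as device 0, so gpu_ids in the config could be left at '0'.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# Hypothetical sanity check (requires PyTorch with CUDA support):
# import torch
# print(torch.cuda.device_count())  # should report a single visible device
```

Equivalently, one could launch training with `CUDA_VISIBLE_DEVICES=1 python main.py ...` from the shell. I am not sure whether this repo's `gpu_ids` option already does this internally, which may be part of the problem.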

Thank you very much in advance.

And just to clarify: the pretrained model works absolutely fine.