deepak242424 closed this issue 2 years ago
I haven't extensively tested the multi-GPU version since the speedup gain was marginal. I would recommend sticking with the single-GPU version, but I will try to fix this bug at some point.
Thanks @drsrinathsridhar for your quick reply. One last thing: can you please tell me how long it took to train your models? It took me a little over 3 days to train the multiview model with 5 views on a single GPU for 100 epochs on the Shapenetplain_v1 dataset. Do you think that is reasonable?
What GPU are you using? On an Nvidia V100, I think it's about 2 days for ShapeNetCOCO. @davrempe could give you an exact number.
I am using an NVIDIA 1080Ti. And you ran for 100 epochs with batch_size=1 (as mentioned in the paper), right?
Yes, we ran 100 epochs with a batch size of 1 on a single V100. For 5 views, the most time-consuming category is chairs, which took about 2 days for ShapeNetCOCO. Cars and planes took a little over 1 day each. Considering the differences between the 1080Ti and the V100, 3 days does not seem unreasonable.
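For a rough sense of scale, the figures quoted above can be turned into per-epoch times. This is only back-of-the-envelope arithmetic on the numbers in this thread (about 2 days on a V100, about 3 days on a 1080Ti, 100 epochs each). The two runs used different datasets (ShapeNetCOCO vs. shapenetplain_v1), so the ratio is indicative at best and the variable names are just for illustration.

```python
# Rough per-epoch timing from the figures quoted in this thread.
# Assumption: both runs are exactly 100 epochs and the day counts are approximate.
MINUTES_PER_DAY = 24 * 60
EPOCHS = 100

v100_days = 2.0      # "about 2 days" (chairs, 5 views, ShapeNetCOCO)
gtx1080ti_days = 3.0  # "a little over 3 days" (5 views, shapenetplain_v1)

v100_min_per_epoch = v100_days * MINUTES_PER_DAY / EPOCHS
gtx1080ti_min_per_epoch = gtx1080ti_days * MINUTES_PER_DAY / EPOCHS
slowdown = gtx1080ti_days / v100_days

print(f"V100:   ~{v100_min_per_epoch:.1f} min/epoch")
print(f"1080Ti: ~{gtx1080ti_min_per_epoch:.1f} min/epoch")
print(f"1080Ti run is ~{slowdown:.1f}x the V100 wall-clock time")
```

A factor of roughly 1.5 between these two cards is in the range one might expect, which is consistent with the reply above.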
Hi, when I tried multi-view training on multiple GPUs, I got the same error: AttributeError: 'DataParallel' object has no attribute 'conv1'. But when I tried single-GPU training, CUDA ran out of memory. I have 3 Tesla M40 GPUs, each with 24 GB of memory. Is there a possible solution? Thanks.
Please make sure to use small batch sizes (flag: --batch-size 1). You will run out of memory if you use larger batch sizes. 24GB should be more than enough for batch size 1.
Hi,
I am trying to run single-view training on multiple GPUs but am getting the error:
I am running the following command from the xnocs/noxray/nxm directory: python nxm.py --mode train --input-dir shapenetplain_v1 --output-dir ../output --expt-name XNOCS_SV --category cars --arch SegNetSkip --seed 0 --gpu {1,2,3}
After searching for the above error on Google, one suggested solution was to change conv_block to conv_block.module in the SegNet.py file. After doing that, I am getting the following error:
Can you please let me know if I need to do anything else for multi-GPU training? I am using PyTorch version 1.1.0.
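For anyone landing here with the AttributeError quoted in this thread: nn.DataParallel wraps the model and keeps the original network under its .module attribute, so layers such as conv1 are no longer reachable directly on the wrapper. Below is a minimal sketch of that behaviour and one common workaround. TinyNet and unwrap are hypothetical names made up for illustration; this is not the SegNet code from this repository and not a confirmed fix for it.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in model; NOT the SegNet from this repository."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv1(x)


def unwrap(model):
    """Return the underlying module, whether or not it is wrapped in DataParallel."""
    return model.module if isinstance(model, nn.DataParallel) else model


# DataParallel expects the parameters on the first visible GPU when one exists.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net = TinyNet().to(device)
wrapped = nn.DataParallel(net)

# Accessing a layer directly on the wrapper reproduces the error from this thread.
try:
    wrapped.conv1
    hit_error = False
except AttributeError as err:
    hit_error = True
    print(err)

# Going through .module (here via the helper) reaches the layer again.
conv1 = unwrap(wrapped).conv1

# The forward pass itself goes through the wrapper as usual.
out = wrapped(torch.zeros(1, 3, 16, 16, device=device))
print(tuple(out.shape))
```

Note that .module has to be applied to the DataParallel wrapper itself, not to an inner block such as conv_block, since inner blocks are ordinary modules and have no .module attribute.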