Closed: claudiogreco closed this issue 4 years ago
Thanks for your question. I just tested the command provided in the homepage once more, and here is the log: the command runs successfully. It might take a few seconds to set up the multi-GPU environment. Please also halve the batch_size, since only 2 GPUs are used (although I don't think that is the cause).
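As a sketch of what I mean by halving (assuming the run script forwards extra flags to the trainer and that the trainer exposes a --batchSize argument, which is how the scripts in this repo are typically structured):

```bash
# Hypothetical invocation: halve the batch size for a 2-GPU run
# (e.g. --batchSize 128 if the default were 256).
bash run/lxmert_pretrain.bash 1,2 --multiGPU --tiny --batchSize 128
```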
If the problem is not solved, could you also provide your GPU/CUDA/NVCC versions?
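They can be collected with the standard tools:

```bash
# GPU model and driver version
nvidia-smi

# CUDA toolkit / compiler version
nvcc --version

# PyTorch version and the CUDA version its wheel was built against
python -c "import torch; print(torch.__version__, torch.version.cuda)"
```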
Thanks for your answer. After executing the command, I waited for about an hour but didn't see any progress bar. The server has three GPUs:
However, I ran the command so that it uses only GPUs 1 and 2. The CUDA version is 8.0.
Thanks. May I ask which version of the PyTorch library you are using? Since the PyTorch > 1.0.0 builds on PyPI are compiled against CUDA > 8.0, I am wondering whether the two libraries are compatible.
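A quick way to check the suspected mismatch is to print the CUDA version the installed wheel targets and whether the runtime can actually see the GPUs, e.g.:

```bash
# CUDA version baked into the PyTorch wheel, whether CUDA is usable
# at runtime, and how many GPUs PyTorch can see.
python -c "import torch; print(torch.version.cuda, torch.cuda.is_available(), torch.cuda.device_count())"
```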
I am using PyTorch 1.3.1. Good point. I will talk with the server administrators to see whether updating CUDA solves the problem. Thanks.
P.S.: I noticed that the code works if I use only one GPU. I guess that, as you said, it is an issue with my server's drivers. I will check this with the administrators. Thanks for your help!
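For reference, a minimal two-GPU test that is independent of this repository can confirm whether the hang comes from the driver/CUDA stack rather than from the pretraining code. A sketch (nn.DataParallel is used here purely for illustration, not necessarily what this repo uses internally):

```bash
# Minimal multi-GPU smoke test (hypothetical, not part of this repo).
# Runs a tiny nn.DataParallel forward pass on GPUs 1 and 2; if this
# also hangs, the problem is in the driver/CUDA setup, not in LXMERT.
CUDA_VISIBLE_DEVICES=1,2 python - <<'EOF'
import torch
import torch.nn as nn

model = nn.DataParallel(nn.Linear(16, 4).cuda())  # replicated across both visible GPUs
x = torch.randn(8, 16).cuda()                     # batch is split across the 2 GPUs
print(model(x).shape)                             # expect: torch.Size([8, 4])
EOF
```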
Hello,
I am trying to run the pre-training of the model again. When I run the command:
```bash
bash run/lxmert_pretrain.bash 1,2 --multiGPU --tiny
```
I get the following output:
and nothing else happens.
I guess I should see a progress bar or some intermediate information, right? Do you know how I could try to fix this issue?
Thanks, Claudio