Closed alamnasim closed 5 years ago
Hi,
Thanks for your prompt reply.
The main problem that I am facing is, training does not use GPU, while another project/code can use the same GPU. Do I need to change somewhere in the code?
Have you changed the --gpu 2,3 option in the training command to match the GPU set-up on your machine? I have just one GPU, so for me the proper setting is --gpu 0; otherwise it cannot detect the GPU and runs on the CPU. --gpu sets the CUDA_VISIBLE_DEVICES variable; you can look that up to get more information on what you should specify there.
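To make the remark above concrete, here is a minimal sketch of what a --gpu flag typically does under the hood. The helper name is illustrative, not from the repo; the key point is that CUDA_VISIBLE_DEVICES must list indices of GPUs that physically exist, or frameworks silently fall back to the CPU.

```python
import os

def set_visible_gpus(gpu_arg):
    """Restrict which devices CUDA sees, as a --gpu flag typically does.
    '0' exposes only the first GPU; '2,3' assumes at least four physical
    GPUs exist, otherwise nothing is visible and training runs on CPU.
    (Helper name is illustrative, not from the repository.)"""
    os.environ["CUDA_VISIBLE_DEVICES"] = gpu_arg
    return os.environ["CUDA_VISIBLE_DEVICES"]

# On a single-GPU machine, '0' is the only valid choice:
set_visible_gpus("0")
```

Note this must run before the deep-learning framework initializes CUDA, or the setting is ignored.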
I have 2 GTX 1060 GPUs, so I changed the option to --gpu 1,2, then --gpu 0, and also --gpu 1, but in none of these cases was the GPU detected. I also searched for this online, but the problem is not solved.
Maybe try --gpu 0,1; as far as I remember, the number to use is the GPU index, starting from 0. However, --gpu 0 and --gpu 1 should work anyway in your case. I only tried this code in test mode, but the GPU was being used. Is nvidia-smi working? If not, try restarting the machine.
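The nvidia-smi check above can be scripted as a quick sanity test before debugging the training code itself. This is a generic sketch, not part of the repository: if the driver tool is missing or fails, any framework will fall back to the CPU regardless of the --gpu flag.

```python
import shutil
import subprocess

def gpus_visible():
    """Return True if nvidia-smi runs and lists at least one GPU.
    If this returns False, fix the driver install first; no --gpu
    setting in the training command will help."""
    smi = shutil.which("nvidia-smi")
    if smi is None:
        return False
    try:
        out = subprocess.run([smi, "-L"], capture_output=True,
                             text=True, timeout=10)
    except (subprocess.SubprocessError, OSError):
        return False
    return out.returncode == 0 and "GPU" in out.stdout

print("GPU driver reachable:", gpus_visible())
```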
Maybe the slowness is caused by the dataloader. In my tests, loading a wav and computing the spectrogram with librosa is very slow compared to scipy.io.wavfile and scipy.signal, or to using TensorFlow's stft directly. Another point: you can first select a 3 s segment of the wav and then compute the spectrogram, instead of computing the spectrogram over the whole wav and then selecting 300 frames.
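The crop-then-spectrogram idea above can be sketched as follows. This is a minimal NumPy-only illustration, not the repository's code; the sample rate, window, and hop sizes are assumed typical values, and the point is only that the STFT cost scales with the cropped length rather than the full utterance.

```python
import numpy as np

SR = 16000           # assumed sample rate
WIN, HOP = 400, 160  # assumed 25 ms window, 10 ms hop

def crop_then_spectrogram(wav, seconds=3.0):
    """Select the segment first, then compute its spectrogram.
    This avoids paying for an STFT over the whole utterance when
    only ~300 frames are needed for training."""
    n = int(seconds * SR)
    start = 0 if len(wav) <= n else np.random.randint(0, len(wav) - n)
    seg = wav[start:start + n]
    # simple magnitude STFT via strided framing (sketch, not optimized)
    frames = np.lib.stride_tricks.sliding_window_view(seg, WIN)[::HOP]
    return np.abs(np.fft.rfft(frames * np.hanning(WIN), axis=1)).T

# a 3 s crop of a 10 s utterance yields roughly 300 frames
spec = crop_then_spectrogram(np.random.randn(10 * SR))
```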
Hi, I want to check whether my training speed is reasonable.
117/7492 [..............................] - ETA: 8:24:28 - loss: 11.4793 - acc: 0.7942
My config is almost the same as the training command in the README, except that 8 GTX 1080 Ti GPUs are used, multiprocessing is set to 32, and the loss is amsoftmax (the fastest settings I have tried). Finishing 128 epochs of training on VoxCeleb2 would take 42 days (3 epochs per day). This seems to take too long. Is it running correctly? I checked that GPU-Util always stays at 0%, so it seems the speed bottleneck is the data preprocessing.
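One way to confirm the "GPU-Util at 0%" diagnosis above is to time the data generator directly. This is a generic sketch, assuming a Python generator yielding (x, y) batches like the Keras one in this repo; if the average fetch time per batch far exceeds the GPU step time, the input pipeline is the bottleneck.

```python
import time

def avg_batch_fetch_time(generator, n=5):
    """Time n fetches from a batch generator. If this number is much
    larger than the model's per-step compute time, the GPUs will sit
    idle waiting on data (consistent with GPU-Util stuck at 0%)."""
    t0 = time.perf_counter()
    for _ in range(n):
        next(generator)
    return (time.perf_counter() - t0) / n

# toy generator standing in for the real data generator (assumption)
toy = iter([([0.0], [1.0])] * 10)
avg = avg_batch_fetch_time(toy, n=5)
```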
Another question: what accuracy will the model reach on the VoxCeleb2 validation set when it converges?
P.S. Thanks for this nice work
Hi,
I don't think this speed is correct; it's too slow. I would recommend editing the code slightly, as @mlinxiang mentioned.
Sorry, I'm busy this week, but I can try to edit the code next week on improving the speed.
But I don't quite understand why it can be so slow on different machines; on my machine, each epoch takes about 2-3 hours.
The final training accuracy should be around 91-92%.
Again, this is very initial work; there's a lot of room for improvement.
Best, Weidi
I am also facing this issue: the model trains very slowly. I run other code and projects on the same GPU and they run fine, with the GPU being used, but VGG-Speaker runs slowly. I tried it on the two NVIDIA GTX 1060s installed in my computer, and on a P100 on Google Cloud as well.
I tried everything to resolve this issue but did not succeed.
Epoch 1/10
Learning rate for epoch 1 is 0.0001.
17/305810 [..............................] - ETA: 3354:00:12 - loss: 0.8716 - acc: 0.9531
Please help. Thanks.