clovaai / voxceleb_trainer

In defence of metric learning for speaker recognition
MIT License

Why is the speed very slow? #180

Open dimuthuanuraj opened 4 months ago

dimuthuanuraj commented 4 months ago

I have repeatedly seen that the training speed does not go above 100 Hz. I am using an NVIDIA A10 GPU on a server. Could anyone please help me figure this out?

Processing 1900 of 1076800:Loss 13.631720 TEER/TAcc 0.000% - 87.06 Hz

mcflyhu commented 3 months ago

I have encountered the same problem, bro. Have you solved this issue? I spent nearly 2 hours training one epoch on the VoxCeleb2 training set, and my GPU was utilized for only about one second while training a single batch. I tried debugging and found that too much time is spent reading/loading the batch data. I don't know whether reading the wav files is saturating the I/O channel or something else is causing the slowdown. I am using an RTX 3090 for training.
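One way to check the suspicion above (time lost in reading/loading batches rather than on the GPU) is to time the two phases separately. The following is a minimal sketch, not code from the repository: `split_load_and_step_time` and `step_fn` are hypothetical names, and `batches` can be any iterable, for example a PyTorch `DataLoader`.

```python
import time


def split_load_and_step_time(batches, step_fn, max_batches=50):
    """Return (seconds spent waiting for batches, seconds spent in step_fn).

    `batches` is any iterable of batches; `step_fn` is a hypothetical
    callable that runs one training step on a batch.
    """
    load_s = 0.0
    step_s = 0.0
    it = iter(batches)
    for _ in range(max_batches):
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time blocked here is data-loading time
        except StopIteration:
            break
        t1 = time.perf_counter()
        # For GPU work, step_fn should end with torch.cuda.synchronize();
        # CUDA kernels launch asynchronously, so without it the compute
        # time would be under-reported.
        step_fn(batch)
        t2 = time.perf_counter()
        load_s += t1 - t0
        step_s += t2 - t1
    return load_s, step_s
```

If the first number dominates, the GPU is mostly waiting for data, and raising the number of data loader workers (`num_workers` in a PyTorch `DataLoader`) or moving the dataset to faster storage is a reasonable thing to try first.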

dimuthuanuraj commented 3 months ago

@mcflyhu Hi, I have now improved the speed considerably, as follows:

Processing 14700 of 1076800:Loss 13.115565 TEER/TAcc 0.000% - 228.81 Hz

I made modifications to enable distributed training, and I also noticed that the previous port was being used by another process. So I changed the --port value and changed the --distributed argument definition to: '--distributed', dest='distributed', action='store_false'

Note: I am using an NVIDIA A10 GPU and T4 GPUs.
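To illustrate the two changes described above: with `argparse`, an argument declared with `action='store_false'` defaults to `True`, so distributed mode is on unless `--distributed` is passed on the command line. The sketch below shows that behaviour along with a simple check for whether a port is already taken. It is an illustration under assumptions, not the repository's code: `build_parser`, `port_is_free`, and the default port value are hypothetical.

```python
import argparse
import socket


def build_parser():
    parser = argparse.ArgumentParser()
    # Hypothetical default port, chosen only for this example.
    parser.add_argument('--port', type=str, default='8888')
    # With store_false, args.distributed is True by default and becomes
    # False only when --distributed is passed.
    parser.add_argument('--distributed', dest='distributed', action='store_false')
    return parser


def port_is_free(port, host='127.0.0.1'):
    """Return True if `port` can be bound on `host`, False if it is in use."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, int(port)))
        except OSError:
            return False
        return True


# Example usage (not executed here):
#   args = build_parser().parse_args()
#   if not port_is_free(args.port):
#       raise SystemExit('Port %s is in use, pick another --port' % args.port)
```

Checking the port before launching avoids a clash with another process that is already listening on it, which is the situation described in the comment above.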

mcflyhu commented 3 months ago

@dimuthuanuraj Thanks for your quick reply. I will try distributed training with the suggested settings to accelerate the training process. I really appreciate your help.