dimuthuanuraj opened 9 months ago
I have encountered the same problem. Have you solved this issue? Training one epoch on the VoxCeleb2 training set takes me nearly 2 hours, and the GPU is only utilized for about one second per batch. I tried to debug where the time goes and it seems to be spent on reading/loading the batch data; I don't know whether reading the wav files is blocking the I/O channel or something else is causing the slow speed. I am using an RTX 3090 for training.
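One quick way to confirm that data loading is the bottleneck is to time how long each iteration waits for a batch versus how long the GPU step takes. A rough sketch with a standard PyTorch DataLoader; `train_dataset` and `model` are placeholders for the actual objects, and in this repo the number of loader workers is controlled by the `--nDataLoaderThread` argument (if I remember the name correctly):

```python
import time
import torch
from torch.utils.data import DataLoader

# Placeholder dataset/model; substitute the real ones from the training script.
loader = DataLoader(train_dataset, batch_size=200, num_workers=8, pin_memory=True)

load_time, step_time = 0.0, 0.0
t0 = time.time()
for data, labels in loader:
    load_time += time.time() - t0          # time spent waiting for the batch (I/O + decoding)
    t1 = time.time()
    loss = model(data.cuda(), labels.cuda())
    loss.backward()
    step_time += time.time() - t1          # time spent on the GPU forward/backward
    t0 = time.time()

print(f"data loading: {load_time:.1f}s, compute: {step_time:.1f}s")
```

If `load_time` dominates, increasing the number of DataLoader workers (or moving the wav files to faster storage) usually helps more than anything on the GPU side.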
@mcflyhu Hi, I have now improved the speed considerably, as follows:
Processing 14700 of 1076800:Loss 13.115565 TEER/TAcc 0.000% - 228.81 Hz
I made modifications to enable distributed training, and I noticed the previous port was also being used by another process. So I changed `--port` and set `'--distributed', dest='distributed', action='store_false'` (see the sketch below).
Note - I am using NVIDIA A10 and T4 GPUs.
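For reference, the argument-parsing change looks roughly like this (a sketch in the style of `trainSpeakerNet.py`; the exact defaults and help strings may differ in your version):

```python
import argparse

parser = argparse.ArgumentParser(description="SpeakerNet")

# Pick a port that is not already in use by another process on the machine.
parser.add_argument('--port', type=str, default="8899",
                    help='Port for distributed training, input as text')

# With action='store_false', args.distributed defaults to True, so distributed
# training is enabled without having to pass the flag explicitly.
parser.add_argument('--distributed', dest='distributed', action='store_false',
                    help='Enable distributed training')

args = parser.parse_args()
print(args.distributed)   # True when --distributed is not passed
```

With distributed mode on, the script then uses all visible GPUs, e.g. launched with something like `CUDA_VISIBLE_DEVICES=0,1 python trainSpeakerNet.py ...`.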
@dimuthuanuraj Thanks for your immediate reply. I will try distributed training with the suggested settings to accelerate the training process. I really appreciate your help.
I have seen several times that the speed does not go above 100 Hz. I am using an NVIDIA A10 GPU on a server. Could anyone please help me figure this out?
Processing 1900 of 1076800:Loss 13.631720 TEER/TAcc 0.000% - 87.06 Hz