Jungjee / RawNet

Official repository for RawNet, RawNet2, and RawNet3
MIT License

Too long IO time #24

Closed: KojimaIsBad closed this issue 2 years ago

KojimaIsBad commented 2 years ago

I tried to train RawNet2 on the VoxCeleb2 dataset with the default settings, but one epoch takes about 2.5 hours on average. Observing GPU activity, I found that the GPUs spend most of the time waiting for data IO. To reduce the waiting time, I tried increasing the "num_workers" and "prefetch_factor" arguments of the PyTorch DataLoader, but the data loading time did not decrease. My hardware: 3 x RTX 3090, 128 GB of memory, and an HDD.
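
One way to confirm that data loading, and not the model, is the bottleneck is to time the wait for each batch separately from the training step. The following is a minimal sketch, not code from the RawNet repository: `profile_loader` and `step_fn` are hypothetical names, and it assumes `step_fn` blocks until the computation has finished (with CUDA this would mean calling `torch.cuda.synchronize()` inside it, since GPU kernels run asynchronously).

```python
import time
from typing import Any, Callable, Iterable, Tuple


def profile_loader(
    loader: Iterable[Any],
    step_fn: Callable[[Any], Any],
    max_batches: int = 50,
) -> Tuple[float, float]:
    """Return (seconds waiting for data, seconds spent in step_fn).

    `loader` can be any iterable of batches, e.g. a PyTorch DataLoader.
    `step_fn` is a hypothetical callback that runs one training step and
    must not return before the computation is done.
    """
    data_time = 0.0
    step_time = 0.0
    batches = iter(loader)
    for _ in range(max_batches):
        t0 = time.perf_counter()
        try:
            batch = next(batches)  # time spent here is IO / preprocessing wait
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)  # time spent here is compute
        t2 = time.perf_counter()
        data_time += t1 - t0
        step_time += t2 - t1
    return data_time, step_time
```

If the first number is much larger than the second, adding workers will likely not help once the disk itself is saturated, which would be consistent with the behaviour described above.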

  1. Did you use an SSD when training the network? And how long did one epoch take when you trained RawNet2?
  2. Do you have any advice on reducing IO time? Thanks.

Jungjee commented 2 years ago

Hi, KojimaIsBad,

Yes, I used an SSD. I don't recall precisely, but one epoch should take no more than an hour. If you have enough RAM, the best option is to move the data into RAM to avoid reading from the HDD. One question: did you not run into this issue when training other models? I would expect it to occur in other cases as well.
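
The suggestion to keep the data in RAM can be sketched as follows. This is only an illustration under stated assumptions, not the loader used in the RawNet repository: `InMemoryFiles` is a hypothetical class, the `*.wav` pattern is an assumed default, and the whole dataset is assumed to fit in memory.

```python
import io
from pathlib import Path
from typing import List


class InMemoryFiles:
    """Read every matching file under `root` into RAM once,
    then serve the raw bytes from memory instead of the disk."""

    def __init__(self, root: str, pattern: str = "*.wav") -> None:
        # Sorted so that indices are stable across runs.
        self.paths: List[Path] = sorted(Path(root).rglob(pattern))
        # One sequential pass over the disk; later accesses never touch it.
        self.blobs: List[bytes] = [p.read_bytes() for p in self.paths]

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, index: int) -> io.BytesIO:
        # Return a file-like object so an audio decoder that accepts
        # file objects can read it like an ordinary file.
        return io.BytesIO(self.blobs[index])

    def total_bytes(self) -> int:
        """Amount of RAM taken by the raw file contents."""
        return sum(len(b) for b in self.blobs)
```

The files are stored undecoded, so the memory cost is roughly the on-disk size of the dataset; decoding still happens on each access.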

KojimaIsBad commented 2 years ago

Thanks. I copied VoxCeleb1 to RAM and trained RawNet2 on it, and the training time dropped a lot. However, the entire VoxCeleb2 dataset does not fit in my RAM, so I may need a new SSD.
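
When the dataset is larger than the available RAM, a partial cache is one possible middle ground: keep as many files in memory as a byte budget allows and read the rest from disk. The sketch below is an assumption-laden illustration, not something proposed in this thread; `BudgetedCache` and `budget_bytes` are hypothetical names.

```python
from pathlib import Path
from typing import Dict, Iterable, List


class BudgetedCache:
    """Cache raw file contents in RAM until a byte budget is used up;
    files that do not fit are read from disk on every access."""

    def __init__(self, paths: Iterable[str], budget_bytes: int) -> None:
        self.paths: List[Path] = [Path(p) for p in paths]
        self.budget_bytes = budget_bytes
        self.used_bytes = 0
        self.cache: Dict[int, bytes] = {}

    def read(self, index: int) -> bytes:
        blob = self.cache.get(index)
        if blob is not None:
            return blob  # served from RAM, no disk access
        blob = self.paths[index].read_bytes()
        # Keep the file only if it still fits in the remaining budget.
        if self.used_bytes + len(blob) <= self.budget_bytes:
            self.cache[index] = blob
            self.used_bytes += len(blob)
        return blob
```

Note that DataLoader workers are separate processes, so with `num_workers > 0` each worker would hold its own copy of such a cache and the budget would apply per worker. The first epoch still pays the full disk cost; only later epochs benefit.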