clovaai / voxceleb_trainer

In defence of metric learning for speaker recognition
MIT License

Training is extremely slow (single-GPU V100, SSD) #125

Closed zabir-nabil closed 2 years ago

zabir-nabil commented 2 years ago

I have modified your code to work on a single GPU only. I'm trying to train ResNet34, and the data is on an SSD. I'm getting an unrealistically low training speed, and my GPU utilization is flat at 0%.

+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  Off  | 00000000:8A:00.0 Off |                    0 |
| N/A   33C    P0    56W / 300W |  24809MiB / 32480MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
Initialised Adam optimizer
Initialised step LR scheduler
Processing 33300 of 1085400:Loss 12.672576 TEER/TAcc 0.000% - 0.93 Hz
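A flat 0% GPU utilization at ~0.93 Hz usually means the GPU is being starved by the input pipeline rather than the model. A quick way to confirm is to time how long the loader takes to hand over a batch versus how long a training step takes. This is a minimal diagnostic sketch, not the repo's code: train_loader and speaker_net are placeholders for a standard PyTorch DataLoader and a model wrapper that returns a scalar loss.

    import time
    import torch

    # Placeholders: substitute the repo's actual DataLoader and model wrapper.
    batches = iter(train_loader)
    for step in range(20):
        t0 = time.time()
        data, labels = next(batches)        # time spent waiting on the dataloader
        t1 = time.time()
        loss = speaker_net(data.cuda(), labels.cuda())
        loss.backward()
        torch.cuda.synchronize()            # make the GPU-side timing meaningful
        t2 = time.time()
        print(f"step {step}: load {t1 - t0:.3f}s  compute {t2 - t1:.3f}s")

If the load time dominates, the bottleneck is in the dataloader (decoding, augmentation, or the number of workers), not the GPU.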

My config:

    ## GPU
    gpu = 3
    ## Data loader
    max_frames = 250
    eval_frames = 400
    batch_size = 900
    max_seg_per_spk = 500
    nDataLoaderThread = 0
    augment = False
    seed = 1997
    ## Training details
    test_interval = 5
    max_epoch = 500
    trainfunc = "aamsoftmax"
    ## Optimizer
    optimizer = "adam"
    scheduler =  "steplr"
    lr = 0.001
    lr_decay = 0.95
    weight_decay = 0.005
    ## Loss functions
    hard_prob = 0.5
    hard_rank = 10
    margin = 0.1
    scale = 30
    nPerSpeaker = 1
    nClasses = 7554
    ## Evaluation parameters
    dcf_p_target = 0.05
    dcf_c_miss = 1
    dcf_c_fa = 1
    ## Load and save
    initial_model = ""
    save_path = "exps/large_training_1"
    model_save_path = save_path+"/model"
    result_save_path = save_path+"/result"
    feat_save_path = ""
    ## Training and test data
    train_list = "data/large_training_1/train_large.txt"
    test_list = "data/large_training_1/veri_test.txt"
    train_path = ""
    test_path = "/AUDIO_DATA/voxceleb1/wav"
    musan_path = "data/musan_split"
    rir_path = "data/RIRS_NOISES/simulated_rirs"
    ## Model definition
    n_mels = 40
    log_input = True
    model = "ResNet34"
    encoder_type = "res34"
    nOut = 512
    ## For test only
    eval = False
    ## Distributed and mixed precision training
    mixedprec = True
    distributed = False
    port = "8888"

One issue I suspect: I'm using librosa because soundfile can't read some of the file formats. librosa usually tries soundfile first and falls back to audioread if that fails. Could that be the issue?
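If you want to rule the decoding path in or out, you can time both backends directly on a handful of files from the train list. This is just a rough check, assuming librosa and soundfile are installed; the paths list is a placeholder to be filled from train_large.txt.

    import time
    import librosa
    import soundfile as sf

    paths = []  # fill with a few entries from train_large.txt

    for p in paths:
        t0 = time.time()
        try:
            audio, sr = sf.read(p)                # fast path via libsndfile
            backend = "soundfile"
        except RuntimeError:
            audio, sr = librosa.load(p, sr=None)  # librosa falls back to audioread
            backend = "librosa/audioread"
        print(f"{p}: {backend}, {time.time() - t0:.3f}s")

Files that hit the audioread fallback decode through an external backend and are typically much slower than files read directly by libsndfile, so they would show up immediately here.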

zabir-nabil commented 2 years ago

I'm converting all of the files to .wav; let's see whether the bottleneck goes away.
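For the conversion itself, one common approach (not part of this repo; it assumes ffmpeg is installed, and the source/target directories are hypothetical) is to re-encode everything to 16 kHz mono PCM wav, which libsndfile reads natively:

    import subprocess
    from pathlib import Path

    src_root = Path("/AUDIO_DATA/source_audio")      # hypothetical source directory
    dst_root = Path("/AUDIO_DATA/source_audio_wav")  # hypothetical output directory

    for src in src_root.rglob("*.m4a"):
        dst = (dst_root / src.relative_to(src_root)).with_suffix(".wav")
        dst.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(
            ["ffmpeg", "-y", "-i", str(src), "-ar", "16000", "-ac", "1", str(dst)],
            check=True, capture_output=True,
        )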

zabir-nabil commented 2 years ago

Okay, I finally solved it. It was not an issue with librosa but a bottleneck in my dataloader: the number of worker threads was set to 0 because I was working inside a Docker container, and increasing it drastically sped up my training.
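For anyone hitting the same thing: inside Docker the default shared-memory segment (/dev/shm, 64 MB) is usually too small for multi-worker data loading, which is why num_workers often gets left at 0 there. Starting the container with more shared memory (e.g. docker run --shm-size=8g ... or --ipc=host) lets you raise the worker count. A sketch of the DataLoader side, with illustrative values rather than the repo's exact construction:

    from torch.utils.data import DataLoader

    # train_dataset stands in for the repo's training dataset built from
    # train_list / train_path / max_frames.
    train_loader = DataLoader(
        train_dataset,
        batch_size=900,
        num_workers=5,            # was 0: all decoding ran in the training process
        pin_memory=True,
        persistent_workers=True,  # keep workers alive between epochs (PyTorch >= 1.7)
        drop_last=True,
    )

In the config above that corresponds to setting nDataLoaderThread to something in the range of 4-8 instead of 0.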