Closed inchpunch closed 5 years ago
Do you use NVIDIA container?
Yes, I used version 18.12
Can you attach a complete log file for Jasper, please?
I am generating it ... BTW, I changed the greedy decoder to a beam search decoder, but with beam size 1 (and without using an LM). I think that is equivalent, so it should not change the speed. That is the only change in the code.
The code I modified is in fc_decoders.py (starting at line 240):
```python
else:
    def decode_without_lm(logits, decoder_input, merge_repeated=True):
        if logits.dtype.base_dtype != tf.float32:
            logits = tf.cast(logits, tf.float32)
        # logits, decoder_input['encoder_output']['src_length'],
        # merge_repeated,
        # )
        decoded, neg_sum_logits = tf.nn.ctc_beam_search_decoder(
            logits, decoder_input['encoder_output']['src_length'],
            self.params['beam_width'], 1, merge_repeated=False,
        )
        return decoded
```
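For what it's worth, beam search with `beam_width=1` keeps only the single best prefix per frame, so the decoded output should indeed match greedy decoding (the beam search kernel itself may still be a little slower than the greedy one). A minimal pure-Python sketch of the greedy collapse rule, not the OpenSeq2Seq implementation, and with the blank index chosen arbitrarily as 0:

```python
# Sketch of greedy CTC decoding, i.e. what beam search with beam width 1
# reduces to: take the argmax symbol per frame, collapse repeats, drop blanks.
BLANK = 0  # assumption for this toy example; real models may use the last index

def ctc_greedy_decode(frames, blank=BLANK):
    """Argmax per frame, collapse consecutive repeats, then remove blanks."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frames]
    collapsed = [s for i, s in enumerate(best) if i == 0 or s != best[i - 1]]
    return [s for s in collapsed if s != blank]

# Toy probabilities over 5 frames and 3 symbols (blank=0, 'a'=1, 'b'=2).
frames = [
    [0.1, 0.8, 0.1],  # 'a'
    [0.1, 0.8, 0.1],  # repeated 'a' -> collapsed away
    [0.8, 0.1, 0.1],  # blank
    [0.1, 0.1, 0.8],  # 'b'
    [0.8, 0.1, 0.1],  # blank
]
print(ctc_greedy_decode(frames))  # [1, 2]
```

A blank between two identical symbols keeps both copies (e.g. frames scoring 'a', blank, 'a' decode to `[1, 1]`), which is the standard CTC collapse behavior.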
and in the configuration file, in base_params, I set

```python
"decoder_params": {
    # params for decoding the sequence with language model
    "beam_width": 1,
```
This is what was printed so far. It then waits a while to produce validation WER. I will add more output if needed. Thanks a lot.
Can you re-run with a few changes, please: 'num_gpus': 1, 'save_checkpoint_steps': 10000, 'eval_steps': 10000
Sure, now I got: jasper_logs_1gpu.txt
Just found that our GPUs and our programs/dataset are not in the same physical location, so data loading/access time is probably long due to the long-distance connection. I will check again after moving my programs/dataset to the same location as the GPUs.
I have moved my programs and data to the same location as the GPUs, but the speed does not change much: still around 1~2 seconds "time per step" for Jasper 10x3. That translates to around 20 epochs per day.
For Jasper 5x3 (keeping one block for each of B1 to B5), with the same batch size per GPU and 4 GPUs in total, time per step is about 0.8 sec.
I recall that the small DS2 model is said to train in 1 day on 1 GPU with 12 GB of memory. That is for 12 epochs, and it only uses librivox-train-clean-100 and librivox-train-clean-360.
So it looks like the speed on Jasper 10x3 is as expected?
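The epochs-per-day figures above can be sanity-checked with back-of-envelope arithmetic. The numbers below are assumptions, not from the logs: roughly 281k LibriSpeech training utterances and a batch size of 32 per GPU.

```python
# Translate "time per step" into epochs/day.
# Assumed (not confirmed in the thread): ~281k training utterances,
# batch size 32 per GPU, 4 GPUs.
utterances = 281_000
batch_per_gpu = 32
num_gpus = 4

steps_per_epoch = utterances / (batch_per_gpu * num_gpus)  # ~2195 steps
for sec_per_step in (1.0, 1.5, 2.0):
    epochs_per_day = 86_400 / (sec_per_step * steps_per_epoch)
    print(f"{sec_per_step:.1f} s/step -> ~{epochs_per_day:.0f} epochs/day")
```

At 2.0 s/step this works out to roughly 20 epochs/day, consistent with the figure reported above; a different batch size would scale the estimate proportionally.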
I am using 4 GPUs (Tesla V100-SXM2-32GB). Apart from changing the number of GPUs, I just used the example configuration files, so all other parameters are the same as the originals. For ds2_large_mp, Jasper_10x3_mp, and w2l_plus_large_mp, the "time per step" is around 1~2 seconds in each case. Is that expected?