riyijiye closed this issue 5 years ago
We are not publishing training time benchmarks. Since training time depends heavily on numerous hardware- and software-related factors (GPU, RAM, I/O bandwidth; TensorFlow, Horovod, CUDA, OS, driver versions, etc.), it requires significant effort to measure and report them in a consistent manner. A better venue for such benchmarks might be MLPerf.org.
thanks!
I have been training it for a whole month and it still hasn't finished... I changed the batch size from 32 to 8 and don't use Horovod. I installed OpenSeq2Seq following the general installation instructions and trained with 4 GeForce GTX 1080 Ti GPUs. Did you also find that the training time is too long?
How many epochs? What is the time per iteration? We used a DGX-1 with 8x V100, and we trained with Horovod and mixed precision, which is ~2.5x faster than float32.
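For reference, OpenSeq2Seq enables mixed precision through the config's `base_params`. A minimal sketch (key names per the project's mixed-precision docs; treat the surrounding config as elided):

```python
# Sketch: enabling mixed precision in an OpenSeq2Seq config.
base_params = {
    # ... other model/optimizer params ...
    "dtype": "mixed",            # compute in float16, keep float32 master weights
    "loss_scaling": "Backoff",   # automatic loss scaling to avoid underflow
}
```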
I would recommend lowering num_epochs to 50, maybe even 40. I expect WER will increase by 1-2%. I would also either remove speed perturbation or precompute the perturbed files and store them as wav. 400 epochs with a lower batch size, fewer GPUs, and an older GPU architecture will slow training down by an order of magnitude.
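In OpenSeq2Seq config terms, the two suggested changes look roughly like this (a sketch only; the augmentation key name is taken from the Speech2Text data layer, and surrounding params are elided):

```python
# Sketch: lower the epoch count and drop on-the-fly speed perturbation.
base_params = {
    # ...
    "num_epochs": 50,  # down from 400
}

train_params = {
    "data_layer_params": {
        # ...
        # Remove (or comment out) the augmentation entry to skip on-the-fly
        # speed perturbation, or precompute perturbed wavs offline instead:
        # "augmentation": {"speed_perturbation_ratio": 0.05},
    },
}
```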
My time per step is about 2.5 seconds, and I think that's reasonable. Thanks for your explanations; now I understand why it is slow. I will switch to an RTX 2080 for faster training, because I want to reach Jasper's best performance in a reasonable time. Another question: how can I build TensorFlow from source with a custom CTC decoder operation when using the NVIDIA TensorFlow Docker container?
If you are looking to build our current CTC decoder with language model, it should already be included in the NVIDIA TensorFlow container.
We should soon have an update to the CTC decoder with language model that does not require rebuilding TensorFlow.
@blisc Hello, I have run Jasper with the same four machines as @germany-zhu, and after 3 epochs the WER is about 37%. Is that normal? I changed nothing but the learning rate, which I set to 0.05. Also, I didn't use Docker. Does that matter? Can Docker speed up the training process, or should I switch to faster machines to train Jasper...
1) We used a DGX-1 server with 8 V100s to train large Jasper models, and it takes around a week. On a desktop with 4 cards this can take a lot of time. If you want to train a model from scratch, I would recommend starting with a smaller model, for example one with 5 blocks and 4 layers per block. 2) Training with Horovod and Docker is significantly faster.
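In the Jasper example configs, the depth is set by the `convnet_layers` list in the encoder params, where `repeat` gives the number of sub-blocks per block. A 5x4 variant would roughly halve that list and set `repeat` to 4; the sketch below is illustrative (the channel counts and kernel sizes are hypothetical, not from a shipped config):

```python
# Hypothetical sketch of a shallower Jasper encoder: 5 blocks x 4 sub-blocks.
# Channel/kernel layout is illustrative only.
convnet_layers = []
for channels, kernel in [(256, 11), (384, 13), (512, 17), (640, 21), (768, 25)]:
    convnet_layers.append({
        "type": "conv1d",
        "repeat": 4,              # 4 sub-blocks per block
        "kernel_size": [kernel],
        "stride": [1],
        "num_channels": channels,
        "padding": "SAME",
        "dropout_keep_prob": 0.8,
    })

print(len(convnet_layers))  # 5 blocks
```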
We used 8x V100 on GCP. Each epoch takes about 22 minutes with the default settings: https://github.com/NVIDIA/OpenSeq2Seq/blob/master/example_configs/speech2text/jasper10x5_LibriSpeech_nvgrad_masks.py
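A back-of-the-envelope estimate from these numbers (assuming the 400-epoch default mentioned earlier in the thread) lines up with the "around a week" figure above:

```python
# Rough total training time: 22 min/epoch x 400 epochs.
minutes_per_epoch = 22
num_epochs = 400

total_hours = minutes_per_epoch * num_epochs / 60
total_days = total_hours / 24

print(f"{total_hours:.0f} hours = about {total_days:.1f} days")
```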
Which version of the V100 have you used, 16 GB or 32 GB? And did you use the NVIDIA TensorFlow Docker image, or was it "native" TensorFlow?
The 16 GB version and the latest NVIDIA TensorFlow Docker image.
@Shujian2015 we're getting OOM with this default config with Docker + V100 16GB. Horovod enabled.
Have you experienced OOMs?
@feroult, the default settings work well; I didn't experience the OOM issue. Although for Fisher-Swbd, I had to reduce the batch size to avoid OOM.
For jasper10x5_LibriSpeech_nvgrad, I noticed some training details are described in this ticket: https://github.com/NVIDIA/OpenSeq2Seq/issues/415
Can anyone share training time information as well (following the jasper10x5_LibriSpeech_nvgrad.py example config exactly)?
thanks!