vadimkantorov opened 4 years ago
We are trying to reproduce some of your results on the newly available Russian speech-to-text dataset: https://github.com/snakers4/open_stt . The key questions are model capacity, model depth, and compute requirements for training.
Could you please share the training learning curves (loss, CER, WER) for wav2letter++ and Jasper (5x3, 10x5)? They would be a great addition to the paper or to https://nvidia.github.io/OpenSeq2Seq/html/speech-recognition/jasper.html and would make the trade-offs Jasper makes clearer.
Thank you very much!
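For anyone comparing runs on open_stt: a minimal sketch of how CER and WER can be computed from reference/hypothesis pairs via Levenshtein edit distance. The helper names here are illustrative, not from the wav2letter++ or OpenSeq2Seq codebases.

```python
# Minimal sketch: CER / WER via dynamic-programming Levenshtein distance.
# Function names are illustrative, not from any of the toolkits discussed.

def edit_distance(ref, hyp):
    # Single-row DP over token sequences: dp[j] holds the distance between
    # the first i tokens of ref and the first j tokens of hyp.
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            # Deletion, insertion, or substitution/match.
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def cer(ref, hyp):
    # Character error rate: character-level edits over reference length.
    return edit_distance(list(ref), list(hyp)) / max(len(ref), 1)

def wer(ref, hyp):
    # Word error rate: word-level edits over reference word count.
    ref_words, hyp_words = ref.split(), hyp.split()
    return edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)

print(wer("the cat sat", "the cat sat down"))  # one insertion over 3 words
```

Plotting these per epoch against training loss would give exactly the kind of learning curves requested above.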