The number of MLS model parameters and Polish dev loss curve fluctuations

Question

I have read the paper "MLS: A LARGE-SCALE MULTILINGUAL DATASET FOR SPEECH RESEARCH" and have some questions. In this paper, a 36-layer transformer was used to train the monolingual models. I would like to know the model size. A 1 GB acoustic model is provided in the mls folder, but I want to know the number of parameters in the model. In addition, when I reproduce the paper's monolingual results for Polish, the dev loss always fluctuates heavily, which did not happen for Portuguese or Italian. It still fluctuates even after I adjust the learning rate. When I pool the train and dev sets, shuffle them, and re-split them, the dev loss converges well. How can I diagnose the problem?
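To make both questions concrete, here is a minimal Python sketch of two rough checks. It is not taken from the paper or from wav2letter. The first function estimates a parameter count from the checkpoint size, assuming the file stores only float32 weights (4 bytes each) with no optimizer state or other metadata, so the result is at best an upper bound. The second summarizes utterance durations for a data list, assuming each line has the form `id path duration transcript`. The function names `estimate_param_count` and `duration_summary` are hypothetical.

```python
import statistics


def estimate_param_count(file_size_bytes, bytes_per_param=4):
    """Rough upper bound on the parameter count.

    Assumption: the file holds only weights stored as float32
    (4 bytes each), with no optimizer state or other metadata.
    """
    return file_size_bytes // bytes_per_param


def duration_summary(list_lines):
    """Summarize utterance durations for one data list.

    Assumption: each non-empty line looks like
    "<id> <audio path> <duration> <transcript...>".
    """
    durations = [float(line.split()[2]) for line in list_lines if line.strip()]
    return {
        "count": len(durations),
        "mean": statistics.mean(durations),
        "stdev": statistics.pstdev(durations),
        "min": min(durations),
        "max": max(durations),
    }


if __name__ == "__main__":
    # A 1 GB (10**9 bytes) file of float32 weights.
    print(estimate_param_count(10**9))  # prints 250000000

    # Toy lines standing in for real train/dev list files.
    train = ["a /x/a.flac 1000 hello", "b /x/b.flac 3000 good morning"]
    dev = ["c /x/c.flac 9000 a much longer utterance"]
    print(duration_summary(train))
    print(duration_summary(dev))
```

Since re-splitting the pooled data makes the dev loss converge, comparing such summaries for the original Polish train and dev lists might show whether the two sets differ in distribution. Duration is only one possible axis to compare.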

flashlight / wav2letter

The number of MLS model parameters and Polish dev loss curve fluctuations #1022
