flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

--framestridems on training / decoding #961

Closed joeychsu closed 3 years ago

joeychsu commented 3 years ago

Running "wav2letter/build/Train train --help" prints this description: -framestridems (Stride millisecond for power spectrum feature) default=10

As far as I know, the traditional Kaldi architecture (GMM/DNN-HMM) gets one feature vector every 10 ms, while a Kaldi chain model gets one feature vector every 30 ms (--frame-subsampling-factor=3). As a result, training and decoding can be faster with the chain model than with the traditional architecture.

Are there any recipes or experimental results for -framestridems that you can share? Is it possible to use -framestridems to reduce training / decoding time? Many thanks!

tlikhomanenko commented 3 years ago

We haven't experimented much with this parameter, but so far 10 is good enough. We control the stride and speed up computation with further striding inside the model itself: for example, we apply convs with stride 2 or 3, and sometimes several convs with stride 2.
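To make the striding arithmetic concrete, here is a minimal sketch of how conv strides multiply the input frame stride. The function name effective_stride_ms and the example stride lists are hypothetical, chosen only for illustration; they are not taken from any wav2letter recipe.

```python
from math import prod  # Python 3.8+


def effective_stride_ms(frame_stride_ms, conv_strides):
    """Time step between model output frames.

    frame_stride_ms: feature stride (the framestridems value).
    conv_strides: strides of the conv layers applied in sequence
                  (hypothetical values, for illustration only).
    """
    return frame_stride_ms * prod(conv_strides)


print(effective_stride_ms(10, [2]))        # 20
print(effective_stride_ms(10, [3]))        # 30
print(effective_stride_ms(10, [2, 2, 2]))  # 80
```

With framestridems=10, a single conv with stride 3 gives the same 30 ms output step as the Kaldi chain setting mentioned in the question.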

Yes, you can increase framestridems to speed things up, but what accuracy you would get, and whether it is faster or better than striding in the conv layers, is still an open question. If you get any results on this, feel free to share them here with others!
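As a rough way to see how much a larger framestridems shrinks the input, the sketch below counts feature frames for one utterance. It assumes a 25 ms analysis window and no padding at the edges; the actual feature pipeline may count frames slightly differently, and num_frames is a hypothetical helper, not a wav2letter function.

```python
def num_frames(duration_ms, frame_stride_ms, frame_size_ms=25):
    """Number of feature frames for an utterance.

    Assumes a sliding window of frame_size_ms with no padding
    (an assumption; the real pipeline may differ).
    """
    if duration_ms < frame_size_ms:
        return 0
    return 1 + (duration_ms - frame_size_ms) // frame_stride_ms


# A 10-second utterance at different framestridems values.
for stride in (10, 20, 30):
    print(stride, num_frames(10_000, stride))
# 10 998
# 20 499
# 30 333
```

This only counts input frames. It says nothing about accuracy, which is the open question above.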