Open alx741 opened 5 years ago
Just curious: how does the performance (WER) differ between 44.1 kHz and 16 kHz?
@svenha It actually improved: WER dropped from ~12% (44.1 kHz) to ~8% (16 kHz).
So, 16 kHz is better? This would fit with other reports.
Yes, 16 kHz seems to be better.
Apparently, when the model is trained on audio data with a sample rate other than 16 kHz, the decoder fails to decode audio at any sample rate, even after adjusting the corresponding sample rate parameters in the request to the server (or in the client arguments, for that matter).
This was the issue I was having in #186: my model was originally trained with 44.1 kHz audio data (with a matching MFCC config, --sample-frequency=44100, of course). When I converted all my data to 16 kHz and re-trained the model, it worked perfectly.
NOTE: This problem is likely in Kaldi's decoder rather than in kaldi-gstreamer-server, but this is where I first encountered it, so I'm putting it here to promote further investigation.
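For anyone repeating the 16 kHz conversion described above, here is a minimal sketch of how one might find the files that still need resampling before re-training. It is not taken from the issue: the function names and the TARGET_RATE constant are hypothetical, it assumes the training audio is plain PCM WAV (which is all Python's standard wave module can read), and it only builds a SoX command line rather than running it.

```python
import wave
from pathlib import Path

# Assumption: 16 kHz is the target, since that is the rate reported as working here.
TARGET_RATE = 16000


def wav_sample_rate(path):
    """Return the sample rate in Hz stored in a PCM WAV file header."""
    with wave.open(str(path), "rb") as wav:
        return wav.getframerate()


def files_needing_resample(data_dir, target_rate=TARGET_RATE):
    """List .wav files under data_dir whose sample rate differs from target_rate."""
    return sorted(
        p for p in Path(data_dir).rglob("*.wav")
        if wav_sample_rate(p) != target_rate
    )


def sox_resample_command(src, dst, target_rate=TARGET_RATE):
    """Build (but do not run) a SoX command that resamples src into dst."""
    # "-r" placed before the output file sets the output sample rate.
    return ["sox", str(src), "-r", str(target_rate), str(dst)]
```

Each path returned by files_needing_resample can be passed to sox_resample_command, and the resulting list handed to subprocess.run if SoX is installed. The MFCC config used for training would then need --sample-frequency set to match the converted data.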