Closed Ir0098 closed 3 years ago
Hi @Ir0098
I think the ML-Zoo team would be better able to answer these questions.
You can ask them at https://github.com/ARM-software/ML-zoo/issues
Best regards, Mike.
Hello @MikeJKelly, thank you very much for your message. I will raise it there as well. With warmest regards, ir0098
Hi @Ir0098
The speech recognition sample might help you: python/pyarmnn/examples/speech_recognition. It includes the pre- and post-processing steps required to run the wav2letter model correctly.
Best regards, Alex
Going to close this, as I believe the questions have been answered on the ML-zoo GitHub.
I am trying to evaluate the quantized wav2letter model provided at:
project link
It also includes an input sample and its corresponding output:
Also, it is suggested to decode the class probabilities (output) with an algorithm such as beam search; the documentation explicitly refers to ctc_beam_search_decoder from TensorFlow.
Accordingly, since the model is already quantized, the output tensor is dequantized using the scales and zero points.
Then, the dequantized output is generated as below:
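For reference, here is a minimal dequantization sketch, assuming the usual affine scheme `real = scale * (q - zero_point)`. The `scale` and `zero_point` values below are placeholders for illustration; the real ones come from the model's output tensor details (e.g. `interpreter.get_output_details()[0]["quantization"]` in TFLite):

```python
import numpy as np

def dequantize(q_output, scale, zero_point):
    """Map uint8/int8 quantized values back to float32 using the
    affine scheme: real = scale * (q - zero_point)."""
    return scale * (q_output.astype(np.float32) - zero_point)

# Hypothetical values for illustration only.
q = np.array([[130, 128, 120]], dtype=np.uint8)
print(dequantize(q, scale=0.05, zero_point=128))  # [[ 0.1  0.  -0.4]]
```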
And finally, it is passed to the decoder in order to create the sequence of letters:
TensorFlow versions: tested on both 2.4 and 2.5.
However here is a sample decoded array:
(as a reminder, the alphabet is equal to "abcdefghijklmnopqrstuvwxyz' @")
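As a sanity check independent of TensorFlow's ctc_beam_search_decoder, a greedy (best-path) CTC decode can be sketched: take the argmax class per frame, collapse consecutive repeats, and drop blanks. I am assuming here that the last index of the 29-symbol alphabet acts as the CTC blank; if the model uses a different blank index, adjust accordingly:

```python
import numpy as np

ALPHABET = "abcdefghijklmnopqrstuvwxyz' @"  # from the model description
BLANK = len(ALPHABET) - 1  # assumption: blank is the final class index

def greedy_ctc_decode(frame_probs):
    """Best-path CTC decode: argmax per frame, collapse repeated
    symbols, then remove blanks."""
    best_path = np.argmax(frame_probs, axis=-1)
    out, prev = [], None
    for idx in best_path:
        if idx != prev and idx != BLANK:
            out.append(ALPHABET[idx])
        prev = idx
    return "".join(out)

# Tiny synthetic example: frames voting 'h', 'h', blank, 'i'.
T, C = 4, len(ALPHABET)
probs = np.full((T, C), 0.01)
probs[0, 7] = probs[1, 7] = 0.9   # repeated 'h' -> collapsed to one
probs[2, BLANK] = 0.9             # blank frame -> dropped
probs[3, 8] = 0.9                 # 'i'
print(greedy_ctc_decode(probs))   # hi
```

If the greedy decode already produces readable text, the problem is unlikely to be in the beam-search step itself.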
P.S. The provided model is correct, as the model's prediction for the provided input matches the provided output.
And here are my questions:

1. Is the decoding process correct from your side?
2. What is your take on the decoded output?
3. Is the given input a fake generated tensor?
4. Could you please share your validated decoding process?
5. Could you please share your feature extraction process and its corresponding preprocessing steps (i.e., normalization, etc.), in case a raw audio wave is being used for inference?
Thanks in advance,