ARM-software / armnn

Arm NN ML Software. The code here is a read-only mirror of https://review.mlplatform.org/admin/repos/ml/armnn
https://developer.arm.com/products/processors/machine-learning/arm-nn
MIT License

wav2letter inference #553

Closed Ir0098 closed 3 years ago

Ir0098 commented 3 years ago

I am trying to evaluate the quantized wav2letter model provided at:

project link

It also includes an input sample and its corresponding output.

Also, it is suggested to decode the class probabilities (the output) using an algorithm like beam search; the ctc_beam_search_decoder from TensorFlow is explicitly referred to.

Accordingly, as the model is already quantized, the output tensor is dequantized using the scales and zero points.

Then, the dequantized output is generated as below:

deq_output = (output.astype(np.float32) - zero_points) * scales
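As a self-contained illustration of that affine dequantization step, here is a minimal sketch using hypothetical `scales` and `zero_points` values; the real values come from the model's output tensor metadata (e.g. via the TFLite interpreter's `get_output_details()`):

```python
import numpy as np

# Hypothetical quantization parameters (assumed for illustration only).
scales = 0.00390625      # 1/256
zero_points = -128

# Toy int8 tensor standing in for the model's (148, 1, 29) output.
output = np.array([[-128, 0, 127]], dtype=np.int8)

# Standard affine dequantization: real_value = (q - zero_point) * scale
deq_output = (output.astype(np.float32) - zero_points) * scales
print(deq_output)  # values: 0.0, 0.5, 0.99609375
```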

And finally, it is passed to the decoder in order to create the sequence of letters:

# batch size: 1
# deq_output.shape after reshaping: (148, 1, 29) ==> (sequence length, batch size, # classes/characters)
# sequence_length : 148

(decoded, log_probabilities) = tf.nn.ctc_beam_search_decoder(
    inputs=deq_output,
    sequence_length=[sequence_length],
    beam_width=25,
    top_paths=5)
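As a sanity check independent of TensorFlow, a greedy CTC decode (argmax per frame, collapse repeats, drop blanks) can be sketched in plain NumPy. This assumes the blank label is the last class (index 28 for a 29-class alphabet), which matches TensorFlow's CTC convention; the toy input below is made up for illustration:

```python
import numpy as np

def ctc_greedy_decode(logits, blank=28):
    """Greedy CTC decode: argmax per frame, collapse repeats, drop blanks.

    logits: (time, num_classes) array of (dequantized) class scores.
    """
    best = logits.argmax(axis=-1)                     # best class per frame
    collapsed = [int(c) for i, c in enumerate(best)
                 if i == 0 or c != best[i - 1]]       # merge repeated frames
    return [c for c in collapsed if c != blank]       # drop the blank label

# Toy example with 4 classes (blank = 3): frames spelling class 0 then 2.
toy = np.array([
    [0.9, 0.0, 0.0, 0.1],   # -> 0
    [0.9, 0.0, 0.0, 0.1],   # -> 0 (repeat, collapsed away)
    [0.0, 0.0, 0.1, 0.9],   # -> blank
    [0.0, 0.0, 0.9, 0.1],   # -> 2
])
print(ctc_greedy_decode(toy, blank=3))  # [0, 2]
```

If the greedy decode also produces gibberish, the problem is likely upstream of the decoder (feature extraction or dequantization), not in the beam search itself.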

tf versions: tested on both 2.4 and 2.5

However, here is a sample decoded array:

(as a reminder, the alphabet is equal to "abcdefghijklmnopqrstuvwxyz' @")

 print("".join([alphabet[ch_ind] for ch_ind in np.array(decoded[0].values)])) :

 oe aie bm'abaqabaiebajakyiepaubacacua ara 

P.S. the provided model is correct, as the model prediction for the provided input matches the provided output.

And here are my questions:

1) Is the decoding process correct on your side?
2) What is your view of the decoded output?
3) Is the given input a fake generated tensor?
4) Could you please share your validated decoding process?
5) Could you please share your feature extraction process and its corresponding preprocessing steps (e.g., normalization), in case a raw audio wave is being used for inference?

Thanks in advance,

MikeJKelly commented 3 years ago

Hi @Ir0098

I think the ML-Zoo team would be better able to answer these questions.

You can ask them at https://github.com/ARM-software/ML-zoo/issues

Best regards, Mike.

Ir0098 commented 3 years ago

Hello @MikeJKelly Thank you very much for your message. Going to raise it there as well. With warmest regards, ir0098

AlexanderEfremovArm commented 3 years ago

Hi @Ir0098

The speech recognition sample might help you: python/pyarmnn/examples/speech_recognition. It includes the pre- and post-processing steps required to run the wav2letter model correctly.
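For illustration only (the actual window sizes and MFCC parameters are defined in the pyarmnn sample itself), the sliding-window segmentation used to feed a long recording through a fixed-input model can be sketched like this:

```python
import numpy as np

def frame_audio(audio, frame_len, stride):
    """Yield fixed-size, possibly overlapping frames; zero-pad the last one."""
    for start in range(0, max(len(audio), 1), stride):
        frame = audio[start:start + frame_len]
        if len(frame) < frame_len:
            frame = np.pad(frame, (0, frame_len - len(frame)))
        yield frame
        if start + frame_len >= len(audio):
            break

# Toy signal: 10 samples, windows of 4 with stride 3 (values are arbitrary).
sig = np.arange(10, dtype=np.float32)
frames = list(frame_audio(sig, frame_len=4, stride=3))
print(len(frames))  # 3
```

Each frame would then go through the sample's feature extraction (MFCC) before being passed to the network.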

Best regards, Alex

MikeJKelly commented 3 years ago

Going to close this as I believe the questions have been answered on the ML-zoo github.

https://github.com/ARM-software/ML-zoo/issues/24