Hi, I tried a model from Hugging Face (https://huggingface.co/espnet/simpleoier_librispeech_asr_train_asr_conformer7_wavlm_large_raw_en_bpe5000_sp) and copied the code from the "Use in ESPnet" button. The example was broken; I had to change
to
According to the README of espnet_model_zoo, the user has to apply getitem to the result first (that is, index it) before unpacking. I don't know how to fix that on the Hugging Face side. Could you fix the example on Hugging Face?
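To make the difference concrete, here is a minimal sketch that does not use ESPnet at all. It assumes, based on my reading of the espnet_model_zoo README, that calling `Speech2Text` returns a list of n-best results, each a tuple starting with the text. The function `fake_speech2text` and its return values are hypothetical stand-ins, not the real API.

```python
from typing import List, Tuple

# Assumed shape of one n-best entry: (text, tokens, token_ids, hypothesis).
Hypothesis = Tuple[str, List[str], List[int], object]


def fake_speech2text(speech: List[float]) -> List[Hypothesis]:
    """Hypothetical stand-in for calling a Speech2Text instance.

    Returns a one-element n-best list with made-up values.
    """
    return [("HELLO WORLD", ["HELLO", "WORLD"], [10, 20], None)]


nbests = fake_speech2text([0.0] * 16000)

# Unpacking the list directly binds the whole first tuple to the name,
# so this does not yield a string.
wrong_text, *_ = nbests

# Indexing first (getitem) and then unpacking yields the text itself.
text, *_ = nbests[0]

print(type(wrong_text).__name__)
print(text)
```

If the real return value has this shape, the only change needed in the example would be adding `[0]` before unpacking.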
Here are the examples from Hugging Face and GitHub, which disagree on the expected output of `Speech2Text`: