hacobe opened 2 years ago
I still don't know how to extract embeddings.
However, I can make model.predict work in the example above by doing the following:
1) Add these lines to the config.yaml:

   hub:
     input_type: fbank80_w_utt_cmvn
2) Replace "feat.unsqueeze(0)" with "np.expand_dims(feat, 0)" in speech_to_text/hub_interface.py (feat is a NumPy array, not a Torch tensor)
3) Run model.predict(".../ARCTIC/cmu_us_aew_arctic/wav/arctic_a0001.wav")
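Step 2 works because the feature extractor in hub_interface.py returns a NumPy array rather than a Torch tensor, so the tensor method .unsqueeze(0) raises an AttributeError; np.expand_dims adds the batch dimension instead. A minimal standalone sketch of the equivalence (the feature shapes here are illustrative, not taken from the actual pipeline):

```python
import numpy as np

# Stand-in for the 80-dim log-mel filterbank features that the hub
# interface computes; in fairseq this comes back as a NumPy array.
feat = np.random.rand(120, 80).astype(np.float32)  # (frames, mel bins)

# np.expand_dims prepends a batch axis, mirroring tensor.unsqueeze(0).
batched = np.expand_dims(feat, 0)
print(batched.shape)  # (1, 120, 80)
```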
❓ Questions and Help
What is your question?
How do I get the embeddings in each layer of a speech-to-text model for a given LibriSpeech input?
What have you tried?
I followed the instructions here (https://github.com/pytorch/fairseq/blob/main/examples/speech_to_text/docs/librispeech_example.md) to prepare the LibriSpeech dataset and train a speech-to-text model.
I then load the model:
At this point, I'm not sure what to do. As a preliminary step, I tried to figure out how to predict with the model, following https://github.com/pytorch/fairseq/issues/3069, but when I run predict on the first input of dev-clean.tsv:
I get a "ValueError: Unknown value: input_type = fbank80" error.
What's your environment?
How you installed fairseq (pip, source): pip