[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
KeyError: 'text' when inferring with iic/emotion2vec_plus_large model in FunASR #36
I encountered an issue while performing inference using the iic/emotion2vec_plus_large model with FunASR. Here's the traceback of the error:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lianqi/anaconda3/envs/funasr/lib/python3.11/site-packages/funasr/auto/auto_model.py", line 253, in generate
    model = self.model if model is None else model
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lianqi/anaconda3/envs/funasr/lib/python3.11/site-packages/funasr/auto/auto_model.py", line 471, in inference_with_vad
    )
KeyError: 'text'
```
Code Used:
```python
from funasr import AutoModel
import librosa
import soundfile as sf

model_emotion = AutoModel(model="iic/emotion2vec_plus_base", model_revision="master",
                          vad_model="fsmn-vad", vad_model_revision="v2.0.4",
                          max_single_segment_time=19000,
                          )

# wav_file is the path to the original input audio
y, sr = librosa.load(wav_file)
y_16k = librosa.resample(y, orig_sr=sr, target_sr=16000)
sf.write("./temp.wav", y_16k, 16000, subtype='PCM_24')

res_emotion = model_emotion.generate("./temp.wav", output_dir="./outputs",
                                     granularity="utterance", extract_embedding=True)
print(res_emotion)
```
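The traceback ends inside FunASR's `inference_with_vad`, which only runs because a `vad_model` was attached to the `AutoModel`. A quick way to narrow this down (a minimal sketch reusing the arguments above, not a confirmed fix) is to run the emotion model without the VAD front-end and inspect the keys it actually returns:

```python
from funasr import AutoModel

# Same emotion model, but without vad_model, so FunASR takes its plain
# inference path instead of inference_with_vad (where the KeyError is raised).
model_emotion = AutoModel(model="iic/emotion2vec_plus_base", model_revision="master")

res = model_emotion.generate("./temp.wav", output_dir="./outputs",
                             granularity="utterance", extract_embedding=True)
# If this succeeds, the result dicts presumably carry emotion fields such as
# 'labels' and 'scores' (plus 'feats' with extract_embedding=True) but no
# 'text' key, which would explain why the VAD stitching step fails.
print(res[0].keys())
```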
**Complete Console Information**:
```
>>> model_emotion = AutoModel(model="iic/emotion2vec_plus_base", model_revision="master",
...                           vad_model="fsmn-vad", vad_model_revision="v2.0.4",
...                           max_single_segment_time=1000,
...                           )
Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.blocks.0.0.weight, /home/lianqi/.cache/modelscope/hub/iic/emotion2vec_plus_base/model.pt
Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.blocks.0.0.bias, /home/lianqi/.cache/modelscope/hub/iic/emotion2vec_plus_base/model.pt
Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.blocks.1.0.weight, /home/lianqi/.cache/modelscope/hub/iic/emotion2vec_plus_base/model.pt
Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.blocks.1.0.bias, /home/lianqi/.cache/modelscope/hub/iic/emotion2vec_plus_base/model.pt
Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.blocks.2.0.weight, /home/lianqi/.cache/modelscope/hub/iic/emotion2vec_plus_base/model.pt
Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.blocks.2.0.bias, /home/lianqi/.cache/modelscope/hub/iic/emotion2vec_plus_base/model.pt
Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.blocks.3.0.weight, /home/lianqi/.cache/modelscope/hub/iic/emotion2vec_plus_base/model.pt
Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.blocks.3.0.bias, /home/lianqi/.cache/modelscope/hub/iic/emotion2vec_plus_base/model.pt
Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.proj.weight, /home/lianqi/.cache/modelscope/hub/iic/emotion2vec_plus_base/model.pt
Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.proj.bias, /home/lianqi/.cache/modelscope/hub/iic/emotion2vec_plus_base/model.pt
2024-07-02 17:45:00,793 - modelscope - INFO - Use user-specified model revision: v2.0.4
```
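As an aside, the "miss key in ckpt" warnings refer to decoder weights, which are presumably only used during self-supervised pre-training, so they are likely unrelated to the `KeyError`. If VAD segmentation is still needed for long audio, a hedged workaround (assuming standalone `fsmn-vad` returns `[[start_ms, end_ms], ...]` segments under the `value` key, and that `generate` also accepts an in-memory 16 kHz waveform) is to run the two models separately instead of chaining them inside one `AutoModel`:

```python
from funasr import AutoModel
import soundfile as sf

# Two separate pipelines: fsmn-vad for segmentation, emotion2vec for emotion
# recognition, so FunASR never enters inference_with_vad.
vad = AutoModel(model="fsmn-vad", model_revision="v2.0.4")
emo = AutoModel(model="iic/emotion2vec_plus_base", model_revision="master")

audio, sr = sf.read("./temp.wav")  # 16 kHz mono, as prepared above

# Assumed output format: [{'key': ..., 'value': [[start_ms, end_ms], ...]}]
segments = vad.generate("./temp.wav")[0]["value"]

for start_ms, end_ms in segments:
    clip = audio[int(start_ms * sr / 1000):int(end_ms * sr / 1000)]
    # Assumption: generate() accepts a numpy waveform as input here.
    res = emo.generate(clip, granularity="utterance", extract_embedding=False)
    print(start_ms, end_ms, res[0].get("labels"), res[0].get("scores"))
```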