ddlBoJack / emotion2vec

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

KeyError: 'text' when inferring with iic/emotion2vec_plus_large model in FunASR #36

Closed: lianqi1998 closed this issue 3 weeks ago

lianqi1998 commented 2 months ago
1. **Description**:

I encountered an issue while performing inference using the iic/emotion2vec_plus_large model with FunASR. Here's the traceback of the error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lianqi/anaconda3/envs/funasr/lib/python3.11/site-packages/funasr/auto/auto_model.py", line 253, in generate
    model = self.model if model is None else model
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/lianqi/anaconda3/envs/funasr/lib/python3.11/site-packages/funasr/auto/auto_model.py", line 471, in inference_with_vad
    )

KeyError: 'text'
2. **Code Used**:
    
    from funasr import AutoModel
    import librosa
    import soundfile as sf
    model_emotion = AutoModel(model="iic/emotion2vec_plus_base", model_revision="master",
                          vad_model="fsmn-vad", vad_model_revision="v2.0.4",
                          max_single_segment_time=19000,
                          )

    y, sr = librosa.load(wav_file)
    y_16k = librosa.resample(y, orig_sr=sr, target_sr=16000)
    sf.write("./temp.wav", y_16k, 16000, subtype='PCM_24')
    res_emotion = model_emotion.generate("./temp.wav", output_dir="./outputs", granularity="utterance", extract_embedding=True)
    print(res_emotion)


3. **Complete Console Information**:

    model_emotion = AutoModel(model="iic/emotion2vec_plus_base", model_revision="master",
                          vad_model="fsmn-vad", vad_model_revision="v2.0.4",
                          max_single_segment_time=1000,
                          )
    Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.blocks.0.0.weight, /home/lianqi/.cache/modelscope/hub/iic/emotion2vec_plus_base/model.pt
    Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.blocks.0.0.bias, /home/lianqi/.cache/modelscope/hub/iic/emotion2vec_plus_base/model.pt
    Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.blocks.1.0.weight, /home/lianqi/.cache/modelscope/hub/iic/emotion2vec_plus_base/model.pt
    Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.blocks.1.0.bias, /home/lianqi/.cache/modelscope/hub/iic/emotion2vec_plus_base/model.pt
    Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.blocks.2.0.weight, /home/lianqi/.cache/modelscope/hub/iic/emotion2vec_plus_base/model.pt
    Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.blocks.2.0.bias, /home/lianqi/.cache/modelscope/hub/iic/emotion2vec_plus_base/model.pt
    Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.blocks.3.0.weight, /home/lianqi/.cache/modelscope/hub/iic/emotion2vec_plus_base/model.pt
    Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.blocks.3.0.bias, /home/lianqi/.cache/modelscope/hub/iic/emotion2vec_plus_base/model.pt
    Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.proj.weight, /home/lianqi/.cache/modelscope/hub/iic/emotion2vec_plus_base/model.pt
    Warning, miss key in ckpt: modality_encoders.AUDIO.decoder.proj.bias, /home/lianqi/.cache/modelscope/hub/iic/emotion2vec_plus_base/model.pt
    2024-07-02 17:45:00,793 - modelscope - INFO - Use user-specified model revision: v2.0.4

    res_emotion = model_emotion.generate("./temp.wav", output_dir="./outputs", granularity="utterance", extract_embedding=True)
    rtf_avg: 2.022: 100%|██████████| 1/1 [00:34<00:00, 34.72s/it]
    rtf_avg: 2.878:   0%|▍         | 1/261 [00:01<06:36, 1.53s/it]
    rtf_avg: 1.028:   1%|▋         | 1/191 [00:01<03:44, 1.18s/it]
    rtf_avg: 0.613:   1%|▊         | 1/154 [00:00<02:28, 1.03it/s]
    rtf_avg: 0.423:   1%|▉         | 1/131 [00:00<01:47, 1.21it/s]
    rtf_avg: 0.317:   1%|█         | 1/113 [00:00<01:22, 1.37it/s]
    rtf_avg: 0.246:   1%|█▏        | 1/102 [00:00<01:06, 1.53it/s]
    rtf_avg: 0.209:   1%|█▎        | 1/94 [00:00<00:57, 1.61it/s]
    rtf_avg: 0.183:   1%|█▍        | 1/84 [00:00<00:49, 1.69it/s]
    rtf_avg: 0.159:   1%|█▋        | 1/75 [00:00<00:42, 1.75it/s]
    rtf_avg: 0.138:   1%|█▊        | 1/69 [00:00<00:37, 1.80it/s]
    rtf_avg: 0.115:   2%|██        | 1/62 [00:00<00:31, 1.96it/s]
    rtf_avg: 0.104:   2%|██▏       | 1/56 [00:00<00:28, 1.95it/s]
    rtf_avg: 0.090:   2%|██▍       | 1/51 [00:00<00:24, 2.04it/s]
    rtf_avg: 0.080:   2%|██▋       | 1/47 [00:00<00:21, 2.09it/s]
    rtf_avg: 0.075:   2%|██▊       | 1/44 [00:00<00:20, 2.05it/s]
    rtf_avg: 0.068:   2%|███▏      | 1/40 [00:00<00:18, 2.12it/s]
    rtf_avg: 0.063:   3%|███▍      | 1/36 [00:00<00:16, 2.10it/s]
    rtf_avg: 0.058:   3%|███▊      | 1/33 [00:00<00:15, 2.07it/s]
    rtf_avg: 0.050:   3%|████▎     | 1/29 [00:00<00:13, 2.12it/s]
    rtf_avg: 0.045:   4%|████▊     | 1/26 [00:00<00:11, 2.09it/s]
    rtf_avg: 0.040:   4%|█████▍    | 1/23 [00:00<00:10, 2.08it/s]
    rtf_avg: 0.036:   5%|██████▎   | 1/20 [00:00<00:09, 2.05it/s]
    rtf_avg: 0.034:   6%|███████▎  | 1/17 [00:00<00:08, 1.92it/s]
    rtf_avg: 0.031:   7%|████████▎ | 1/15 [00:00<00:07, 1.80it/s]
    rtf_avg: 0.025:  10%|████████████▌ | 1/10 [00:00<00:05, 1.80it/s]
    rtf_avg: 0.023:  12%|███████████████▊ | 1/8 [00:00<00:05, 1.32it/s]
      0%|          | 0/1 [01:13<?, ?it/s]
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/lianqi/anaconda3/envs/funasr/lib/python3.11/site-packages/funasr/auto/auto_model.py", line 253, in generate
        model = self.model if model is None else model
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/home/lianqi/anaconda3/envs/funasr/lib/python3.11/site-packages/funasr/auto/auto_model.py", line 471, in inference_with_vad
        )

KeyError: 'text'
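For context on the error itself: `KeyError: 'text'` is plain Python dict behavior. The traceback shows `inference_with_vad` indexing a per-segment result dict by `"text"`, a key that an emotion-recognition result (labels and scores, no transcript) need not contain. A minimal stdlib-only illustration; the dict below is hypothetical, not FunASR's actual result structure:

```python
# Hypothetical per-segment result: emotion models return labels/scores,
# not a transcript, so a "text" key may simply be absent.
segment_result = {"key": "temp.wav", "labels": ["happy"], "scores": [0.92]}

# Direct indexing raises KeyError, mirroring the traceback above.
try:
    text = segment_result["text"]
except KeyError:
    # .get() with a default avoids the crash when the key is missing.
    text = segment_result.get("text", "")

print(repr(text))  # prints ''
```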

ddlBoJack commented 1 month ago

Hi, have you installed FunASR correctly?

lianqi1998 commented 3 weeks ago

> Hi, have you installed FunASR correctly?

The issue no longer occurred after I reinstalled it with conda. Thank you very much.