ddlBoJack / emotion2vec

[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
637 stars 47 forks

fine-tuning the pre-trained model #28

Open Moonmore opened 5 months ago

Moonmore commented 5 months ago

Hi, thank you very much for your work.

I want to build some interesting follow-up work on top of yours, but I could not find anything related to fine-tuning the model on ModelScope or GitHub. Could you guide me on how to fine-tune and retrain your model?

many thanks

ddlBoJack commented 4 months ago

Hi, you can fine-tune the model with the FunASR `AutoModel` pipeline.

buanide commented 4 months ago

Is this what you call fine-tuning?


```python
from funasr import AutoModel

# Alternatives: iic/emotion2vec_plus_seed, iic/emotion2vec_plus_base,
# iic/emotion2vec_plus_large, iic/emotion2vec_base_finetuned
model = AutoModel(model="iic/emotion2vec_base_finetuned")

wav_file = f"{model.model_path}/example/test.wav"
rec_result = model.generate(wav_file, output_dir="./outputs", granularity="utterance", extract_embedding=False)
```


?

Zevrap-81 commented 3 months ago

I want to load the model and fine-tune it myself for better accuracy on my voice and the voices of my colleagues.

I see that you provide the code for the upstream model, and I get the model checkpoint from emotion2vec_base.pt.

However, the checkpoint does not fully match the model. Specifically, I get a RuntimeError:

```
RuntimeError: Error(s) in loading state_dict for Data2VecMultiModel:
        Unexpected key(s) in state_dict: "modality_encoders.AUDIO.extra_tokens", "modality_encoders.AUDIO.alibi_scale".
```

I suspect there are more such mismatches further down once I resolve this one. So, to save myself some time, could you provide the corresponding configs?
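As a stopgap until the matching config is available, one common workaround is to filter the unexpected entries out of the checkpoint before loading it (in PyTorch, `model.load_state_dict(ckpt, strict=False)` has the same effect and also reports the incompatible keys). A minimal sketch of that filtering, with plain dicts standing in for real tensors and `encoder.weight` as a hypothetical expected key; only the two key names from the error above are taken from this thread:

```python
# Sketch: drop checkpoint entries the model does not expect, so that a
# subsequent load_state_dict no longer raises on unexpected keys.
# With PyTorch you could instead call model.load_state_dict(ckpt, strict=False).

def filter_state_dict(ckpt, expected_keys):
    """Split a checkpoint into entries the model expects and dropped extras."""
    kept = {k: v for k, v in ckpt.items() if k in expected_keys}
    dropped = sorted(k for k in ckpt if k not in expected_keys)
    return kept, dropped

# The two AUDIO keys come from the RuntimeError above; "encoder.weight"
# is a hypothetical stand-in for a key the model does expect.
ckpt = {
    "encoder.weight": "...",
    "modality_encoders.AUDIO.extra_tokens": "...",
    "modality_encoders.AUDIO.alibi_scale": "...",
}
expected = {"encoder.weight"}

kept, dropped = filter_state_dict(ckpt, expected)
print(dropped)  # the two unexpected keys reported in the error
```

Note that dropping keys only silences the error; whether the remaining weights line up with the downstream architecture still depends on having the right config.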

thanks in advance