X-LANCE / SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model
MIT License

Low training and validation accuracy when using wav2vec2 #131

Closed: yuuuno32 closed this issue 2 months ago

yuuuno32 commented 2 months ago

🚀 The feature, motivation and pitch

Hi there, I am trying to use a wav2vec2 encoder, which does not seem to be supported by SLAM-LLM yet. However, the training and test accuracy on the ASR task is very low: training accuracy reaches only 30%. The encoder uses the embeddings from its last hidden layer. Do you have any idea what might be causing this, or any plans to support this encoder?
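For reference, the last hidden layer of a wav2vec 2.0 encoder yields one embedding per output frame, so the number of frames is a useful thing to check when wiring the encoder into another pipeline. The snippet below is a minimal sketch of how that frame count follows from the waveform length. It assumes the default wav2vec 2.0 convolutional feature extractor (kernel sizes 10, 3, 3, 3, 3, 2, 2 and strides 5, 2, 2, 2, 2, 2, 2, no padding) and 16 kHz input. The helper name `wav2vec2_num_frames` is hypothetical and not part of SLAM-LLM or any library.

```python
# Sketch: number of output frames wav2vec 2.0 produces for a waveform.
# Assumption: the default convolutional feature extractor
# (kernels 10,3,3,3,3,2,2 and strides 5,2,2,2,2,2,2, no padding).

CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def wav2vec2_num_frames(num_samples: int) -> int:
    """Hypothetical helper: frames produced from `num_samples` raw samples."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNELS, CONV_STRIDES):
        # Standard output-length formula for an unpadded 1-D convolution.
        length = (length - kernel) // stride + 1
    return length


if __name__ == "__main__":
    one_second = 16_000  # one second of 16 kHz audio
    # The last hidden state would then have shape (batch, frames, hidden_size).
    print(wav2vec2_num_frames(one_second))  # 49, i.e. roughly one frame per 20 ms
```

Under these assumptions, one second of audio gives 49 embeddings, so whatever consumes the encoder output has to be configured for that frame rate and for the encoder's hidden size.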

Alternatives

No response

Additional context

No response