RicherMans / Dasheng

Source code for the Interspeech 2024 paper "Scaling up masked audio encoder learning for general audio classification"
Apache License 2.0

Fine-tune on a Downstream Task with an LM Head #6

Closed: SoshyHayami closed this issue 1 month ago

SoshyHayami commented 1 month ago

Thank you for training and providing this encoder.

I understand you're busy, but I was wondering whether you have time to show (similar to your ESC-50 fine-tuning example) how to attach a pre-trained autoregressive LLM (such as Qwen2) as a decoder LM head, turning this into a full-fledged audio LLM.

It would be very much appreciated, and it would make it a lot easier for those of us with less technical knowledge to build on your work.

RicherMans commented 1 month ago

Hey there @SoshyHayami, thanks for the interest! I have never attached an LLM head to Dasheng myself, so I don't know how to do it either. I've been quite busy with other things recently, but if you have the time, I'd welcome a recipe like that.

I might still work on this request myself, but given my current schedule, not before Q4 2024.
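A minimal sketch of one common way to attach a decoder-only LLM to an audio encoder, assuming the encoder returns a sequence of frame vectors. The steps are: stack neighbouring frames to shorten the sequence, map them into the LLM's embedding space with a small trainable projection, prepend them to the text-token embeddings, and compute the loss only on the response tokens. This is not the Dasheng authors' method. `stack_frames`, `project` and `assemble` are hypothetical helpers written in plain Python so the data flow is easy to follow. A real implementation would use PyTorch tensors with Dasheng and a model such as Qwen2.

```python
# Hypothetical glue code for an "audio encoder -> projector -> decoder LLM" setup.
# Plain Python for illustration only; a real version would use PyTorch tensors.
from typing import List, Sequence, Tuple

IGNORE_INDEX = -100  # label value skipped by Hugging Face causal-LM losses


def stack_frames(frames: Sequence[Sequence[float]], k: int) -> List[List[float]]:
    """Concatenate every k consecutive encoder frames into one vector.

    This shortens the audio sequence k-fold before it reaches the LLM.
    Trailing frames that do not fill a whole group are dropped.
    """
    usable = len(frames) - len(frames) % k
    return [[x for frame in frames[i:i + k] for x in frame]
            for i in range(0, usable, k)]


def project(vec: Sequence[float], weight: Sequence[Sequence[float]],
            bias: Sequence[float]) -> List[float]:
    """Linear adapter y = W x + b mapping a stacked frame to the LLM hidden size.

    `weight` has shape (llm_dim, in_dim), given as a list of rows.
    In training, W and b would be the trainable projector parameters.
    """
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weight, bias)]


def assemble(audio_vecs: Sequence[Sequence[float]],
             prompt_ids: Sequence[int],
             target_ids: Sequence[int],
             embed_table: Sequence[Sequence[float]]
             ) -> Tuple[List[List[float]], List[int]]:
    """Build the LLM input sequence [audio, prompt, target] and its labels.

    Text tokens are looked up in the LLM's own embedding table. Only the
    target tokens are supervised; audio slots and prompt tokens get
    IGNORE_INDEX. Labels stay aligned with the inputs because Hugging Face
    causal LMs shift them internally when computing the loss.
    """
    text_ids = list(prompt_ids) + list(target_ids)
    inputs = [list(v) for v in audio_vecs] + [list(embed_table[t]) for t in text_ids]
    labels = [IGNORE_INDEX] * (len(audio_vecs) + len(prompt_ids)) + list(target_ids)
    return inputs, labels


if __name__ == "__main__":
    # Toy data: 5 encoder frames of dim 2, a 4-token vocabulary with dim 3.
    frames = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [1.0, 1.0], [5.0, 5.0]]
    stacked = stack_frames(frames, k=2)        # 2 vectors of dim 4
    W = [[1, 0, 0, 0], [0, 0, 0, 1], [1, 1, 1, 1]]  # in_dim 4 -> llm_dim 3
    b = [0.0, 0.0, 0.0]
    audio = [project(v, W, b) for v in stacked]
    table = [[float(t)] * 3 for t in range(4)]
    inputs, labels = assemble(audio, prompt_ids=[1], target_ids=[2, 3],
                              embed_table=table)
    print(inputs)
    print(labels)
```

In a Hugging Face setup, the assembled vectors would typically be passed as `inputs_embeds` together with `labels` to the causal-LM model's forward call, with the text embeddings taken from `model.get_input_embeddings()`. Whether to freeze the encoder and the LLM and train only the projector (or add LoRA adapters) is a design choice, not something this thread settles.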