YuanGongND / ltu

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

How to replace the audio encoder in the model? #50

Closed by zhangron013 2 months ago

zhangron013 commented 2 months ago

Thank you for sharing this excellent work. I would like to build on your code and swap in different audio encoders, to analyze how the choice of encoder affects the content-understanding ability of an audio large language model. However, I am a little confused: the source code for the model architecture does not seem to be provided, and I am not familiar with the Transformers library, so I don't know where to start. Could you explain how to replace the original audio encoder?

YuanGongND commented 2 months ago

Hi there,

All of the code is released.

Regarding the audio encoder: for LTU, it is at https://github.com/YuanGongND/ltu/blob/2002aad8305ee5579a2237a85a6e792c1174cda7/src/ltu/hf-dev/transformers-main/src/transformers/models/llama/modeling_llama.py#L669
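As a rough illustration of what swapping an encoder usually involves, here is a minimal PyTorch sketch. It is not the code at the link above. It assumes the language model consumes a sequence of audio embeddings at its own hidden size, so a replacement encoder only needs to return a `(batch, tokens, enc_dim)` tensor followed by a linear projection. `ToyAudioEncoder`, `AudioFrontEnd`, and every dimension below are hypothetical placeholders.

```python
import torch
import torch.nn as nn


class ToyAudioEncoder(nn.Module):
    """Hypothetical stand-in for any audio encoder.

    Maps a spectrogram of shape (batch, frames, n_mels) to a shorter
    sequence of embeddings of shape (batch, frames // pool, enc_dim).
    """

    def __init__(self, n_mels: int = 128, enc_dim: int = 768, pool: int = 4):
        super().__init__()
        self.pool = nn.AvgPool1d(kernel_size=pool)  # downsample along time
        self.proj = nn.Linear(n_mels, enc_dim)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        # AvgPool1d pools over the last axis, so move time there and back.
        x = self.pool(spec.transpose(1, 2)).transpose(1, 2)
        return self.proj(x)


class AudioFrontEnd(nn.Module):
    """Wraps an arbitrary encoder and projects its output to the LLM width."""

    def __init__(self, encoder: nn.Module, enc_dim: int, llm_dim: int):
        super().__init__()
        self.encoder = encoder  # swap this module to try another encoder
        self.to_llm = nn.Linear(enc_dim, llm_dim)

    def forward(self, spec: torch.Tensor) -> torch.Tensor:
        return self.to_llm(self.encoder(spec))


if __name__ == "__main__":
    # Assumed sizes for illustration only.
    front_end = AudioFrontEnd(ToyAudioEncoder(enc_dim=768), enc_dim=768, llm_dim=4096)
    audio_tokens = front_end(torch.zeros(1, 1024, 128))
    print(audio_tokens.shape)
```

With this layout, trying a different encoder means changing only the module passed to `AudioFrontEnd` and its `enc_dim`; the projection absorbs the difference in embedding width.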

For LTU-AS, we save the encoder features to disk and load them during training.
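The save-then-load pattern described above can be sketched as follows. This is an assumption-laden illustration, not the repository's actual pipeline: the `.npy` file layout, the function names `cache_features` and `load_features`, and the `fake_encoder` stand-in are all hypothetical.

```python
import tempfile
from pathlib import Path

import numpy as np


def cache_features(audio_ids, encode_fn, out_dir):
    """Run the encoder once per clip and store each feature matrix on disk."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for audio_id in audio_ids:
        feat = np.asarray(encode_fn(audio_id), dtype=np.float32)
        np.save(out_dir / f"{audio_id}.npy", feat)


def load_features(audio_id, feat_dir):
    """Read back the cached features for one clip (e.g. inside a dataset)."""
    return np.load(Path(feat_dir) / f"{audio_id}.npy")


def fake_encoder(audio_id):
    """Hypothetical encoder: returns a deterministic (tokens, dim) matrix."""
    rng = np.random.default_rng(len(audio_id))
    return rng.standard_normal((8, 16))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as feat_dir:
        cache_features(["clip_a", "clip_bb"], fake_encoder, feat_dir)
        print(load_features("clip_a", feat_dir).shape)
```

Under this pattern, evaluating a different encoder amounts to regenerating the cached files with the new `encode_fn`; the training loop only ever sees the stored arrays, so their shape must match what the model expects.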

Please check https://github.com/YuanGongND/ltu/tree/main#important-code for pointers to the important code. Sorry, some files are not in obvious places.

-Yuan

zhangron013 commented 2 months ago

Thanks a lot!