YuanGongND / ltu

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

Loading the CAV-MAE model #52

Open HuangZiliAndy opened 1 month ago

HuangZiliAndy commented 1 month ago

Hi Yuan,

Thank you for your excellent work! I’m currently trying to reproduce the results from scratch, but I ran into an issue when starting stage 1.

I couldn't find where you load the pretrained CAV-MAE model weights. I followed the script src/ltu/train_script/stage1_proj_cla.sh, which points to ../../../pretrained_mdls/vicuna_ltu/, but it seems this directory doesn’t contain the CAV-MAE weights.
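
A check along these lines would show whether a given checkpoint holds any audio encoder weights. This is only a minimal sketch: it assumes the checkpoint is a plain PyTorch state dict, and the key names below are hypothetical placeholders rather than names taken from the LTU or CAV-MAE code.

```python
from collections import Counter


def count_key_prefixes(keys, depth=1):
    """Count state-dict keys by their first `depth` dot-separated components."""
    return Counter(".".join(key.split(".")[:depth]) for key in keys)


# Hypothetical key names for illustration only; they are NOT taken from
# the actual checkpoint in pretrained_mdls/vicuna_ltu/.
example_keys = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.k_proj.weight",
    "lm_head.weight",
]

prefix_counts = count_key_prefixes(example_keys)
for prefix, count in sorted(prefix_counts.items()):
    print(f"{prefix}: {count} tensors")
```

With a real file, the keys could come from torch.load(path, map_location="cpu").keys(), assuming the file stores a state dict. If no prefix corresponds to an audio encoder, the encoder weights would have to be loaded from somewhere else.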

Could you clarify where the audio encoder weights are loaded?

Thank you for your help!