YuanGongND / ltu

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

Question about LTU-AS base model #18

Open dingdongwang opened 8 months ago

dingdongwang commented 8 months ago

Hi, I have a question about the base models for the finetuning (FT) stage and training Stage 1. I see that the base model for FT is ltuas_long_noqa_a6.bin, which is only 187MB, and the base model for training Stage 1 is in vicuna_ltuas/, which is only 154MB. However, llama-7b should be more than 9GB, so why are the base models in this codebase so small?

Besides, may I ask what the difference and relationship are between pretrained_mdls/vicuna/ and pretrained_mdls/vicuna_ltuas/? (The two folders are 13G and 200M respectively; pretrained_mdls/vicuna/ seems to include the LLM, but that model does not seem to be used in any training stage.)

Thank you and looking forward to your reply!

YuanGongND commented 8 months ago

hi there, thanks for the questions. And I appreciate that you put them in separate issues so that they are easier for others to search.

Hi, I have a question about the base models for the finetuning (FT) stage and training Stage 1. I see that the base model for FT is ltuas_long_noqa_a6.bin, which is only 187MB

This is as expected. For both LTU and LTU-AS, we use LoRA adapters, which means we only train a small set (<5%) of additional parameters and keep the LLM weights unchanged. We also do not save the LLM weights, since they do not change. So the 187MB consists of 1/ the audio encoder (part of the CAV-MAE model), 2/ the audio-encoder-to-LLM projection layer, i.e., a 1024 × 4096 linear layer to match dimensions, and 3/ the LoRA adapters.
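
To make this concrete, here is a minimal toy sketch (not the actual LTU-AS training code; the module names are made up for illustration) of the idea: the big base weights stay frozen, only a small projection layer plus low-rank LoRA adapters are trained, and only those trainable tensors are written to disk, which is why the saved .bin file is a few hundred MB instead of a full 7B-parameter LLM.

```python
# Toy sketch only -- the real repo trains the audio encoder (CAV-MAE),
# a 1024 -> 4096 projection, and LoRA adapters inside the LLM.
import torch
import torch.nn as nn

class TinyLoRALinear(nn.Module):
    """A frozen linear layer with a small trainable low-rank (LoRA) update."""
    def __init__(self, dim, rank=8):
        super().__init__()
        self.base = nn.Linear(dim, dim, bias=False)
        self.base.weight.requires_grad = False          # frozen "LLM" weight
        self.lora_a = nn.Linear(dim, rank, bias=False)  # trainable adapter
        self.lora_b = nn.Linear(rank, dim, bias=False)  # trainable adapter

    def forward(self, x):
        return self.base(x) + self.lora_b(self.lora_a(x))

model = nn.ModuleDict({
    "proj": nn.Linear(1024, 4096),     # audio feature -> LLM embedding dim
    "llm_block": TinyLoRALinear(4096)  # stand-in for one adapted LLM layer
})

# Save only the parameters that require gradients; the frozen base weight
# is skipped, so the checkpoint stays small.
trainable = {n: p.detach().cpu() for n, p in model.named_parameters() if p.requires_grad}
torch.save(trainable, "trainable_only.bin")
print(sorted(trainable))  # proj.* and lora_* only, no llm_block.base.weight
```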

and the base model for training Stage 1 is in vicuna_ltuas/, which is only 154MB. However, llama-7b should be more than 9GB, so why are the base models in this codebase so small?

vicuna_ltuas/ is supposed to contain the LLM (the large model, i.e., more than 9GB); however, to save space, we use a soft link (`ln -s`) to point it to vicuna_ltu. Go to vicuna_ltuas/ and run `ls -la`; you should see the pointer.
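
If you prefer checking from Python rather than `ls -la`, a quick sanity check like the one below (assuming the default pretrained_mdls/ layout) lists which entries are soft links and where they point:

```python
import os

d = "pretrained_mdls/vicuna_ltuas"  # assumed default path from the repo layout
for name in sorted(os.listdir(d)):
    path = os.path.join(d, name)
    # flag soft links and show their targets, similar to `ls -la`
    print(name, "->", os.readlink(path) if os.path.islink(path) else "(regular file)")
```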

Besides, may I ask what the difference and relationship are between pretrained_mdls/vicuna/ and pretrained_mdls/vicuna_ltuas/? (The two folders are 13G and 200M respectively; pretrained_mdls/vicuna/ seems to include the LLM, but that model does not seem to be used in any training stage.)

This is related to the second question: the reason you see the LLM in the first folder but not the second is that we use a soft link in the vicuna_ltuas/ dir, since the LLMs for LTU and LTU-AS are the same. The other contents are different, so please do not mix them up.

-Yuan