YuanGongND / ltu

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".

Missing Checkpoints #14

Closed Sreyan88 closed 5 months ago

Sreyan88 commented 5 months ago

Hello!

Great work! Looks like the Download buttons don't work for LTU (Original in Paper) Pre-trained Models.

Thank You and I look forward to the models!

Best, Sreyan

YuanGongND commented 5 months ago

hi there,

Thanks for reporting this.

Please temporarily use the link in the main README file (https://github.com/YuanGongND/ltu#pretrained-models).

-Yuan

YuanGongND commented 5 months ago

Also, if this is your first time using this repo, I suggest starting from inference and using our recipe; it handles many things, including downloading. In practice, this will make reproduction much easier.

Check this https://github.com/YuanGongND/ltu#option-3-local-inference.

Sreyan88 commented 5 months ago

Thank You @YuanGongND ! Awesome work again. For now, I have run prep_train.sh and obtained ltu_ori_paper.bin and other files. I have a few questions regarding the ckpts and would request you to please keep the issue open. The first 5 questions are:

Note: I am trying to run finetune_toy.sh with my own instructions, and my aim is to enhance the models' capability for a certain type of task.

1) Is ltu_ori_paper.bin the final model after stage 4 training? If I want to fine-tune the model on an arbitrary dataset, is this the best choice?

2) If ltu_ori_paper.bin is the final model after stage 4 training, what is the difference between this and the checkpoint-20000 available here?

3) Why doesn't ltu_ori_paper.bin come with any other files, while all the other folders, including vicuna_ltu and checkpoint-20000, have other files like schedulers and rng_states?

4) What exactly does this mean: Will load from ../../../pretrained_mdls/ltu_ori_paper.bin later, for implementation purpose, first load from ../../../pretrained_mdls/vicuna_ltu/. Does this mean that ltu_ori_paper.bin is not being loaded and vicuna_ltu is being loaded?

5) What is the difference between vicuna_ltu and ltu_ori_paper.bin?

Thank You again!

Sreyan88 commented 5 months ago

I think the answer to 4 and 5 is that you use a trick to first load the original Vicuna model and later load your own model. You can correct me if I am wrong! Still investigating 1, 2, and 3.

YuanGongND commented 5 months ago

Is ltu_ori_paper.bin the final model after stage 4 training? If I want to fine-tune the model on an arbitrary dataset, is this the best choice?

Yes, if your fine-tuning data is small. I suggest reading https://github.com/YuanGongND/ltu?tab=readme-ov-file#finetune-the-ltultu-as-model-with-toy-data; we have a script there that shows the correct way to fine-tune.

Models from the different training stages are also released.

If ltu_ori_paper.bin is the final model after stage 4 training, what is the difference between this and the checkpoint-20000 available here?

They should be very similar; they might come from different training runs (I retrained the model before releasing it, to test the code).

Why doesn't ltu_ori_paper.bin come with any other files, while all the other folders, including vicuna_ltu and checkpoint-20000, have other files like schedulers and rng_states?

You only need the bin file to do inference or fine-tuning. The other files in the directory are there to help reproduction: people might be interested in the actual scheduler, loss log, and random seed when debugging their reproduction.

What exactly does this mean: Will load from ../../../pretrained_mdls/ltu_ori_paper.bin later, for implementation purpose, first load from ../../../pretrained_mdls/vicuna_ltu/. Does this mean that ltu_ori_paper.bin is not being loaded and vicuna_ltu is being loaded?

ltu_ori_paper.bin basically contains the audio encoder, the projection layer, and the adapters; it relies on the original vicuna_ltu (the LLM). This trick also loads the LLM settings. I coded it this way intentionally, and changing it would cause unexpected errors.
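To illustrate the two-step load described above, here is a minimal sketch in plain Python. It is not the repository's loading code: dictionaries stand in for model parameters, the parameter names are hypothetical, and the assumption that the audio parts start untrained before the .bin file is applied is ours. In PyTorch, a partial checkpoint is typically applied with load_state_dict(..., strict=False); whether the LTU scripts do exactly that is also an assumption here.

```python
def overlay_partial_checkpoint(base_params, partial_ckpt):
    """Write the entries of a partial checkpoint on top of a base parameter set.

    Returns the merged parameters, the base keys the checkpoint did not
    touch, and the checkpoint keys that have no counterpart in the base.
    """
    merged = dict(base_params)
    untouched = [k for k in base_params if k not in partial_ckpt]
    unexpected = [k for k in partial_ckpt if k not in base_params]
    for key, value in partial_ckpt.items():
        if key in base_params:
            merged[key] = value
    return merged, untouched, unexpected


# Step 1 (stand-in for loading from vicuna_ltu/): the full model exists,
# but only the LLM part carries useful weights.
# All key names below are hypothetical; real checkpoints hold tensors.
base = {
    "llm.layer0.weight": "vicuna",
    "llm.layer0.adapter": "untrained",
    "audio_encoder.weight": "untrained",
    "audio_proj.weight": "untrained",
}

# Step 2 (stand-in for ltu_ori_paper.bin): audio encoder + proj + adapters only.
ltu_bin = {
    "llm.layer0.adapter": "trained",
    "audio_encoder.weight": "trained",
    "audio_proj.weight": "trained",
}

merged, untouched, unexpected = overlay_partial_checkpoint(base, ltu_bin)
print("kept from base:", untouched)
print("not found in base:", unexpected)
```

In this sketch the LLM weight survives from the first load, while the audio encoder, projection, and adapter entries are replaced by the second load, which is why both artifacts are required.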

What is the difference between vicuna_ltu and ltu_ori_paper.bin?

The first is a directory containing the LLM and tokenizer settings and many other things. ltu_ori_paper.bin holds the trained weights of the audio encoder, projection layer, and adapters. You need both.
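Since both artifacts are needed, a quick check before fine-tuning can save a failed run. The following is a small sketch, not part of the repository: the helper name missing_artifacts is hypothetical, and the directory layout (vicuna_ltu/ and ltu_ori_paper.bin side by side under pretrained_mdls) is inferred from the log message quoted in this thread.

```python
from pathlib import Path


def missing_artifacts(pretrained_dir):
    """Return the names of required artifacts not found under pretrained_dir."""
    root = Path(pretrained_dir)
    missing = []
    # Directory with the LLM and tokenizer settings.
    if not (root / "vicuna_ltu").is_dir():
        missing.append("vicuna_ltu/")
    # Trained weights of the audio encoder, projection, and adapters.
    if not (root / "ltu_ori_paper.bin").is_file():
        missing.append("ltu_ori_paper.bin")
    return missing


if __name__ == "__main__":
    # Relative path taken from the log message above; adjust to your layout.
    problems = missing_artifacts("../../../pretrained_mdls")
    if problems:
        print("Missing:", ", ".join(problems))
    else:
        print("Both vicuna_ltu/ and ltu_ori_paper.bin are present.")
```

If anything is reported missing, running the inference recipe first is the simplest fix, since it handles the downloads.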

-Yuan

YuanGongND commented 5 months ago

Again, I would suggest first running inference and making sure everything works, even if you are only interested in fine-tuning. This can help uncover device, download, and other issues.

YuanGongND commented 5 months ago

btw, if a question is not directly related to the title, it would be nice to open another issue, which makes it easier for other people to find via search.

Sreyan88 commented 5 months ago

Thank You so much for your reply! Really appreciate it! I am closing the issue now!