Closed Sreyan88 closed 5 months ago
hi there,
Thanks for reporting this.
Please temporarily use the link in the main README file (https://github.com/YuanGongND/ltu#pretrained-models).
-Yuan
Also, if this is your first time using this repo, I suggest starting from inference and using our recipe; it handles many things, including downloading! In practice, this will make reproduction much easier.
Check this https://github.com/YuanGongND/ltu#option-3-local-inference.
Thank You @YuanGongND ! Awesome work again. For now, I have run prep_train.sh and obtained ltu_ori_paper.bin and other files. I have a few questions regarding the ckpts and would request you to please keep the issue open. The first 5 questions are:
Note: I am trying to run finetune_toy.sh with my own instructions, and my aim is to enhance the model's capability for a certain type of task.
1) Is the ltu_ori_paper.bin the final model after stage 4 training? If I want to fine-tune the model using any dataset is this the best choice?
2) If ltu_ori_paper.bin is the final model after stage 4 training, what is the difference between this and the checkpoint-20000 available here?
3) Why does not ltu_ori_paper.bin have any other files but all other folders, including vicuna_ltu and checkpoint-20000 have other files like schedulers and rng_states?
4) What exactly does this mean: Will load from ../../../pretrained_mdls/ltu_ori_paper.bin later, for implementation purpose, first load from ../../../pretrained_mdls/vicuna_ltu/. Does this mean that ltu_ori_paper.bin is not being loaded and vicuna_ltu is being loaded?
5) What is the difference between vicuna_ltu and ltu_ori_paper.bin?
Thank You again!
I think the answer to 4 and 5 is that you use a trick to first load the original Vicuna model and later load your own model. You can correct me if I am wrong! Still investigating 1,2, and 3.
Is the ltu_ori_paper.bin the final model after stage 4 training? If I want to fine-tune the model using any dataset is this the best choice?
Yes, if your fine-tuning data is small. I suggest reading this: https://github.com/YuanGongND/ltu?tab=readme-ov-file#finetune-the-ltultu-as-model-with-toy-data. We have a script that shows the correct way of finetuning.
Models from the different stages are also released.
If ltu_ori_paper.bin is the final model after stage 4 training, what is the difference between this and the checkpoint-20000 available here
They should be very similar, but they might come from different training runs (I retrained the model before releasing to test the code).
Why does not ltu_ori_paper.bin have any other files but all other folders, including vicuna_ltu and checkpoint-20000 have other files like schedulers and rng_states?
You only need the bin file to do inference/finetuning. The other files in the dir are there to help reproduction: people might be interested in the actual scheduler/loss log/random_seed to debug their reproduction.
What exactly does this mean: Will load from ../../../pretrained_mdls/ltu_ori_paper.bin later, for implementation purpose, first load from ../../../pretrained_mdls/vicuna_ltu/. Does this mean that ltu_ori_paper.bin is not being loaded and vicuna_ltu is being loaded?
ltu_ori_paper.bin is basically the audio encoder + proj + adapters; it relies on the original vicuna_ltu (the LLM). Also, this trick loads the settings of the LLM. I coded this intentionally; changing it would cause unexpected errors.
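The two-step loading order described here can be pictured with a minimal sketch, assuming the .bin file is a partial state dict that covers only the audio-side modules. Plain dictionaries stand in for real tensors, and every parameter name and value below is a hypothetical placeholder; this is not the repo's actual loading code.

```python
# Illustrative only: plain dicts stand in for model state dicts.
# All key names and values are hypothetical, not taken from the LTU repo.

def overlay_checkpoint(base_state, partial_ckpt):
    """Return (merged state, sorted list of keys the checkpoint overrode)."""
    merged = dict(base_state)       # step 1: start from the base model
    merged.update(partial_ckpt)     # step 2: overlay the trained weights
    replaced = sorted(k for k in partial_ckpt if k in base_state)
    return merged, replaced

# Step 1 stand-in: what loading from the vicuna_ltu directory would provide
# (LLM weights, with the audio-side modules still untrained).
base_state = {
    "llm.layer0.weight": "vicuna",
    "llm.layer0.adapter.weight": "untrained",
    "audio_encoder.weight": "untrained",
    "audio_proj.weight": "untrained",
}

# Step 2 stand-in: what ltu_ori_paper.bin would provide
# (audio encoder + proj + adapters only, no LLM weights).
partial_ckpt = {
    "llm.layer0.adapter.weight": "trained",
    "audio_encoder.weight": "trained",
    "audio_proj.weight": "trained",
}

merged, replaced = overlay_checkpoint(base_state, partial_ckpt)
for name in sorted(merged):
    print(f"{name}: {merged[name]}")
```

In this sketch the LLM entry keeps its base value while the three audio-side entries take the checkpoint's values, which is why both pieces are needed and why the base model is loaded first.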
What is the difference between vicuna_ltu and ltu_ori_paper.bin?
The first is a dir that includes the settings of the LLM, the tokenizer, and many other things. ltu_ori_paper.bin contains the trained weights of the audio encoder + proj + adapters. You need both.
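Since both pieces are required, a small pre-flight check can catch a missing download before finetuning starts. This is a minimal sketch, assuming both live under the same root directory as in the log message quoted in this thread (../../../pretrained_mdls/); the helper name missing_pretrained is hypothetical and not part of the repo.

```python
from pathlib import Path

def missing_pretrained(root):
    """Return the names of required pieces that are absent under `root`.

    Assumed layout (taken from the paths in the log message):
      <root>/vicuna_ltu/         directory with LLM settings and tokenizer
      <root>/ltu_ori_paper.bin   trained audio encoder + proj + adapters
    """
    root = Path(root)
    present = {
        "vicuna_ltu": (root / "vicuna_ltu").is_dir(),
        "ltu_ori_paper.bin": (root / "ltu_ori_paper.bin").is_file(),
    }
    return [name for name, ok in present.items() if not ok]

if __name__ == "__main__":
    # Path as it appears in the log message; adjust to your working directory.
    missing = missing_pretrained("../../../pretrained_mdls")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Both vicuna_ltu and ltu_ori_paper.bin were found.")
```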
-Yuan
Again, I would suggest first running inference and making sure everything works, even if you are only interested in finetuning. This can help find device, download, and other issues.
btw, if a question is not directly related to the title, it would be nice to open another issue, which makes it easier for other people to search.
Thank You so much for your reply! Really appreciate it! I am closing the issue now!
Hello!
Great work! Looks like the Download buttons don't work for LTU (Original in Paper) Pre-trained Models.
Thank You and I look forward to the models!
Best, Sreyan