Open yangdongdong2000 opened 3 weeks ago
Also, I am confused about where the audio tokens encoded by Whisper appear; there seems to be nothing related to Whisper in finetune.py.
Please check the README.
prep_train downloads the things you need, e.g., model weights.
The two "toy" scripts help you set things up before large-scale training. We highly recommend trying these first, but you can skip them.
Then you should run stage 1 -> 2 -> 3; for stage 3, we recommend running v2.
It is easier to see how it works in the inference code: https://github.com/YuanGongND/ltu/blob/main/src/ltu_as/inference_gradio.py. In training, we extract the Whisper features, save them to disk, and then just load the features from disk.
-Yuan
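For readers following along, the "extract once, save to disk, load during training" pattern described above could look roughly like the sketch below. This is only an illustration and not the repository's code: extract_features is a hypothetical stand-in for the Whisper encoder (it just splits the waveform into frames), and cache_features, load_features, the feat_dir layout, and the one .npy file per clip convention are all assumptions.

```python
from pathlib import Path
from typing import Mapping

import numpy as np


def extract_features(waveform: np.ndarray, frame: int = 160) -> np.ndarray:
    """Hypothetical placeholder for the Whisper encoder.

    It only reshapes the waveform into (num_frames, frame) so the sketch
    runs without any model; the real features come from Whisper.
    """
    n_frames = len(waveform) // frame
    return waveform[: n_frames * frame].reshape(n_frames, frame).astype(np.float32)


def cache_features(audio: Mapping[str, np.ndarray], feat_dir: Path) -> None:
    """Offline step: run the extractor once per clip and save the result."""
    feat_dir.mkdir(parents=True, exist_ok=True)
    for audio_id, waveform in audio.items():
        np.save(feat_dir / f"{audio_id}.npy", extract_features(waveform))


def load_features(audio_id: str, feat_dir: Path) -> np.ndarray:
    """Training-time step: read the cached features, no encoder involved."""
    return np.load(feat_dir / f"{audio_id}.npy")
```

With this split, the training script only ever calls something like load_features, which would explain why no Whisper code shows up there.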
Thanks a lot!!!
I noticed a lot of sh scripts in the train_scripts directory. What is the difference between them?