Closed thisurawz1 closed 2 weeks ago
Can we do only text, image-text, and video-text finetuning with Lora in one run? I mean, put only text, text-image, and video-image samples in the same custom.json file and do the fine-tuning?
Yes, you can. The LazysupervisedDataset in train.py unifies the processing of pure text, image-text, video-text data sample.
LazysupervisedDataset
train.py
pure text
image-text
video-text
Can we do only text, image-text, and video-text finetuning with Lora in one run? I mean, put only text, text-image, and video-image samples in the same custom.json file and do the fine-tuning?