Closed hengyuan-zhang-0 closed 1 hour ago
question_id: 2,3,4 corresponding images:
The fine-tuned checkpoint may perform poorly on some specific subsets because it was co-trained on a very diverse action-vision VQA set. You can fine-tune it further to adapt it to your target dataset.
I merged the LoRA checkpoints provided here and followed the inference guide, but the results I obtained are not ideal, as shown in the image below. @Dantong88 Could you help?
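
As a sanity check on the merge step itself, here is a minimal sketch of what merging a LoRA adapter into a base weight amounts to, assuming the standard LoRA formulation W' = W + (alpha / r) * B A. The function names, shapes, and toy values are illustrative assumptions, not this repository's merge script. If the merged and unmerged forward passes disagree, the problem is in the merge rather than in the checkpoint.

```python
# Pure-Python sketch of a LoRA merge (standard formulation, toy sizes).
# All names below are hypothetical and chosen for illustration only.

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def merge_lora(W, A, B, alpha, r):
    """Fold the low-rank update into the base weight: W + (alpha / r) * B @ A.

    W is d_out x d_in, B is d_out x r, A is r x d_in.
    """
    scale = alpha / r
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

def forward_unmerged(W, A, B, alpha, r, x):
    """Base output plus the scaled adapter output, without merging."""
    scale = alpha / r
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    return [b + scale * l for b, l in zip(base, low_rank)]

# Toy example: d_out = 2, d_in = 3, rank r = 1.
W = [[1, 0, 2], [0, 1, -1]]
A = [[1, 2, 3]]
B = [[0.5], [-1.0]]
alpha, r = 2, 1
x = [1, 1, 1]

W_merged = merge_lora(W, A, B, alpha, r)
# Both paths should give the same output if the merge is correct.
print(matvec(W_merged, x))
print(forward_unmerged(W, A, B, alpha, r, x))
```

Running the same comparison on one layer of the real model (same input, merged vs. base plus adapter) is a quick way to rule out a wrong scaling factor or a transposed A/B pair before looking at the data.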