TJKlein opened this issue 1 year ago
Sure, we are going to release all training/inference code for all results reported in the paper.
@TJKlein Please see the following multimodal inference code for image captioning. The code structure is very similar to that of ScienceQA:
https://huggingface.co/spaces/csuhan/LLaMA-Adapter/tree/main
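For anyone who just wants to inspect that demo code locally, here is a minimal sketch (not the authors' official release) that pulls down the Space with `huggingface_hub`; it assumes the package is installed and makes no claims about the Space's internal file layout:

```python
# Minimal sketch: download the referenced Space so its inference code
# can be browsed locally. Assumes `pip install huggingface_hub`.
from huggingface_hub import snapshot_download

# Fetch the full contents of the csuhan/LLaMA-Adapter Space repository.
local_dir = snapshot_download(
    repo_id="csuhan/LLaMA-Adapter",
    repo_type="space",  # it is a Space, not a model repo
)
print(f"Demo code downloaded to: {local_dir}")
```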
Thanks for pointing it out. But I think I'll wait for the release of your scripts.
+1
Hi, I am also interested in the integration of the visual adapter. When do you think the code will be released?
Thanks for your interest and patience :). We are organizing the multi-modal code and will release it within one or two weeks.
Thank you for your quick response!
Hi there! Any updates on this? I would love to take a crack at fine-tuning on a visual instruction dataset (not just image captioning as in your amazing work in v2!).
Hi, just wanted to check for any updates on this?
Hi, for the sake of reproducibility, it would be great if you provided the source code to reproduce the results on ScienceQA. Thanks.