OpenGVLab / LLaMA-Adapter

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters
GNU General Public License v3.0

Visual Instruction model #24

Closed · remixer-dec closed this issue 1 year ago

remixer-dec commented 1 year ago

Greetings! I noticed that your README has a demo image of the Visual Instruction model, but I wasn't able to find the relevant code for it. Is it already supported in V2, or is it planned for V3? Will it work with the 7B model?

gaopengpjlab commented 1 year ago

LLaMA-Adapter V2 supports chat and visual instruction following separately. We will release the MM-LLM (LLaMA-Adapter V1) and the chat model soon. The release of the visual instruction model (LLaMA-Adapter V2) will still take several weeks. Stay tuned.

gaopengpjlab commented 1 year ago

The demo and pretrained checkpoint of LLaMA-Adapter V2 (the visual instruction part) will be released in a few days. Sorry for the long wait.

gaopengpjlab commented 1 year ago

Please check out the demo page: http://llama-adapter.opengvlab.com/

remixer-dec commented 1 year ago

Very impressive! Good job.

jpgard commented 1 year ago

@gaopengpjlab do you still plan to release training code for the multimodal model (not just demo and pretrained weights)?

gaopengpjlab commented 1 year ago

@jpgard Sure, all training code will be released. The pretrained weights and inference code will be released within 2-3 days, and the full pretraining/finetuning code for the multimodal model within 10 days. Sorry for the long wait.

jpgard commented 1 year ago

Got it @gaopengpjlab ! Thanks for your contributions and for making it open-source, looking forward to it!

gaopengpjlab commented 1 year ago

The pretrained weights have been released: https://github.com/ZrrSkywalker/LLaMA-Adapter/tree/main/llama_adapter_v2_multimodal

Full finetuning code is coming soon.
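
For reference, a minimal inference sketch against the released llama_adapter_v2_multimodal code might look like the following. It follows the pattern in that directory's README; the exact `llama.load` arguments and the "BIAS-7B" checkpoint name are assumptions to verify there, and the paths are placeholders.

```python
import torch
from PIL import Image

import llama  # the llama package shipped in llama_adapter_v2_multimodal (assumed import path)

device = "cuda" if torch.cuda.is_available() else "cpu"
llama_dir = "/path/to/LLaMA/"  # hypothetical path to the original LLaMA weights

# Load an adapter checkpoint together with its image preprocessor
# ("BIAS-7B" is assumed to be one of the released checkpoint names).
model, preprocess = llama.load("BIAS-7B", llama_dir, device=device)
model.eval()

# Wrap the instruction in the prompt template and preprocess the input image.
prompt = llama.format_prompt("Please describe this image.")
img = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

# Generate a response conditioned on the image and the prompt.
result = model.generate(img, [prompt])[0]
print(result)
```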

remixer-dec commented 1 year ago

It works! Thank you for the code! P.S. I added M1 Mac acceleration for the multi-modal version in a separate branch of my llama-mps fork.
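
For context, PyTorch exposes Apple Silicon acceleration through its `mps` backend, so enabling it mostly comes down to device selection. The snippet below is a generic sketch of that pattern, not the actual code from the llama-mps fork.

```python
import torch

# Pick the best available accelerator: CUDA on NVIDIA GPUs,
# MPS on Apple Silicon (M1/M2 Macs), otherwise fall back to CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

# Models and tensors are then moved to that device as usual.
x = torch.randn(2, 3, device=device)
print(x.device)
```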

gaopengpjlab commented 1 year ago

https://github.com/ZrrSkywalker/LLaMA-Adapter/tree/main/imagebind_LLM

Pretraining/finetuning/inference code has been released. We support image/video/text/audio/point cloud input and bilingual (Chinese/English) responses.

Sorry for the long wait. We hope you enjoy our code.
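
To illustrate the multi-modal inputs mentioned above, here is a rough sketch of how different modalities could be prepared with ImageBind's data transforms and passed to the model. The `ImageBind.data` loaders are part of the bundled ImageBind code, but the `llama.load` / `model.generate` calls and the inputs format below mirror the pattern of the repo's other demos and are assumptions to check against the imagebind_LLM README.

```python
import torch

import ImageBind.data as data  # ImageBind's data transforms, bundled in the repo
import llama                   # the llama package shipped in imagebind_LLM (assumed import path)

device = "cuda" if torch.cuda.is_available() else "cpu"
llama_dir = "/path/to/LLaMA/"  # hypothetical path to the original LLaMA weights

# Assumed loader, analogous to the llama_adapter_v2_multimodal interface.
model = llama.load("7B", llama_dir, device=device)
model.eval()

# Prepare inputs from different modalities with ImageBind's transforms.
image = data.load_and_transform_vision_data(["example.jpg"], device)
audio = data.load_and_transform_audio_data(["example.wav"], device)

# Hypothetical multi-modal inputs dict; check the repo's demo for the exact format.
inputs = {"Image": image, "Audio": audio}

prompt = llama.format_prompt("Describe what you see and hear.")  # assumed helper
results = model.generate(inputs, [prompt], max_gen_len=256)
print(results[0])
```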