LLaMA Adapter V2 supports chat and visual instruction following separately. We will release the MM-LLM (llama-adapter V1) and the chat model soon. The release of the visual instruction model (LLaMA Adapter V2) will still take several weeks. Stay tuned.
The demo and pretrained checkpoint of LLaMA Adapter V2 (the visual instruction part) will be released in a few days. Sorry for the long wait.
Please check out the demo page: http://llama-adapter.opengvlab.com/
Very impressive! Good job.
@gaopengpjlab do you still plan to release training code for the multimodal model (not just demo and pretrained weights)?
@jpgard Sure, all training code will be released. The pretrained weights and inference code will be released within 2-3 days. The full pretraining/finetuning code for the multimodal model will be released within 10 days. Sorry for the long wait.
Got it @gaopengpjlab ! Thanks for your contributions and for making it open-source, looking forward to it!
Pretrained weights have been released. https://github.com/ZrrSkywalker/LLaMA-Adapter/tree/main/llama_adapter_v2_multimodal
Full finetuning code coming soon.
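For anyone trying the released weights, here is a minimal inference sketch based on that directory; the `llama.load` / `format_prompt` / `generate` names and the `BIAS-7B` checkpoint name are assumptions taken from its README at the time, so please check the repo for the exact signatures and paths:

```python
# Hypothetical usage sketch for llama_adapter_v2_multimodal; names and
# signatures are assumed from the repo README and may differ.
import torch
from PIL import Image
import llama  # package shipped in the llama_adapter_v2_multimodal directory

device = "cuda" if torch.cuda.is_available() else "cpu"
llama_dir = "/path/to/LLaMA/"  # directory with the original LLaMA weights

# Load the released adapter checkpoint (e.g. "BIAS-7B") on top of LLaMA.
model, preprocess = llama.load("BIAS-7B", llama_dir, device)
model.eval()

# Build a visual-instruction prompt and preprocess the input image.
prompt = llama.format_prompt("Describe this image.")
img = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)

# Generate the instruction-following response conditioned on the image.
print(model.generate(img, [prompt])[0])
```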
It works! Thank you for the code! P.S. I added M1 Mac acceleration for the multi-modal version in a separate branch of my llama-mps fork.
https://github.com/ZrrSkywalker/LLaMA-Adapter/tree/main/imagebind_LLM
The pretraining/finetuning/inference code has been released. We support image/video/text/audio/point cloud input and bilingual (Chinese/English) responses.
Sorry for the long wait. Hope you enjoy our code.
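To make the multi-modal input concrete, here is a rough sketch of driving the imagebind_LLM interface with mixed image and audio inputs; the `ImageBind.data` loaders, the `llama.load(..., knn=True)` call, and the per-modality weights are assumptions drawn from that directory's example scripts, so treat it as illustrative only:

```python
# Hypothetical sketch for imagebind_LLM; function names and argument
# shapes are assumed from the repo's examples and may differ.
import llama
import ImageBind.data as data

llama_dir = "/path/to/LLaMA"

# Load the ImageBind-conditioned LLaMA model.
model = llama.load("7B", llama_dir, knn=True)
model.eval()

# Each modality is encoded by ImageBind and passed with a mixing weight.
inputs = {
    "Image": [data.load_and_transform_vision_data(["example.jpg"], device="cuda"), 1],
    "Audio": [data.load_and_transform_audio_data(["example.wav"], device="cuda"), 1],
}

# Generate a response conditioned on all provided modalities.
results = model.generate(
    inputs,
    [llama.format_prompt("What is happening here?")],
    max_gen_len=256,
)
print(results[0].strip())
```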
Greetings! I noticed that your README has a demo image of the Visual Instruction model, but I wasn't able to find the relevant code for it. Is it already supported in V2, or is it planned for V3? Will it work with the 7B model?