AetherCortex / Llama-X

Open Academic Research on Improving LLaMA to SOTA LLM

improve LLaMA for visual understanding like GPT-4 #13

Closed feizc closed 1 year ago

feizc commented 1 year ago

Thanks for the good work!

We have tried to improve the LLaMA model to understand visual information and support multi-modal chatting. We are inspired by the idea that a good ViT, e.g., the CLIP vision encoder, and a well-trained large language model, e.g., LLaMA, linked by a connection network, e.g., an MLP or a Transformer, can cover visual applications, like PaLM-E.
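As a rough illustration of the connection-network idea (not the actual Visual-LLaMA code), the sketch below uses a small MLP to project frozen CLIP patch features into LLaMA's embedding space so the resulting visual tokens can be prepended to the text token embeddings. The class name and the dimensions (1024 for CLIP ViT-L/14 patch features, 4096 for LLaMA-7B hidden size) are assumptions made for this example.

```python
import torch
import torch.nn as nn

class VisualConnector(nn.Module):
    """Hypothetical connection network: maps CLIP vision features
    into the LLaMA embedding space. Dimensions assume CLIP ViT-L/14
    (1024-d) and LLaMA-7B (4096-d); the real code may differ."""

    def __init__(self, clip_dim: int = 1024, llama_dim: int = 4096):
        super().__init__()
        # A simple two-layer MLP as the connection network.
        self.proj = nn.Sequential(
            nn.Linear(clip_dim, llama_dim),
            nn.GELU(),
            nn.Linear(llama_dim, llama_dim),
        )

    def forward(self, clip_features: torch.Tensor) -> torch.Tensor:
        # clip_features: (batch, num_patches, clip_dim) from the CLIP vision encoder.
        # Returns visual "tokens" of shape (batch, num_patches, llama_dim),
        # which are prepended to the text token embeddings fed into LLaMA.
        return self.proj(clip_features)


if __name__ == "__main__":
    connector = VisualConnector()
    dummy_patches = torch.randn(2, 256, 1024)   # stand-in for CLIP patch features
    visual_tokens = connector(dummy_patches)
    print(visual_tokens.shape)                  # torch.Size([2, 256, 4096])
```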

The results on image captioning, VQA, and other multi-modal tasks are promising at the 7B scale, and we call on more people to help test larger models.

Github: https://github.com/feizc/Visual-LLaMA

AetherCortex commented 1 year ago

Hi feizc,

Thanks for kindly reaching out with this solid visual work. We understand your motivation and tried your code today; it offers good insight into improving the visual understanding of the LLaMA model, which is one of the most important capabilities of SOTA and next-generation LLMs. We therefore formally invite you to participate in the research and development of the visual part of Llama-X and look forward to further cooperation in the future. If you are also interested in Llama-X and want to become a core contributor, please check the welcome email from "llama-x@mail.com" and reply with your preferred contact information so that we can have an in-depth discussion.

Thanks,
Llama-X