ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Feature Request: Support for Meta Chameleon 7B and 34B #7995

Open arch-btw opened 2 weeks ago

arch-btw commented 2 weeks ago


Feature Description

"Meta Chameleon is a family of models that can combine text and images as input and output any combination of text and images with a single unified architecture for both encoding and decoding. While most current late-fusion models use diffusion-based learning, Meta Chameleon uses tokenization for text and images. This enables a more unified approach and makes the model easier to design, maintain, and scale. The possibilities are endless—imagine generating creative captions for images or using a mix of text prompts and images to create an entirely new scene."

Motivation

This would be a great addition to llama.cpp!

The image features look interesting, but it can also simply do text -> text and many other combinations:

https://github.com/ggerganov/llama.cpp/assets/57669023/23e92f5a-e782-4bb7-ab66-c20fe113d514

EliEron commented 2 weeks ago

Here are some relevant links:

~~Given it's a completely new architecture~~, and a multimodal one at that, I imagine adding support for it will not be easy. But I'm also very excited to see this supported.

Edit: According to Meta researcher Armen Aghajanyan the architecture is actually similar:

> Similar architecture to LLaMa (apart from QK-norm), get fast inference working.
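For readers unfamiliar with the QK-norm mentioned above: it normalizes the query and key vectors (typically with an RMS-style norm plus a learned gain) before the attention dot product, which bounds the attention logits. A minimal plain-Python sketch, with the learned gain omitted and toy vectors assumed:

```python
import math

def rms_norm(x, eps=1e-6):
    """RMS-normalize a vector (learned gain omitted for brevity)."""
    scale = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / scale for v in x]

def attn_score(q, k, use_qk_norm=True):
    """Scaled dot-product score for one query/key pair.
    With QK-norm, q and k are normalized *before* the dot product,
    which keeps the logit magnitude bounded by sqrt(d)."""
    if use_qk_norm:
        q, k = rms_norm(q), rms_norm(k)
    d = len(q)
    return sum(a * b for a, b in zip(q, k)) / math.sqrt(d)

q, k = [0.5, -1.0, 2.0, 0.25], [1.5, 0.5, -0.5, 1.0]
print(attn_score(q, k), attn_score(q, k, use_qk_norm=False))
```

For llama.cpp this would mainly mean an extra per-head normalization step in the attention graph, rather than a structural change.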

0wwafa commented 2 weeks ago

Yes! Please do! Also because as of now there is no way to run it on CPU only.

SolvAI commented 1 week ago

yum yum yum :p

ann-brown commented 1 week ago

Since it uses an architecture similar to Medusa, is the self-speculative decoding side of inference likely to be supportable at the same time? It sounded like it could run without that, but it'd be neat to have it available too.

jacobkahn commented 1 week ago

Let me know if we can answer any questions about the architecture, inference, etc. Our reference implementation in https://github.com/facebookresearch/chameleon should be clear. Differences from the Llama architecture are minor.

typedrat commented 1 week ago

Considering the VQGAN is public, it should be possible for llama.cpp to reinstate the image output capabilities.
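Conceptually, restoring image output means mapping the model's emitted image tokens back through the VQGAN: look up each token's codebook embedding to rebuild the latent grid, then run the VQGAN's convolutional decoder over that grid. A toy sketch of just the codebook-lookup step (codebook size, latent dimension, and grid shape here are illustrative, not the actual Chameleon/VQGAN values or API):

```python
# Toy codebook lookup: rebuild a latent grid from image token ids.
# A real VQGAN would then run a conv decoder over this grid to get pixels;
# the sizes below are tiny illustrative values, not Chameleon's.
import random

CODEBOOK_SIZE, LATENT_DIM, GRID = 16, 4, 2
random.seed(0)
codebook = [[random.uniform(-1, 1) for _ in range(LATENT_DIM)]
            for _ in range(CODEBOOK_SIZE)]

def tokens_to_latents(tokens, grid=GRID):
    """Map grid*grid image token ids to a 2-D grid of codebook vectors."""
    assert len(tokens) == grid * grid
    return [[codebook[tokens[r * grid + c]] for c in range(grid)]
            for r in range(grid)]

latents = tokens_to_latents([3, 7, 0, 12])
print(len(latents), len(latents[0]), len(latents[0][0]))  # 2 2 4
```

Since this lookup-plus-decoder path is deterministic given the tokens, the main llama.cpp work would be generating the image token stream; the VQGAN decode could even run in a separate tool.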

chigkim commented 1 week ago

+1000! I'd love to run Chameleon with llama.cpp!