arch-btw opened 2 weeks ago
Here are some relevant links:
~~Given it's a completely new architecture~~, and a multimodal one at that, I imagine adding support for it will not be easy. But I'm also very excited to see this supported.
Edit: According to Meta researcher Armen Aghajanyan, the architecture is actually similar:
> Similar architecture to LLaMa (apart from QK-norm), get fast inference working.
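For context, QK-norm just means the query and key vectors are normalized before the attention dot product. A minimal numpy sketch, assuming Llama-style RMSNorm is the normalization used (the helper names here are illustrative, not taken from the Chameleon code):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-5):
    # Normalize over the last (head) dimension, Llama-style RMSNorm.
    return x / np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps) * weight

def attention_scores(q, k, head_dim, qk_norm_weight=None):
    # q, k: (seq, head_dim). With QK-norm, both are normalized before
    # the scaled dot product; without it, this is plain Llama attention.
    if qk_norm_weight is not None:
        q = rms_norm(q, qk_norm_weight)
        k = rms_norm(k, qk_norm_weight)
    return q @ k.T / np.sqrt(head_dim)

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((4, 8))
w = np.ones(8)
scores = attention_scores(q, k, 8, qk_norm_weight=w)
print(scores.shape)  # (4, 4)
```

The normalization keeps the magnitude of the attention logits bounded, which is why it is often added for training stability; the rest of the block is unchanged.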
Yes! Please do! Especially since, as of now, there is no way to run it on CPU only.
yum yum yum :p
Since it uses a Medusa-like architecture, is the self-speculative decoding side of inference likely to be supportable at the same time? It sounded like it can run without that, but it'd be neat to have it available too.
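For reference, the core of Medusa-style self-speculative decoding is a draft-then-verify loop: extra heads propose a few tokens ahead, one main-model forward pass scores them, and the longest prefix that matches the main model's own greedy choices is accepted. A toy sketch of just that loop; the `draft`/`verify` functions and the counting toy model are stand-ins, not llama.cpp or Chameleon API:

```python
def speculative_step(context, draft, verify, k=4):
    # draft(context, k)        -> k proposed next tokens (the extra heads)
    # verify(context, drafted) -> main model's greedy token at each position
    drafted = draft(context, k)
    verified = verify(context, drafted)
    n = 0
    while n < len(drafted) and drafted[n] == verified[n]:
        n += 1
    # Accept the matching prefix, plus the main model's token at the first
    # mismatch, so at least one token is emitted per step.
    accepted = drafted[:n] + (verified[n:n + 1] if n < len(verified) else [])
    return context + accepted

# Toy model: the "true" continuation just counts upward from the last token.
def toy_verify(context, drafted):
    last = context[-1]
    return [last + i + 1 for i in range(len(drafted))]

def toy_draft(context, k):
    last = context[-1]
    # Drafts correctly for 2 steps, then diverges.
    return [last + 1, last + 2, 999, 1000][:k]

print(speculative_step([1, 2, 3], toy_draft, toy_verify))  # [1, 2, 3, 4, 5, 6]
```

Because verification accepts exactly what the main model would have produced greedily, the output is unchanged; the drafting heads only buy throughput, which is why the model can also run without them.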
Let me know if we can answer any questions about the architecture, inference, etc. Our reference implementation in https://github.com/facebookresearch/chameleon should be clear. Differences from the Llama architecture are minor.
Considering the VQGAN is public, it should be possible for llama.cpp to reinstate the image output capabilities.
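For context on what that would involve: the model emits discrete image tokens, each token indexes into the VQGAN codebook, and the VQGAN decoder upsamples the resulting latent grid back into pixels. A toy numpy sketch of the token-to-latent step (the shapes and names are illustrative, not Chameleon's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
codebook_size, latent_dim, grid = 8192, 256, 32

# VQGAN codebook: one learned embedding per discrete image token.
codebook = rng.standard_normal((codebook_size, latent_dim))

def tokens_to_latents(image_tokens):
    # image_tokens: (grid*grid,) token ids produced by the language model.
    # Look each id up in the codebook and reshape to a 2D latent grid,
    # which the VQGAN decoder would then turn into pixels.
    latents = codebook[image_tokens]
    return latents.reshape(grid, grid, latent_dim)

image_tokens = rng.integers(0, codebook_size, size=grid * grid)
latents = tokens_to_latents(image_tokens)
print(latents.shape)  # (32, 32, 256)
```

The language-model side only ever sees token ids, which is why text-only inference works without the VQGAN at all; image output needs the codebook lookup plus the decoder weights.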
+1000! I'd love to run Chameleon with llama.cpp!
Feature Description
Motivation
This would be a great addition to llama.cpp!
The image features look interesting, but it can also simply do text -> text and a lot of other modality combinations:
https://github.com/ggerganov/llama.cpp/assets/57669023/23e92f5a-e782-4bb7-ab66-c20fe113d514