ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Feature Request: Add support for Phi-3.5 MoE and Vision Instruct #9119

Open YorkieDev opened 3 weeks ago

YorkieDev commented 3 weeks ago

Feature Description

Microsoft has recently released two new models in the Phi family:

- Phi-3.5 MoE: https://huggingface.co/microsoft/Phi-3.5-MoE-instruct
- Phi-3.5 Vision: https://huggingface.co/microsoft/Phi-3.5-vision-instruct

It would be nice to see support added to llama.cpp for these two models.

Motivation

Supporting all model releases so the wider community can enjoy these great free models.

Possible Implementation

No response

curvedinf commented 3 weeks ago

MoE looks promising. Any word on how complex it would be to add support?

JackCloudman commented 3 weeks ago

Is someone working on it? :pray:

simsi-andy commented 3 weeks ago

Vision especially would be worth it, but I lack the knowledge to do something like this.

mounta11n commented 3 weeks ago

Yes, the vision model is surprisingly good. Having it in GGUF format under llama.cpp would open up undreamt-of possibilities.

[Screenshot attachment: Bildschirmfoto_20240821_213502]

foldl commented 2 weeks ago

ChatLLM.cpp supports the Phi-3.5 MoE model now.

For developers: the MoE sparse MLP is ~~the same as~~ a little different from the one used in Mixtral.
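
For reference, here is a minimal PyTorch sketch of the Mixtral-style top-2 sparse MoE MLP that the note above compares against. The dimensions, expert count, and activation are illustrative assumptions, and Phi-3.5 MoE's actual layer (notably its router) differs in details, so treat this as a reference point rather than the model's implementation.

```python
# Illustrative sketch of a Mixtral-style sparse MoE MLP with top-2 routing.
# Sizes are arbitrary; Phi-3.5 MoE differs in details such as the routing function.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExpertMLP(nn.Module):
    """Gated (SwiGLU-style) feed-forward block used by each expert."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)
        self.up   = nn.Linear(d_model, d_ff, bias=False)
        self.down = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

class SparseMoE(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.router  = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList([ExpertMLP(d_model, d_ff) for _ in range(n_experts)])
        self.top_k   = top_k

    def forward(self, x):                        # x: (n_tokens, d_model)
        logits = self.router(x)                  # (n_tokens, n_experts)
        weights, idx = torch.topk(logits, self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # normalize over the selected experts only
        out = torch.zeros_like(x)
        for k in range(self.top_k):              # naive loop over slots and experts;
            for e, expert in enumerate(self.experts):  # real kernels batch tokens per expert
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out

# Quick shape check
print(SparseMoE()(torch.randn(4, 512)).shape)  # torch.Size([4, 512])
```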

ayttop commented 2 weeks ago

https://github.com/foldl/chatllm.cpp

Quoting the chatllm.cpp README:

> Inference of a bunch of models from less than 1B to more than 300B, for real-time chatting with RAG on your computer (CPU), pure C++ implementation based on @ggerganov's ggml.
>
> What's New: 2024-08-28: Phi-3.5 Mini & MoE

ayttop commented 2 weeks ago

https://huggingface.co/microsoft/Phi-3.5-MoE-instruct/discussions/4

A discussion about converting microsoft/Phi-3.5-MoE-instruct to GGUF.

Dampfinchen commented 2 weeks ago

Pretty sad to see no support for Phi 3.5 MoE in llama.cpp. Sure, it might have dry writing and be very censored, but in assistant tasks it's much better than all the smaller models combined. It truly offers 70B-class quality with just 6.6B active parameters, so it's much easier to run than even Gemma 2 27B (which it beats according to benchmarks).
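
For context on the 6.6B figure: with top-2 routing only two of the sixteen experts run per token, so the active parameter count is roughly the shared weights (attention, embeddings) plus two experts' worth of MLP weights. A back-of-the-envelope sketch follows, using approximate config values that should be treated as assumptions rather than the model card's exact breakdown.

```python
# Back-of-the-envelope: why a 16-expert MoE activates only ~6.6B parameters per token.
# All config values below are approximate figures for Phi-3.5-MoE and are assumptions
# for illustration; router and norm weights are omitted as negligible.
d_model   = 4096    # hidden size
d_ff      = 6400    # per-expert MLP intermediate size
n_layers  = 32
n_experts = 16
top_k     = 2       # experts activated per token
n_heads, n_kv_heads, head_dim = 32, 8, 128
vocab     = 32064

expert_mlp = 3 * d_model * d_ff                                         # gate, up, down
attn       = d_model * (n_heads + 2 * n_kv_heads + n_heads) * head_dim  # q, k, v, o (GQA)
embed      = 2 * vocab * d_model                                        # embedding + lm head

total  = n_layers * (n_experts * expert_mlp + attn) + embed
active = n_layers * (top_k     * expert_mlp + attn) + embed

print(f"total  ~ {total  / 1e9:.1f}B parameters")   # roughly 42B
print(f"active ~ {active / 1e9:.1f}B parameters")   # roughly 6.6B with top-2 routing
```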

sourceholder commented 2 weeks ago

@Dampfinchen, have you found any way to run Phi 3.5 MoE locally? I'm open to trying out alternatives to llama.cpp.

arnesund commented 1 week ago

Also eager to get Phi 3.5-Vision support. Most accurate photo and screenshot descriptions I've seen so far.

EricLBuehler commented 1 week ago

@Dampfinchen @sourceholder @arnesund if you are interested in running Phi 3.5 MoE or Phi 3.5 vision with alternatives to llama.cpp, perhaps you could check out mistral.rs.

Just a quick description:

We have support for Phi 3.5 MoE (docs & example: https://github.com/EricLBuehler/mistral.rs/blob/master/docs/PHI3.5MOE.md) and Phi 3.5 vision (docs & examples: https://github.com/EricLBuehler/mistral.rs/blob/master/docs/PHI3V.md).

All models can be run with CUDA, Metal, or CPU SIMD acceleration. We have Flash Attention and PagedAttention support for increased inference performance, and we support in-situ quantization in GGUF and HQQ formats.

If you are using the OpenAI API, you can use the provided OpenAI-compatible HTTP server (a superset: we also have things like min-p, DRY, etc.). There is a Python package as well, and for Phi 3.5 MoE and other text models an interactive chat mode.
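
As a hedged illustration of the OpenAI-compatible route (not taken from the mistral.rs docs; the port, endpoint, and model name below are placeholders), any standard OpenAI client can be pointed at such a local server:

```python
# Sketch: talking to a locally hosted OpenAI-compatible server.
# The base_url, api_key, and model name are placeholders; consult the mistral.rs
# docs linked above for the actual server invocation and model identifiers.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # wherever the local server is listening
    api_key="not-needed-for-local",       # local servers typically ignore the key
)

resp = client.chat.completions.create(
    model="phi-3.5-moe-instruct",         # placeholder model identifier
    messages=[{"role": "user", "content": "Summarize what a sparse MoE layer does."}],
    temperature=0.7,
)
print(resp.choices[0].message.content)
```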

Dampfinchen commented 1 week ago

Thank you, but I and many others would rather wait for official support.

I wonder what the holdup is. Shouldn't it be possible to copy a lot of the code from Mixtral to Phi 3.5 MoE, given they have a pretty similar architecture with two active experts per token?

Thellton commented 1 week ago

> Thank you, but I and many others would rather wait for official support.
>
> I wonder what the holdup is. Shouldn't it be possible to copy a lot of the code from Mixtral to Phi 3.5 MoE, given they have a pretty similar architecture with two active experts per token?

No one has taken the task up yet, sadly. There is presently work being done on Phi-3.5 Vision Instruct, though, which is something to look forward to given the model's reported vision understanding.

ayttop commented 1 week ago

Any update on Phi-3.5-MoE-instruct GGUF support in llama.cpp?

bunnyfu commented 3 days ago

Bumping this thread. :)