EricLBuehler / mistral.rs

Blazingly fast LLM inference.
MIT License

Model Wishlist #156

Open EricLBuehler opened 2 months ago

EricLBuehler commented 2 months ago

Please let us know what model architectures you would like to be added!

Up-to-date todo list below. Please feel free to contribute any model; a PR without device mapping, ISQ, etc. will still be merged!

Language models

Multimodal models

Embedding models

EricLBuehler commented 4 weeks ago

Ok, great.

chenwanqq commented 4 weeks ago

> Ok, great.

You can check my #422. I hope you don't mind me modifying the API of Nonzero 🙉

EricLBuehler commented 4 weeks ago

Not a problem 😄

wseaton commented 4 weeks ago

@NeroHin

IBM's Granite series Code Models. Granite Code Models

The 3b and 8b variants should already be supported as they are just based on the llama architecture.

The 20b and 34b variants are based on the GPTBigCode architecture which currently isn't implemented in mistral.rs.

The 3b and 8b variants do not work out of the box. They rely on tied word embeddings (which I was able to get working in mistral.rs), but the BPE tokenizer breaks because some tokens in the vocab list are longer than 255 characters.

+1 to getting support for GPTBigCode and other starcoder model variants.
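A quick way to confirm the tokenizer problem described above is to scan the vocab for over-long tokens. This is only a minimal sketch: `oversized_tokens` is a hypothetical helper, not part of mistral.rs or any tokenizer library, and it assumes the limit is counted in Unicode characters rather than bytes, which the comment above does not specify.

```rust
/// Hypothetical helper: returns every token whose length in characters
/// exceeds `max_chars`. Length is counted in Unicode scalar values, which
/// is an assumption; a byte-based limit would use `t.len()` instead.
fn oversized_tokens<'a>(vocab: &[&'a str], max_chars: usize) -> Vec<&'a str> {
    vocab
        .iter()
        .copied()
        .filter(|t| t.chars().count() > max_chars)
        .collect()
}
```

Calling `oversized_tokens(&vocab, 255)` on the list of token strings returns the entries that would trip a 255-character limit.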

chenwanqq commented 3 weeks ago

@EricLBuehler I'm still working on LLaVA. Meanwhile, given your experience with Rust and Candle, have you ever run into problems with memory usage? I'm a bit confused about it. https://github.com/huggingface/candle/issues/2273#issue-2360380212

EricLBuehler commented 3 weeks ago

@chenwanqq, that is great, let me know if I can help!

I replied to the discussion 2272. However, I discovered that the shadowing does mean that the big tensor will not get dropped! See this playground and my comment for more details.

I'll add a clippy lint here to avoid this on our end.
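To illustrate the shadowing behavior described above, here is a minimal sketch. `Big` is a hypothetical stand-in for a large tensor that logs when it is dropped; it is not Candle's `Tensor` type, and this is not the linked playground. In Rust, shadowing a binding only hides the old value. The old value is still dropped at the end of its scope, not at the point where it is shadowed.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for a large tensor: records its name when dropped.
struct Big<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Big<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

/// Shadowing: the first `x` is hidden but stays alive until the scope ends.
fn shadowed() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let x = Big { name: "big", log: &log };
        // Shadows the first `x`; "big" is NOT dropped here.
        let x = Big { name: "small", log: x.log };
        log.borrow_mut().push(format!("work with {}", x.name));
    } // Both values are dropped here, in reverse declaration order.
    log.into_inner()
}

/// Explicit drop: the large value is released before the remaining work.
fn dropped_explicitly() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let big = Big { name: "big", log: &log };
        let small = Big { name: "small", log: big.log };
        drop(big); // frees the large value immediately
        log.borrow_mut().push(format!("work with {}", small.name));
    }
    log.into_inner()
}
```

`shadowed()` logs the work before either drop, while `dropped_explicitly()` logs `drop big` first. Using distinct names plus `drop`, or a narrower scope, is one way to keep peak memory down when the earlier value is large.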

bachp commented 1 week ago

@EricLBuehler What is missing for GGUF quantized Qwen2?

EricLBuehler commented 1 week ago

Hi @bachp, that should be relatively easy to add; it would take inspiration from the other GGUF models, such as quantized_phi3.rs. Do you think you would be able to add this?

EricLBuehler commented 1 week ago

We will be adding the Gemma 2 models shortly, see #486!

EricLBuehler commented 4 days ago

@francis2tm @chelbos @yongkangzhao we just merged LLaVA and LLaVA Next support. Kudos to @chenwanqq for their great work!

For vision models we now have:

csicar commented 4 days ago

I may be able to provide an implementation for Whisper ASR, if there is interest in that.

sammcj commented 3 days ago

It doesn't look like it's been mentioned yet but DeepSeek Coder v2 (lite) support would be amazing given it's probably the best coding model out there.

EricLBuehler commented 2 days ago

@csicar that would be amazing!

EricLBuehler commented 2 days ago

> It doesn't look like it's been mentioned yet but DeepSeek Coder v2 (lite) support would be amazing given it's probably the best coding model out there.

@sammcj that would be great, I can add that.