EricLBuehler / mistral.rs

Blazingly fast LLM inference.

Support wgpu backend #464

Status: Open · richardanaya opened this issue 3 months ago

richardanaya commented 3 months ago

It would be awesome to get broad support for many graphics cards via a wgpu implementation.
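For context on wgpu's portability appeal, here is a minimal sketch that enumerates every adapter wgpu can see across its native backends (Vulkan, Metal, DX12, OpenGL). This assumes a wgpu release in the 0.20 range; exact signatures shift between versions:

```rust
fn main() {
    // One Instance fronts all native backends, which is the draw for
    // "broad GPU support": the same code path covers Vulkan, Metal, DX12, and GL.
    let instance = wgpu::Instance::new(wgpu::InstanceDescriptor::default());
    for adapter in instance.enumerate_adapters(wgpu::Backends::all()) {
        let info = adapter.get_info();
        println!("{} ({:?}, {:?})", info.name, info.backend, info.device_type);
    }
}
```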

FL33TW00D commented 3 months ago

This may be in the works with https://github.com/huggingface/ratchet

More to come on this

EricLBuehler commented 3 months ago

Hi @FL33TW00D! I was wondering what your thoughts were on being able to integrate Ratchet as a backend - do you think the API is complete enough?

FL33TW00D commented 3 months ago

@EricLBuehler The API is complete enough to implement most current models, but perhaps waiting for 0.5.0 would be prudent.

EricLBuehler commented 3 months ago

Ok. Do you have a rough timeline on that, and do you anticipate any major API changes? I may start working on a backend-agnostic framework for mistral.rs, like Burn, in the meantime.

FL33TW00D commented 3 months ago

> Ok. Do you have a rough timeline on that, and do you anticipate any major API changes? I may start working on a backend-agnostic framework for mistral.rs, like Burn, in the meantime.

Should be released next week. There may be API changes for the foreseeable future; it's not 1.0 yet.

Burn may be a good backend! No quant support though.

EricLBuehler commented 3 months ago

> Should be released next week. There may be API changes for the foreseeable future; it's not 1.0 yet.

Exciting!

> Burn may be a good backend! No quant support though.

Yeah, that was the only thing that stopped me from using it. What quants does Ratchet have?

FL33TW00D commented 3 months ago

> > Should be released next week. There may be API changes for the foreseeable future; it's not 1.0 yet.
>
> Exciting!
>
> > Burn may be a good backend! No quant support though.
>
> Yeah, that was the only thing that stopped me from using it. What quants does Ratchet have?

Ratchet supports GGUF quants, currently Q8_0 only, with Q4K in the works. We will support the most popular GGUF quants, and then expand to better quant schemes.

Quantization is completely transparent; no QMatmuls here.
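For context, GGUF's Q8_0 format stores weights in blocks of 32 signed 8-bit values with one half-precision scale per block, so dequantization is just a per-block multiply. A minimal sketch (illustrative only, not Ratchet's actual code; uses the `half` crate):

```rust
use half::f16;

/// One GGUF Q8_0 block: a half-precision scale plus 32 signed 8-bit quants.
/// Element i dequantizes to `d * qs[i]`.
#[repr(C)]
struct BlockQ8_0 {
    d: f16,       // per-block scale
    qs: [i8; 32], // quantized values
}

/// Dequantize a slice of Q8_0 blocks into f32 weights.
fn dequantize_q8_0(blocks: &[BlockQ8_0]) -> Vec<f32> {
    let mut out = Vec::with_capacity(blocks.len() * 32);
    for block in blocks {
        let d = block.d.to_f32();
        out.extend(block.qs.iter().map(|&q| d * f32::from(q)));
    }
    out
}
```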

EricLBuehler commented 3 months ago

> We will support the most popular GGUF quants, and then expand to better quant schemes.

I'm adding GPTQ here: #467.

> Quantization is completely transparent; no QMatmuls here.

How is it completely transparent? Is it in the type? I haven't looked into the Ratchet codebase too much yet.

FL33TW00D commented 3 months ago

> > We will support the most popular GGUF quants, and then expand to better quant schemes.
>
> I'm adding GPTQ here: #467.
>
> > Quantization is completely transparent; no QMatmuls here.
>
> How is it completely transparent? Is it in the type? I haven't looked into the Ratchet codebase too much yet.

Yup, just in the type. Makes more sense that way.
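To illustrate what "in the type" can mean here: the tensor's dtype enum carries the quantization scheme, so a single `matmul` dispatches to the right kernel internally instead of forcing callers through a separate quantized-op wrapper (as with candle's `QMatMul`). All names below are hypothetical, not Ratchet's real API:

```rust
// Hypothetical sketch: quantized dtypes sit alongside float dtypes,
// so callers use one `matmul` and never touch a QMatmul-style wrapper.
#[derive(Clone, Copy, Debug, PartialEq)]
enum DType {
    F32,
    F16,
    Q8_0,
}

struct Tensor {
    dtype: DType,
    // ... storage, shape, device, etc.
}

impl Tensor {
    /// One public matmul; the kernel is chosen from the operand dtypes.
    fn matmul(&self, rhs: &Tensor) -> Tensor {
        match (self.dtype, rhs.dtype) {
            (DType::F32, DType::F32) => todo!("plain f32 GEMM"),
            (DType::Q8_0, DType::F32) => todo!("fused dequant + GEMM kernel"),
            _ => todo!("remaining dtype pairs"),
        }
    }
}
```

The upside of this design is that model code stays quantization-agnostic: swapping Q8_0 for another scheme touches only the kernel dispatch, not every call site.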