Open richardanaya opened 3 months ago
This may be in the works with https://github.com/huggingface/ratchet
More to come on this
Hi @FL33TW00D! I was wondering what your thoughts were on being able to integrate Ratchet as a backend - do you think the API is complete enough?
@EricLBuehler The API is complete enough to implement most current models, but perhaps waiting for 0.5.0 would be prudent.
Ok. Do you have a rough timeline on that, and do you anticipate any major API changes? I may start working on a backend-agnostic framework for mistral.rs, like Burn, in the meantime.
Should be released next week. There may be API changes for the foreseeable future, it's not 1.0 yet.
Burn may be a good backend! No quant support though.
> Should be released next week. There may be API changes for the foreseeable future, it's not 1.0 yet.

Exciting!

> Burn may be a good backend! No quant support though.

Yeah, that was the only thing that stopped me from using it. What quants does Ratchet have?
Ratchet supports GGUF quants, currently Q8_0 only, but Q4K is in the works.
We will support the most popular GGUF quants, and then expand to better quant schemes.
Quantization is completely transparent, no QMatmuls here.
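For readers unfamiliar with the format: a GGUF Q8_0 block is 32 signed 8-bit weights sharing a single per-block scale (stored as f16 on disk), and dequantization is just `w = scale * q`. A minimal sketch in Rust, using f32 for the scale for simplicity (the names here are illustrative, not Ratchet's actual types):

```rust
// GGUF Q8_0: blocks of 32 int8 weights with one shared scale factor.
// (The on-disk scale is f16; f32 is used here to keep the sketch std-only.)
const QK8_0: usize = 32;

#[allow(non_camel_case_types)]
struct BlockQ8_0 {
    scale: f32,          // per-block scale factor `d`
    quants: [i8; QK8_0], // quantized weights `q`
}

/// Dequantize a slice of Q8_0 blocks back to f32: w_i = d * q_i.
fn dequantize_q8_0(blocks: &[BlockQ8_0]) -> Vec<f32> {
    blocks
        .iter()
        .flat_map(|b| b.quants.iter().map(move |&q| b.scale * q as f32))
        .collect()
}

fn main() {
    let block = BlockQ8_0 { scale: 0.5, quants: [2i8; QK8_0] };
    let weights = dequantize_q8_0(&[block]);
    assert_eq!(weights.len(), QK8_0);
    assert!((weights[0] - 1.0).abs() < 1e-6); // 0.5 * 2 = 1.0
}
```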
> We will support the most popular GGUF quants, and then expand to better quant schemes.

I'm adding GPTQ here: #467.

> Quantization is completely transparent, no QMatmuls here.

How is it completely transparent, is it in the type? I haven't looked into the Ratchet codebase too much yet.
Yup just in the type, makes more sense that way.
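For anyone reading along, "in the type" presumably means the tensor's dtype records its quantization scheme, so a single `matmul` entry point can dispatch internally instead of the caller constructing a separate `QMatmul` op. A hypothetical sketch of that design (none of these names are Ratchet's actual API):

```rust
// Hypothetical sketch: the dtype carries the quantization scheme, so one
// `matmul` handles both full-precision and quantized tensors transparently.
#[allow(non_camel_case_types)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum DType {
    F32,
    Q8_0, // GGUF 8-bit block quantization
}

struct Tensor {
    dtype: DType,
    data: Vec<f32>, // placeholder storage; real quantized data would be raw bytes
}

impl Tensor {
    /// One matmul for all dtypes: the kernel is chosen from `dtype` alone,
    /// so callers never see a quantization-specific op.
    fn matmul(&self, rhs: &Tensor) -> Tensor {
        match (self.dtype, rhs.dtype) {
            (DType::F32, DType::F32) => { /* plain f32 kernel */ }
            (DType::Q8_0, _) | (_, DType::Q8_0) => { /* dequantizing/fused kernel */ }
        }
        // Dummy result; a real implementation returns the actual product.
        Tensor { dtype: DType::F32, data: vec![] }
    }
}

fn main() {
    let weights = Tensor { dtype: DType::Q8_0, data: vec![] };
    let x = Tensor { dtype: DType::F32, data: vec![] };
    // Same call regardless of how the weights are stored.
    assert_eq!(x.matmul(&weights).dtype, DType::F32);
}
```

The upside of this design is that model code stays identical whether it loads f32 or quantized weights; the downside is that every kernel must handle (or reject) every dtype combination.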
It would be awesome to get broad support for many graphics cards via a wgpu implementation.