-
We want to support running a full fine-tune with just 8-bit quantization.
-
I can train the ViT model from Hugging Face Transformers,
but when converting it to a TFLite model I get an error message that I can't resolve.
The TinyNN settings and the error are as follows…
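(The original settings and error are cut off in the excerpt above. Purely as a generic point of reference, a minimal TinyNN conversion of an HF ViT might look like the sketch below, assuming TinyNN's `TFLiteConverter` entry point; the checkpoint name, input shape, logits-only wrapper, and output path are assumptions, not the original configuration.)

```python
import torch
from transformers import ViTForImageClassification
from tinynn.converter import TFLiteConverter

# Hypothetical checkpoint; the issue's actual model and settings are truncated above.
model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224").eval()

# HF models return ModelOutput objects; tracing plain tensors is more reliable,
# so wrap the model to return only the logits tensor.
class LogitsOnly(torch.nn.Module):
    def __init__(self, m):
        super().__init__()
        self.m = m

    def forward(self, x):
        return self.m(pixel_values=x).logits

dummy_input = torch.randn(1, 3, 224, 224)
converter = TFLiteConverter(LogitsOnly(model), dummy_input, tflite_path="vit.tflite")
converter.convert()
```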
-
**Describe the bug**
When I run `examples/quantization_2of4_sparse_w4a16/llama7b_sparse_w4a16.py`, I encounter the error "No modifier of type 'SparseGPTModifier' found". The version I used is 0.3.0. …
-
### Your current environment
I want to deploy neuralmagic/DeepSeek-Coder-V2-Instruct-FP8 on 8 x NVIDIA L20 GPUs,
using --tensor-parallel-size=8 --enforce-eager --trust-remote-code --quantization=fp8 --kv…
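As a point of reference, those flags map onto vLLM's offline Python API roughly as sketched below; only the flags visible above are reproduced, and the truncated --kv… option is deliberately left out.

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="neuralmagic/DeepSeek-Coder-V2-Instruct-FP8",
    tensor_parallel_size=8,    # --tensor-parallel-size=8 (8 x L20)
    enforce_eager=True,        # --enforce-eager
    trust_remote_code=True,    # --trust-remote-code
    quantization="fp8",        # --quantization=fp8
)

out = llm.generate(["def quicksort(arr):"], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```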
-
### 🚀 The feature, motivation and pitch
Currently the QNN quantizer only supports PTQ (post-training quantization), and we'd like to enable QAT (quantization-aware training) for better quantization supp…
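As a point of reference for what QAT involves, here is a minimal sketch using PyTorch's generic eager-mode QAT flow; it does not use the QNN quantizer's API, and the tiny model and training loop are placeholders.

```python
import torch
import torch.nn as nn
import torch.ao.quantization as tq

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # activations enter the quantized domain here
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()
        self.dequant = tq.DeQuantStub()  # and leave it here

    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))

torch.backends.quantized.engine = "qnnpack"
model = TinyNet().train()
model.qconfig = tq.get_default_qat_qconfig("qnnpack")
tq.prepare_qat(model, inplace=True)      # insert fake-quant observers

# Placeholder training loop: fake-quant ops simulate int8 rounding during training.
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
for _ in range(5):
    loss = model(torch.randn(4, 3, 32, 32)).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

model.eval()
quantized = tq.convert(model)            # fold observers into real int8 modules
```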
-
Hi,
My friend and I have been reading the code for a while, and we are looking for ideas for contributing.
@ankane, you mentioned product quantization in #27. Is this still an open issue? We would …
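In case it helps frame the discussion, here is a minimal illustration of the product-quantization idea in plain NumPy + scikit-learn; it is unrelated to this project's internals, and the subspace count and codebook size are arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 64)).astype(np.float32)   # toy dataset
M, K = 4, 16                                               # 4 subspaces, 16 centroids each
sub = vectors.shape[1] // M

# Train one k-means codebook per subspace.
codebooks = [KMeans(n_clusters=K, n_init=4, random_state=0)
             .fit(vectors[:, i * sub:(i + 1) * sub]).cluster_centers_
             for i in range(M)]

# Encode: each subvector becomes the index of its nearest centroid
# (one byte per subspace instead of 16 floats).
codes = np.stack([
    ((vectors[:, i * sub:(i + 1) * sub][:, None, :] - codebooks[i][None, :, :]) ** 2)
    .sum(-1).argmin(1)
    for i in range(M)
], axis=1).astype(np.uint8)

# Decode (lossy): look the centroids back up to approximate the original vectors.
approx = np.hstack([codebooks[i][codes[:, i]] for i in range(M)])
print("mean reconstruction error:", np.mean((vectors - approx) ** 2))
```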
-
%%capture
!pip install unsloth "xformers==0.0.28.post2"
# Also get the latest nightly Unsloth!
!pip uninstall unsloth -y && pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://gi…
-
Hi, thank you for this work. How can I quantize the model to int8? Any comments are appreciated.
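In case a generic starting point is useful: PyTorch's dynamic post-training quantization converts Linear weights to int8 with one call. The toy model below is only a stand-in; whether this approach fits this repo's model depends on its architecture.

```python
import torch

# Stand-in float model; replace with the model loaded from this repo.
model = torch.nn.Sequential(
    torch.nn.Linear(128, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10),
).eval()

# Weights are stored as int8; activations are quantized on the fly at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
print(quantized(torch.randn(1, 128)).shape)
```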
-
Hi, thanks for your great work!
I have a small question about KV cache quantization. Did you use PagedAttention to accelerate the 4-bit KV cache quantization? If so, where is the corresponding CUDA kerne…
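For context on what a 4-bit KV-cache path computes, here is a pure-PyTorch sketch of per-group asymmetric 4-bit quantization and dequantization; it is not taken from this repo, and in practice this logic would live inside a fused CUDA kernel rather than Python.

```python
import torch

def quantize_kv_4bit(kv, group_size=64):
    """Per-group asymmetric 4-bit quantization of a KV-cache tensor (reference only)."""
    x = kv.reshape(-1, group_size)
    mn = x.min(dim=1, keepdim=True).values
    mx = x.max(dim=1, keepdim=True).values
    scale = (mx - mn).clamp(min=1e-8) / 15.0      # 4 bits -> 16 levels
    q = ((x - mn) / scale).round().clamp(0, 15).to(torch.uint8)
    packed = q[:, 0::2] | (q[:, 1::2] << 4)       # two 4-bit codes per byte
    return packed, scale, mn

def dequantize_kv_4bit(packed, scale, zero, shape):
    lo, hi = packed & 0x0F, packed >> 4
    q = torch.stack([lo, hi], dim=-1).reshape(packed.shape[0], -1).float()
    return (q * scale + zero).reshape(shape)

kv = torch.randn(2, 4, 128)                       # e.g. [heads, tokens, head_dim]
packed, scale, zero = quantize_kv_4bit(kv)
print((dequantize_kv_4bit(packed, scale, zero, kv.shape) - kv).abs().max())
```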
-
Currently, Manticore uses the HNSW index over floats for its KNN search implementation. That can lead to excessive memory consumption, as all HNSW indexes must be loaded into RAM (for instance, one million 768-dimensional float32 vectors alone take roughly 3 GB before any graph overhead). One way to improve…