-
### Description
While running some benchmarking tests using [opensearch-benchmark](https://github.com/opensearch-project/opensearch-benchmark) on int8 scalar quantization using some of the standard…
-
By default, quanto implements a simple absmax algorithm to compute the scale and zero-point used when quantizing `QTensor` and `QBitsTensor`. A refactoring is required in order to allow different al…
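For readers unfamiliar with absmax, here is a minimal sketch of the symmetric scheme. It uses plain Python rather than quanto's own PyTorch-based API. The helper names `absmax_qparams`, `quantize` and `dequantize` are hypothetical. Returning a zero-point of 0 assumes the symmetric variant, where the scale maps `max(|x|)` onto the largest signed integer.

```python
def absmax_qparams(values, bits=8):
    """Hypothetical helper: symmetric absmax scale; the zero-point is always 0."""
    qmax = 2 ** (bits - 1) - 1  # 127 for int8
    absmax = max(abs(v) for v in values)
    scale = absmax / qmax if absmax > 0 else 1.0  # avoid division by zero for all-zero input
    return scale, 0


def quantize(values, scale, zero_point, bits=8):
    """Round to the nearest integer level, shift by the zero-point, clamp to the signed range."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]


def dequantize(qvalues, scale, zero_point):
    """Map integer levels back to approximate real values."""
    return [(q - zero_point) * scale for q in qvalues]


x = [-1.0, 0.5, 2.0]
scale, zp = absmax_qparams(x)
q = quantize(x, scale, zp)
x_hat = dequantize(q, scale, zp)
```

A pluggable design would let an alternative algorithm replace `absmax_qparams`, for example an asymmetric min/max variant that returns a non-zero zero-point. `quantize` and `dequantize` would stay unchanged.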
-
### What model would you like?
My Modelfile:
FROM /home/house365ai/xxm/model/Qwen1.5-14B-Chat
ollama create Qwen1.5-14B-Chat -f Modelfile
How can I solve this?
-
Nice work on the paper. I also have some questions:
1) Is there any analysis of the oscillation problem in activation quantization? Since 2-bit activation quantization is much harder than weight quantization, it i…
-
Based on recommendations from Testbed 13 Vector Tiles ER ( http://docs.opengeospatial.org/per/17-041.pdf ):
A global tiling grid combining the advantages of approximating equal-area while maintaini…
-
We need to add support for quantized models in the vLLM project so that we can run a quantized Llama model via vLLM. This involves implementing quantization techniques to optimize memory usage a…
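To make the memory argument concrete, the weight footprint can be estimated with simple arithmetic. This is a rough sketch under stated assumptions. The 7B parameter count is a hypothetical example, since the issue does not name a model size. The estimate counts weights only and ignores activations, the KV cache, and per-group scale/zero-point overhead.

```python
def weight_memory_gib(num_params, bits_per_weight):
    """Approximate weight storage in GiB (weights only, no quantization metadata)."""
    return num_params * bits_per_weight / 8 / 2**30


# Hypothetical example: a 7B-parameter Llama-style model.
params = 7_000_000_000
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gib(params, bits):.1f} GiB")
```

Going from 16-bit to 4-bit weights cuts this estimate by a factor of four. Real savings are somewhat smaller once scales and any unquantized layers are included.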
-
I attempted to run `mistralrs-server` to serve my local copy of `dolphin-2.9-mixtral-8x22b.Q8_0.gguf`. This file isn't available on huggingface because it's broken into four parts [here](https://hugg…
-
# Prerequisites
- [Yes] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [Yes] I carefully followed the [README.md](https://github.com/ggerganov…
-
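**Problem description:**
The model can be exported and used for inference without problems, but quantization fails with an error, which is probably caused by the dataset.
**Command:**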
CUDA_VISIBLE_DEVICES=0,1 swift export \
--ckpt_dir "/home/user/sdb1/sft-output/qwen1half-32b-chat/v4-20240510-064821/checkpoint-50/" \…
-
### Description
Having copy-on-write segments lends itself nicely to quantization. I propose we add a new "scalar" or "linear" quantization codec. This will be a simple quantization codec provided …
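As an illustration of what a "scalar"/"linear" codec could do, here is a minimal sketch in Python. It does not reflect any existing codec API. The names `LinearQuantized`, `encode` and `decode` are hypothetical. The sketch assumes each vector is stored as 8-bit codes plus a per-vector minimum and step size, so decoding is one multiply-add per component.

```python
from dataclasses import dataclass


@dataclass
class LinearQuantized:
    codes: bytes     # one uint8 code per vector component
    minimum: float   # real value represented by code 0
    step: float      # width of one quantization bucket


def encode(vector, levels=256):
    """Linearly map [min, max] of the vector onto integer codes 0..levels-1 (levels <= 256)."""
    lo, hi = min(vector), max(vector)
    step = (hi - lo) / (levels - 1) if hi > lo else 1.0  # constant vectors get a dummy step
    codes = bytes(min(levels - 1, max(0, round((v - lo) / step))) for v in vector)
    return LinearQuantized(codes, lo, step)


def decode(q):
    """Reconstruct approximate float values from the codes."""
    return [q.minimum + c * q.step for c in q.codes]


v = [0.0, 0.25, 1.0, -0.5]
q = encode(v)
approx = decode(q)
```

Compared with 32-bit floats, the 1-byte codes cut vector storage by roughly 4x at the cost of a bounded rounding error of at most half a step per component.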