-
The motivation for INT8 is to retain more accuracy while still getting some gains in inference speed. I experimented with implementing dequantization for INT8, and it ultimately needs more work …
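For context, a minimal sketch of the affine INT8 quantize/dequantize round trip I was experimenting with (the per-tensor scale/zero-point scheme and function names here are my own illustration, not the repo's actual implementation):
```
import numpy as np

def quantize_int8(x: np.ndarray):
    # Per-tensor affine quantization: map the float range onto [-128, 127].
    scale = max((x.max() - x.min()) / 255.0, 1e-8)
    zero_point = np.round(-128 - x.min() / scale)
    q = np.clip(np.round(x / scale + zero_point), -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize_int8(q: np.ndarray, scale: float, zero_point: float):
    # Recover approximate float values: x ~ scale * (q - zero_point)
    return scale * (q.astype(np.float32) - zero_point)

x = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize_int8(x)
x_hat = dequantize_int8(q, scale, zp)
print(np.abs(x - x_hat).max())  # quantization error, bounded by ~scale/2
```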
-
### Before Asking
- [X] I have read the [README](https://github.com/meituan/YOLOv6/blob/main/README.md) carefully.
- [X] I want to train my custom dataset, and I have read the …
-
I see that the linear layers' weights are replaced with quantized weights.
However, I don't see what happens to the bias in the linear layers. Is it no longer needed?
Why?
I assume it should be …
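For what it's worth, here is my own illustrative sketch (assuming symmetric per-tensor quantization; not the library's actual code) of how int8 linear layers commonly handle the bias: it is not quantized to int8 at all, but kept in float and folded into the int32 accumulator, pre-scaled by `input_scale * weight_scale`:
```
import numpy as np

def int8_linear(x_q, w_q, bias_fp32, x_scale, w_scale):
    # Integer matmul accumulates in int32 (as on real int8 hardware).
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32).T
    # The bias is not stored as int8: it is added as int32, pre-scaled by
    # x_scale * w_scale so it lives on the accumulator's scale.
    acc += np.round(bias_fp32 / (x_scale * w_scale)).astype(np.int32)
    # One final dequantization returns a float output.
    return acc.astype(np.float32) * (x_scale * w_scale)

x_q = np.array([[10, -20]], dtype=np.int8)
w_q = np.array([[5, 3], [-7, 2]], dtype=np.int8)
y = int8_linear(x_q, w_q, np.array([0.1, -0.2]), x_scale=0.02, w_scale=0.01)
```
If the library follows this pattern, the bias would still be used at inference time; it just would not show up among the quantized weight tensors.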
-
There is a `use_fp` flag for the offline_quantize tool in saxml/tool that quantizes the weights in fp8, but they still have to be stored as int8 (https://github.com/google/praxis/blob/3f4cbb4bcda366db7b018695fbe2d4…
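If I understand the storage constraint correctly, the fp8 values can be bit-cast to int8 for storage and bit-cast back on load without any loss, since both are 8-bit types. A minimal sketch with `ml_dtypes` (my own illustration of the idea, not the actual praxis code path):
```
import numpy as np
from ml_dtypes import float8_e4m3fn  # fp8 dtype also used by JAX

w_fp8 = np.array([0.5, -1.25, 2.0], dtype=float8_e4m3fn)
stored = w_fp8.view(np.int8)           # reinterpret the 8-bit pattern as int8
restored = stored.view(float8_e4m3fn)  # lossless round-trip on load
assert (restored == w_fp8).all()
```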
-
When loading the quantized model (SmoothQuant) with
```
from neural_compressor.utils.pytorch import load

# qmodel_path: directory containing the saved quantized checkpoint;
# model_fp: the original fp32 model used as the loading skeleton
qmodel = load(qmodel_path, model_fp)
```
I got
`RecursiveScriptModule(original_name=Qu…
-
Loading the model takes about 5 GB of memory; after a few rounds of back-and-forth conversation it jumps to 6 GB, and each additional conversation turn adds roughly 300 MB. Is there a way to overcome this problem?
==============================
python realtime_chat.py --role_name 三三
-----PERFORM NORM HEAD
user:你好
/home/allen/miniconda3/envs/index…
-
Does this model support TensorRT INT8 quantization? Has anybody tried it?
-
## Description
I recently attempted to use INT8 quantization with Stable Diffusion XL to improve inference performance, based on the claims made in a recent [TensorRT blog post](https://developer.…
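For reference, the post-training INT8 flow I followed goes through the TensorRT Model Optimizer's quantization API before ONNX export and engine building. A minimal sketch of that flow on a toy module (the `INT8_DEFAULT_CFG` config and the random calibration data are placeholders, not the demo's exact settings):
```
import torch
import modelopt.torch.quantization as mtq  # nvidia-modelopt

# Toy stand-in for the SDXL UNet; the real flow quantizes pipe.unet.
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 64))
calib_data = [torch.randn(8, 64) for _ in range(16)]

def forward_loop(m):
    # Run representative inputs so activation ranges can be calibrated
    # before the int8 fake-quant parameters are frozen.
    with torch.no_grad():
        for x in calib_data:
            m(x)

model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)
```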
-
### What
Let's support int8 quantization in circle-quantizer.
### Why
onert-micro supports int8 quantized kernels and contains faster CMSIS-NN kernels, which work with int8 quantization, not …
-
### Feature request
Hi! I’ve been researching LLM quantization recently ([this paper](https://arxiv.org/abs/2405.14852)), and noticed a potentially important issue that arises when using LLMs with 1-…