-
The motivation for INT8 is to retain more accuracy while still getting some gains in inference speed. I experimented with implementing dequantization for INT8, and it ultimately needs more work …
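For context, a minimal sketch of the affine INT8 quantize/dequantize round trip I was experimenting with (the per-tensor scale/zero-point scheme and function names here are my own illustration, not the repo's actual implementation):
```
import numpy as np

def quantize_int8(x: np.ndarray):
    # Per-tensor affine quantization: map the float range onto [-128, 127].
    scale = max((x.max() - x.min()) / 255.0, 1e-8)
    zero_point = np.round(-128 - x.min() / scale)
    q = np.clip(np.round(x / scale + zero_point), -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize_int8(q: np.ndarray, scale: float, zero_point: float):
    # Recover approximate float values: x ~ scale * (q - zero_point)
    return scale * (q.astype(np.float32) - zero_point)

x = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize_int8(x)
x_hat = dequantize_int8(q, scale, zp)
print(np.abs(x - x_hat).max())  # quantization error, bounded by ~scale/2
```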
-
### Before Asking
- [X] I have read the [README](https://github.com/meituan/YOLOv6/blob/main/README.md) carefully.
- [X] I want to train my custom dataset, and I have read the …
-
I see that the linear layers' weights are replaced with quantized weights.
However, I don't see what happens to the bias in the linear layers. Is it no longer needed?
Why?
I assume it should be …
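For what it's worth, here is my own illustrative sketch (assuming symmetric per-tensor quantization; not the library's actual code) of how int8 linear layers commonly handle the bias: it is not quantized to int8 at all, but kept in float and folded into the int32 accumulator, pre-scaled by `input_scale * weight_scale`:
```
import numpy as np

def int8_linear(x_q, w_q, bias_fp32, x_scale, w_scale):
    # Integer matmul accumulates in int32 (as on real int8 hardware).
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32).T
    # The bias is not stored as int8: it is added as int32, pre-scaled by
    # x_scale * w_scale so it lives on the accumulator's scale.
    acc += np.round(bias_fp32 / (x_scale * w_scale)).astype(np.int32)
    # One final dequantization returns a float output.
    return acc.astype(np.float32) * (x_scale * w_scale)

x_q = np.array([[10, -20]], dtype=np.int8)
w_q = np.array([[5, 3], [-7, 2]], dtype=np.int8)
y = int8_linear(x_q, w_q, np.array([0.1, -0.2]), x_scale=0.02, w_scale=0.01)
```
If the library follows this pattern, the bias would still be used at inference time; it just would not show up among the quantized weight tensors.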
-
There is a `use_fp` flag for the offline_quantize tool in saxml/tool that quantizes the weights in fp8, but they still have to be stored as int8 (https://github.com/google/praxis/blob/3f4cbb4bcda366db7b018695fbe2d4…
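If I understand the storage constraint correctly, the fp8 values can be bit-cast to int8 for storage and bit-cast back on load without any loss, since both are 8-bit types. A minimal sketch with `ml_dtypes` (my own illustration of the idea, not the actual praxis code path):
```
import numpy as np
from ml_dtypes import float8_e4m3fn  # fp8 dtype also used by JAX

w_fp8 = np.array([0.5, -1.25, 2.0], dtype=float8_e4m3fn)
stored = w_fp8.view(np.int8)           # reinterpret the 8-bit pattern as int8
restored = stored.view(float8_e4m3fn)  # lossless round-trip on load
assert (restored == w_fp8).all()
```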
-
When loading the quantized model (SmoothQuant) with
```
from neural_compressor.utils.pytorch import load

# qmodel_path: directory containing the saved quantized checkpoint;
# model_fp: the original fp32 model used as the loading skeleton
qmodel = load(qmodel_path, model_fp)
```
I got
`RecursiveScriptModule(original_name=Qu…
-
Loading the model takes about 5 GB of memory; after a few rounds of back-and-forth conversation it jumps to 6 GB, and each additional conversation turn adds roughly 300 MB. Is there a way to overcome this problem?
==============================
python realtime_chat.py --role_name 三三
-----PERFORM NORM HEAD
user:你好
/home/allen/miniconda3/envs/index…
-
Does this model support TensorRT INT8 quantization? Has anybody tried it?
-
## Description
I recently attempted to use INT8 quantization with Stable Diffusion XL to improve inference performance, based on the claims made in a recent [TensorRT blog post](https://developer.…
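For reference, the post-training INT8 flow I followed goes through the TensorRT Model Optimizer's quantization API before ONNX export and engine building. A minimal sketch of that flow on a toy module (the `INT8_DEFAULT_CFG` config and the random calibration data are placeholders, not the demo's exact settings):
```
import torch
import modelopt.torch.quantization as mtq  # nvidia-modelopt

# Toy stand-in for the SDXL UNet; the real flow quantizes pipe.unet.
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU(),
                            torch.nn.Linear(64, 64))
calib_data = [torch.randn(8, 64) for _ in range(16)]

def forward_loop(m):
    # Run representative inputs so activation ranges can be calibrated
    # before the int8 fake-quant parameters are frozen.
    with torch.no_grad():
        for x in calib_data:
            m(x)

model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)
```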
-
### What
Let's support int8 quantization in circle-quantizer.
### Why
onert-micro supports int8 quantized kernels and contains faster CMSIS-NN kernels, which work with int8 quantization, not …
-
### Feature request
Hi! I’ve been researching LLM quantization recently ([this paper](https://arxiv.org/abs/2405.14852)), and noticed a potentially important issue that arises when using LLMs with 1-…