-
I am trying to deploy a Baichuan2-7B model on a machine with 2 Tesla V100 GPUs. Unfortunately, each V100 has only 16 GB of memory.
I have applied INT8 weight-only quantization, so the size of the engine I…
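As a rough, back-of-envelope check (my own arithmetic, not from the issue): INT8 weight-only storage for ~7B parameters is about 7 GB, so the weights alone split comfortably across two 16 GB V100s under tensor parallelism; the KV cache and activations are what remain to budget for.

```python
# Back-of-envelope weight footprint for INT8 weight-only Baichuan2-7B.
# Rough sketch only: the real engine also holds KV cache and activations.
params = 7e9                  # ~7B parameters
weight_bytes = params * 1     # INT8 weight-only: 1 byte per weight
per_gpu = weight_bytes / 2    # tensor parallelism across 2 V100s
print(f"weights per GPU: {per_gpu / 2**30:.1f} GiB of 16 GiB")  # ~3.3 GiB
```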
-
Hi all, this issue will track the feature requests you've made to TensorRT-LLM & provide a place to see what TRT-LLM is currently working on.
Last update: `Jan 14th, 2024`
🚀 = in development
#…
-
### 1. System information
- Occurs in Google Colab with TF 2.14
- Also verified with TF 2.7 (Anaconda) on Windows 10
### 2. Code
[Colab to reproduce issue](https://colab.research.google.com…
-
Where can I download bloom-7b?
I noticed that int8 quantization is available, but is there an option for int4 quantization?
What is the memory overhead for int4 and int8 when using LoRA or PTuning f…
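On the int4-vs-int8 part, a rough weight-only footprint comparison for a 7B-parameter model (my own arithmetic; it ignores KV cache, activations, and any LoRA/P-Tuning adapter weights):

```python
# Approximate weight-only memory for ~7B parameters at different precisions.
params = 7e9
for name, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    gib = params * bits / 8 / 2**30
    print(f"{name}: {gib:4.1f} GiB")  # fp16 ~13.0, int8 ~6.5, int4 ~3.3
```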
-
I made some changes in yolov4_416x416_qtz.json and accuracy_checker\adapters\yolo.py as follows:

    "type": "yolo_v3",
    "anchors": "10.0, 14.0, 23.0, 27.0, 37.0, 58.0, 81.0, 82.0, 1…
-
I aim to evaluate an 8-bit quantized model. For some reason, lighteval asks me to provide data for quantization:
ValueError: You need to pass `dataset` in order to quantize your model
I started wi…
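For context, that error string usually comes from a GPTQ-style quantization path, which needs a calibration dataset to quantize from scratch. If the checkpoint is already quantized, or if on-the-fly 8-bit loading is acceptable, bitsandbytes needs no dataset. A minimal sketch in plain transformers (not lighteval's CLI; the model name is a placeholder):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load for 8-bit evaluation via bitsandbytes; no calibration dataset required.
model = AutoModelForCausalLM.from_pretrained(
    "my-org/my-model",  # placeholder checkpoint
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
```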
-
I tried to get full INT8 quantization by running convert_tflite.py and setting the flag --quantize_mode full_int8. However, I got the following error:
RuntimeError: Quantization not yet support…
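For reference, a minimal full-integer conversion sketch with the standard TFLite converter API (the saved-model path and the 416×416 input shape are assumptions); full-INT8 conversion needs a representative dataset to calibrate activation ranges:

```python
import numpy as np
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("yolov4_saved_model")  # assumed path
converter.optimizations = [tf.lite.Optimize.DEFAULT]

def representative_dataset():
    # Calibration samples; real preprocessed images should be used, not random data.
    for _ in range(100):
        yield [np.random.rand(1, 416, 416, 3).astype(np.float32)]

converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8   # or tf.int8
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()
open("yolov4_int8.tflite", "wb").write(tflite_model)
```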
-
TFLite uses int8 per-channel weight quantization for transposed convolutions.
While XNNPACK includes a fast transposed convolution operation, it only supports per-tensor weight quantization (i.e. a si…
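To make the distinction concrete, a small numpy sketch (my own illustration, not TFLite's or XNNPACK's actual kernels): per-tensor quantization uses a single scale for the whole weight tensor, while per-channel quantization picks one scale per output channel, which preserves accuracy when channel magnitudes differ.

```python
import numpy as np

w = np.random.randn(8, 3, 3, 4).astype(np.float32)  # hypothetical transposed-conv weights

# Per-tensor: one scale for the entire tensor.
scale_pt = np.abs(w).max() / 127.0
w_pt = np.clip(np.round(w / scale_pt), -127, 127).astype(np.int8)

# Per-channel: one scale per output channel (axis 0 here, by assumption).
scales_pc = np.abs(w.reshape(w.shape[0], -1)).max(axis=1) / 127.0
w_pc = np.clip(np.round(w / scales_pc[:, None, None, None]), -127, 127).astype(np.int8)
```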
-
Hi,
We are trying to quantise our ONNX models to INT8 to run on CPU using: https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html#quantization-on-gpu
We are using dynamic …
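For reference, the dynamic path in that guide comes down to a single call; a minimal sketch with onnxruntime's Python quantization API (the file paths are placeholders):

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Dynamic quantization: weights are quantized to INT8 offline,
# activations are quantized on the fly at CPU inference time.
quantize_dynamic(
    model_input="model.onnx",        # placeholder input path
    model_output="model.int8.onnx",  # placeholder output path
    weight_type=QuantType.QInt8,
)
```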
-
1. Environment
OS: Ubuntu
OS Version: linux
2. GitHub version
branch: master
commit (optional): 6dc2d93
3. Describe the bug
I found the problem when I use a pretrained model…