-
Hi, I read the docs about `zero_quant`, but it seems to require extra training.
And in `deepspeed.init_inference`, `dtype` can be set to `int8`, but the code does nothing for int8. https://github…
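A minimal sketch of the call being discussed, assuming the older keyword style from the DeepSpeed inference tutorial (the model choice and the `mp_size`/`replace_with_kernel_inject` arguments are illustrative, not taken from the report):
```python
# Sketch only: reproduces the reported call shape; dtype=torch.int8 is the
# setting the report says has no effect. Other kwargs are assumptions based
# on the DeepSpeed inference tutorial.
import torch
import deepspeed
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder model
engine = deepspeed.init_inference(
    model,
    mp_size=1,                        # tensor-parallel degree
    dtype=torch.int8,                 # reportedly ignored by the int8 path
    replace_with_kernel_inject=True,  # enable kernel injection
)
```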
-
### Search before asking
- [X] I have searched the Ultralytics YOLO [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussion…
-
**Describe the issue**
Please provide details relating to the issue you're hitting. If it is related to performance, accuracy, or other model issues with bringing your own model to Qualcomm AI Hub, to…
-
### Feature request
Hi! I’ve been researching LLM quantization recently ([this paper](https://arxiv.org/abs/2405.14852)) and noticed a potentially important issue that arises when using LLMs with 1-…
-
I want to use INT8 matmul, and the code/output is as follows:
### Code
```python
import bitblas
import torch
bitblas.set_log_level("Debug")
matmul_config = bitblas.MatmulConfig(
    M=16,  # M dime…
```
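For reference, a complete INT8 configuration in the style of the BitBLAS README might look like the sketch below; the N/K shapes and every field value are illustrative assumptions, not recovered from the truncated snippet above.
```python
# Sketch of an INT8 x INT8 -> INT32 matmul, modeled on the BitBLAS README.
# All shapes and dtypes here are assumptions for illustration.
import bitblas
import torch

config = bitblas.MatmulConfig(
    M=16,                 # rows of A
    N=1024,               # output columns (rows of the transposed weight)
    K=1024,               # reduction dimension
    A_dtype="int8",       # activation dtype
    W_dtype="int8",       # weight dtype
    accum_dtype="int32",  # accumulate in int32 to avoid overflow
    out_dtype="int32",    # output dtype
    layout="nt",          # A non-transposed, W transposed
)
matmul = bitblas.Matmul(config=config)

A = torch.randint(-8, 8, (16, 1024), dtype=torch.int8).cuda()
W = torch.randint(-8, 8, (1024, 1024), dtype=torch.int8).cuda()
out = matmul(A, W)  # int32 result of shape (16, 1024)
```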
-
As mentioned in this [issue](https://github.com/NVIDIA/TensorRT-LLM/issues/110), the release branch does not support bfloat16 + weight_only_int8 quantization, while this feature is already su…
-
Hi, for a model as big as 7 GB, does transformers support export to ONNX? Is there any tutorial for big models?
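One possible route, sketched under the assumption that Hugging Face Optimum is acceptable tooling (the model id is a placeholder):
```python
# Sketch: export via Optimum's ONNX Runtime integration. The model id is a
# placeholder; for graphs over 2 GB, ONNX spills weights into external data
# files alongside the .onnx graph automatically.
from optimum.onnxruntime import ORTModelForCausalLM

model = ORTModelForCausalLM.from_pretrained("some-org/some-7gb-model", export=True)
model.save_pretrained("onnx_model/")
```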
-
### Describe the issue
Hi IPEX team,
I have an application where I want to serve multiple models concurrently, and I want to share weights across concurrent instances. I normally do this with `tor…
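The reference above is cut off, but assuming it points at PyTorch's shared-memory facilities, a minimal sketch of that general pattern (an assumption, not the reporter's actual code) might be:
```python
# Sketch: share one copy of the weights across worker processes by moving
# parameters into shared memory before spawning workers. This is an assumed
# reading of the truncated `tor…` reference, shown in plain PyTorch.
import torch
import torch.multiprocessing as mp

def worker(model: torch.nn.Module) -> None:
    # Each process sees the same underlying parameter storage; no extra copy.
    with torch.no_grad():
        print(model(torch.randn(1, 16)).shape)

if __name__ == "__main__":
    model = torch.nn.Linear(16, 4)
    model.share_memory()  # move parameters/buffers to shared memory
    procs = [mp.Process(target=worker, args=(model,)) for _ in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```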
-
Please describe your problem in **English** if possible; it will be helpful to more people.
**Describe the bug**
A clear and concise description of what the bug is.
**To Reproduce**
Steps to repr…
-
### 1. System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Win 10 22H2 (but reproducible elsewhere)
- TensorFlow installation (pip package or built from source): pip pack…