-
I have a question. Can the Vitis AI quantizer be used with formats other than INT8 on the **ZCU104**? Also, after quantization, is the computation performed using INT8 or is it just stored as INT8? If…
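The storage-vs-compute distinction in the question can be illustrated with a minimal numpy sketch (generic affine int8 quantization, not Vitis-AI-specific): "fake quant" stores int8 codes but computes in float32 after dequantizing, while true int8 compute runs the arithmetic on the codes themselves with an int32 accumulator and a single rescale at the end.

```python
import numpy as np

def quantize(x, scale, zero_point):
    """Affine quantization: map float32 values to int8 codes."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize(q, scale, zero_point):
    """Map int8 codes back to (approximate) float32 values."""
    return (q.astype(np.float32) - zero_point) * scale

x = np.array([0.5, -1.2, 3.1], dtype=np.float32)
scale, zp = 0.05, 0

q = quantize(x, scale, zp)        # stored as int8
x_hat = dequantize(q, scale, zp)  # "fake quant": compute would continue in float32

# True int8 compute instead operates on the codes directly:
# an int8 dot product accumulated in int32, rescaled once at the end.
w = np.array([1.0, 2.0, -0.5], dtype=np.float32)
qw = quantize(w, scale, zp)
acc = np.dot(q.astype(np.int32), qw.astype(np.int32))  # int32 accumulator
y = acc * scale * scale                                # single rescale to float
```

Whether the deployed accelerator takes the first path or the second is a property of its kernels, not of the quantized file format.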
-
Hi @tridao, we recently implemented INT8 forward FMHA (8-bit Flash-Attention) with both static and dynamic quantization for Softmax on our GPGPU card, and achieved good results and relatively okay acc…
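The static/dynamic distinction mentioned above can be sketched for Softmax outputs (an illustrative numpy toy, not the poster's FMHA kernel): a static scale is fixed ahead of time from the known [0, 1] output range, while a dynamic scale is computed from the runtime maximum and uses the integer range more fully when probabilities are concentrated well below 1.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1, -3.0]], dtype=np.float32)
p = softmax(logits)

# Static: scale fixed ahead of time. Softmax outputs lie in [0, 1],
# so a fixed uint8 scale of 1/255 covers the whole possible range.
static_scale = 1.0 / 255.0
q_static = np.round(p / static_scale).astype(np.uint8)

# Dynamic: scale derived from the runtime max, so the largest observed
# probability always maps to code 255 and small values keep more precision.
dyn_scale = p.max() / 255.0
q_dyn = np.round(p / dyn_scale).astype(np.uint8)

err_static = np.abs(q_static * static_scale - p).max()
err_dyn = np.abs(q_dyn * dyn_scale - p).max()
```

The trade-off is the usual one: dynamic scales cost a runtime max-reduction per tensor, static scales cost calibration effort up front.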
-
When converting the model, I enabled quantization to 'int8', but I noticed that the converted model's BLEU score dropped by 5 points.
Therefore, I would like to inquire if the…
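Accuracy drops of this size often trace back to a per-tensor scale being dominated by a few outlier channels. A minimal numpy sketch (illustrative only, not specific to this converter) comparing per-tensor and per-channel weight quantization error:

```python
import numpy as np

rng = np.random.default_rng(0)
# Weight matrix whose rows (output channels) have very different ranges —
# a common cause of accuracy loss under per-tensor int8 quantization.
W = rng.normal(size=(4, 64)).astype(np.float32)
W[0] *= 10.0  # one channel dominates the tensor-wide max

def quant_dequant(w, scale):
    q = np.clip(np.round(w / scale), -127, 127)
    return q * scale

# Per-tensor: one scale for the whole matrix.
s_tensor = np.abs(W).max() / 127.0
err_tensor = np.abs(quant_dequant(W, s_tensor) - W).mean()

# Per-channel: one scale per output row, so small-range channels
# keep far more of their precision.
s_chan = np.abs(W).max(axis=1, keepdims=True) / 127.0
err_chan = np.abs(quant_dequant(W, s_chan) - W).mean()
```

If the toolchain exposes a per-channel (or per-axis) weight-quantization option, it is usually the first knob to try before moving to mixed precision.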
-
### Describe the issue
After quantization, the output ONNX model had faster inference speed and smaller model size, but why are the input and output tensors still float32?
I thought it should be u…
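A plausible explanation, sketched below in plain numpy (illustrative, not onnxruntime code): quantization exporters commonly wrap the int8 core with QuantizeLinear at the graph entry and DequantizeLinear at the graph exit, so callers keep passing and receiving float32 tensors even though the internal ops run on int8 codes.

```python
import numpy as np

def quantize_linear(x, scale, zp):    # float32 -> int8 at graph entry
    return np.clip(np.round(x / scale) + zp, -128, 127).astype(np.int8)

def dequantize_linear(q, scale, zp):  # int8 -> float32 at graph exit
    return (q.astype(np.float32) - zp) * scale

def int8_core(q_x, q_w):              # internal ops run on int8 codes
    return np.matmul(q_x.astype(np.int32), q_w.astype(np.int32))

x = np.array([[0.5, -0.25]], dtype=np.float32)  # model input: float32
w = np.array([[1.0], [2.0]], dtype=np.float32)
sx, sw = 0.01, 0.02

q_y = int8_core(quantize_linear(x, sx, 0), quantize_linear(w, sw, 0))
y = dequantize_linear(q_y, sx * sw, 0)          # model output: float32
```

Keeping the interface float32 preserves compatibility with existing pre/post-processing; some toolchains offer an option to push the quantize/dequantize steps outside the graph if true int8 I/O is wanted.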
-
### Description
With int8 & int4 and any further quantization schemes we provide, it is possible that, to achieve adequate recall, some oversampling & rescoring with the raw float32 vectors might…
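The oversample-then-rescore pattern can be sketched as follows (a generic numpy toy, with dataset size, `k`, and the oversampling factor chosen arbitrarily): score cheaply with the quantized vectors, keep `k * oversample` candidates, then rescore only those with the raw float32 vectors.

```python
import numpy as np

rng = np.random.default_rng(1)
docs = rng.normal(size=(1000, 32)).astype(np.float32)
query = rng.normal(size=32).astype(np.float32)

# int8-quantized copies used for the cheap first-pass search.
scale = np.abs(docs).max() / 127.0
docs_q = np.round(docs / scale).astype(np.int8)
query_q = np.round(query / scale).astype(np.int8)

k, oversample = 10, 4

# 1) First pass: score with int8 vectors, keep k * oversample candidates.
coarse = docs_q.astype(np.int32) @ query_q.astype(np.int32)
cand = np.argsort(-coarse)[: k * oversample]

# 2) Rescore only the candidates with the raw float32 vectors, keep top k.
fine = docs[cand] @ query
topk = cand[np.argsort(-fine)[:k]]
```

The float32 work scales with `k * oversample` rather than the corpus size, which is what makes the rescoring pass affordable.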
-
### Describe the issue
It appears that when processing a standalone QuantizeLinear node in onnxruntime, the rounding behavior consistently rounds to the lower integer instead of the expected …
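For reference, the ONNX QuantizeLinear specification prescribes round-to-nearest with ties to even ("banker's rounding"), which numpy's `rint` implements; the two behaviors can be compared directly:

```python
import numpy as np

x = np.array([2.5, 3.5, -2.5, 1.4999], dtype=np.float32)

# Round-half-to-even, as the ONNX QuantizeLinear spec prescribes for ties:
# 2.5 -> 2, 3.5 -> 4, -2.5 -> -2.
ties_to_even = np.rint(x).astype(np.int8)

# Rounding toward the lower integer instead biases every value downward:
floored = np.floor(x).astype(np.int8)
```

If the observed outputs match the `floored` row, that would indeed be a deviation from the spec rather than an expected tie-breaking artifact.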
-
How much GPU memory is needed to quantize flux-dev?
Can it be offloaded to the CPU when there is not enough GPU memory?
The following part of your input was truncated because CLIP can only handle sequences up to 77…
-
**Question**:
I have an encoder decoder model, quantized using TensorRT's packages for post-training quantization. It is in the HuggingFace transformers saved model format. The model is a TrOCR model…
-
### System information
- OS: openSUSE Tumbleweed (Linux)
- TensorFlow installation: pip
- TensorFlow version: tf-nightly (also occurs on earlier versions)
### Code
Converting a model containing an …
-
Was chatting with @Chillee about our plans in AO today and he mentioned we should be focusing on a few concrete problems like
1. Demonstrate compelling perf for fp8 gemm at a variety of batch sizes.
…