-
### Description
With int8 & int4 and any further quantization schemes we will provide, it is possible that, in order to achieve adequate recall, some oversampling & rescoring with the raw float32 vectors might…
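For intuition, here is a minimal sketch of oversampling plus float32 rescoring. The function name, the scalar `scale`, and the int8 scoring scheme are illustrative assumptions, not this project's API:

```python
import numpy as np

# Hypothetical sketch: retrieve k * oversample candidates with cheap int8
# scoring, then rescore only those candidates with the raw float32 vectors.
def search_with_rescore(query_f32, docs_f32, docs_i8, scale, k=10, oversample=4):
    # Phase 1: approximate scores against the int8-quantized vectors.
    approx_scores = (docs_i8.astype(np.float32) * scale) @ query_f32
    candidates = np.argpartition(-approx_scores, k * oversample)[: k * oversample]

    # Phase 2: exact float32 dot products over the small candidate set.
    exact_scores = docs_f32[candidates] @ query_f32
    return candidates[np.argsort(-exact_scores)[:k]]
```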
-
### Search before asking
- [X] I have searched the YOLOv8 [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussions) and f…
-
Right now ao works just fine for quantizing an arbitrary HF model.
However, this simple workflow is failing, meaning we don't really interoperate well with the rest of the HF ecosystem:
```python
from tr…
```
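For reference, a minimal sketch of the kind of round trip that exercises this interop, assuming torchao's `quantize_` API and a standard `transformers` checkpoint (the model name and save path are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM
from torchao.quantization import quantize_, int4_weight_only

# Load an arbitrary HF model and quantize it in place with torchao...
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m", torch_dtype=torch.bfloat16
)
quantize_(model, int4_weight_only())

# ...then hand it back to the HF ecosystem. Serialization steps like this
# are where the interop can break down (safetensors may reject the
# quantized tensor subclasses, hence safe_serialization=False here).
model.save_pretrained("opt-125m-int4", safe_serialization=False)
```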
-
- A16W4 axis=1
- Low-hanging fruit we can add to int4wo quant, either as a flag or by replacing the quant method (see the sketch after this list)
- [x] test eval with HQQ axis=1 and compare to the existing version
- if axis…
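To make "A16W4 axis=1" concrete, here is an illustrative asymmetric 4-bit group quantizer along axis=1 (the input-channel axis), with activations left in 16-bit. This is plain PyTorch for illustration only, not the int4wo or HQQ implementation:

```python
import torch

# Illustration: 4-bit affine quantization, grouped along axis=1.
def quantize_a16w4_axis1(w: torch.Tensor, group_size: int = 128):
    out_feat, in_feat = w.shape
    g = w.reshape(out_feat, in_feat // group_size, group_size)
    wmin = g.amin(dim=-1, keepdim=True)
    wmax = g.amax(dim=-1, keepdim=True)
    scale = (wmax - wmin).clamp(min=1e-6) / 15.0      # 4 bits -> 16 levels
    zero = (-wmin / scale).round()
    q = ((g / scale) + zero).round().clamp(0, 15).to(torch.uint8)
    return q, scale, zero

def dequantize(q, scale, zero, shape):
    return ((q.float() - zero) * scale).reshape(shape)

w = torch.randn(256, 512)
q, s, z = quantize_a16w4_axis1(w)
err = (w - dequantize(q, s, z, w.shape)).abs().mean()
```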
-
I'm currently exploring embedding quantization strategies to enhance storage and computation efficiency while maintaining high accuracy. Specifically, I'm looking at integrating these strategies with …
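As one concrete reference point, a common embedding quantization strategy is int8 scalar quantization with per-dimension calibration ranges. The sketch below is an illustrative assumption (names, percentile clipping, and data are placeholders, not tied to any particular library):

```python
import numpy as np

# Calibrate per-dimension ranges on a sample of embeddings, then map
# each float32 dimension to int8 (~4x storage reduction).
def calibrate(sample: np.ndarray):
    lo = np.percentile(sample, 0.1, axis=0)    # clip extreme outliers
    hi = np.percentile(sample, 99.9, axis=0)
    return lo, hi

def to_int8(emb: np.ndarray, lo, hi):
    scale = (hi - lo) / 255.0
    q = np.clip((emb - lo) / scale - 128.0, -128, 127)
    return q.round().astype(np.int8)

sample = np.random.randn(10_000, 384).astype(np.float32)  # placeholder data
lo, hi = calibrate(sample)
q = to_int8(sample[:5], lo, hi)
```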
-
After successfully quantizing and exporting ONNX models for ResNet18 using two different modes, `int8` and `fp8`, I am trying to convert these ONNX models to TRT, but no luck so far. It returns an error: No sup…
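For context, the usual way to build an engine from a QDQ-quantized int8 ONNX model with the TensorRT Python API looks roughly like the sketch below (the file path is a placeholder, and this is the standard builder flow rather than a fix for the error above):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)

with open("resnet18_int8_qdq.onnx", "rb") as f:   # placeholder path
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))             # surfaces unsupported-op errors

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.INT8)              # honor the QDQ int8 ops
engine = builder.build_serialized_network(network, config)
```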
-
I used this script to build int8, but it failed: https://github.com/microsoft/onnxruntime-inference-examples/tree/main/quantization/language_model/llama
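For reference, ONNX Runtime's generic int8 entry point looks like the following; the paths are placeholders, and the linked LLaMA script layers calibration and model-specific handling on top of this:

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Dynamic (weight-only) int8 quantization of an exported ONNX model.
quantize_dynamic(
    model_input="llama.onnx",        # placeholder input path
    model_output="llama_int8.onnx",  # placeholder output path
    weight_type=QuantType.QInt8,
)
```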
-
### Search before asking
- [X] I have searched the YOLOv8 [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussions) and f…
-
Hi again,
I've successfully quantized an ONNX model to int8, then converted it to a TensorRT engine, and noticed the performance increase compared to fp16.
```bash
python -m modelopt.onnx.quantizati…
```
-
I used `mtq.INT8_DEFAULT_CFG` as recommended for CNN networks (`mtq.quantize(model, config, forward_loop)`). My initial model ran at 80 FPS, but after quantization it dropped to 40 FPS. Why? I checked the model struct…
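For context, the ModelOpt PTQ flow being described is roughly the following sketch; the CNN and calibration data are placeholders, while the config and `mtq.quantize` call follow ModelOpt's documented PTQ pattern:

```python
import torch
import torch.nn as nn
import modelopt.torch.quantization as mtq

# Placeholder CNN and calibration batches.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 8, 3, padding=1),
).eval()
calib_data = [torch.randn(1, 3, 32, 32) for _ in range(8)]

def forward_loop(m):
    # Run a handful of calibration batches through the model.
    with torch.no_grad():
        for batch in calib_data:
            m(batch)

model = mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop)
```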