-
Can activation quantization also be introduced in Hqq? If not, is there any process/method that can further quantize the activations after using Hqq to quantize the weights?
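For context, weight-only quantization and dynamic activation quantization are independent steps, so an int8 activation pass could in principle be layered on top of already-quantized weights at runtime. A minimal PyTorch sketch of that idea (the function names, the `W_deq` dequantized weight, and the per-tensor scaling scheme are illustrative assumptions, not Hqq's API):
```python
import torch

def quantize_activation_int8(x: torch.Tensor):
    # Symmetric per-tensor dynamic quantization: the scale comes from the
    # runtime max-abs of the activation, values are clamped to the int8 range.
    scale = x.abs().max().clamp(min=1e-8) / 127.0
    x_q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return x_q, scale

def int8_act_linear(x: torch.Tensor, W_deq: torch.Tensor) -> torch.Tensor:
    # W_deq: weight already dequantized from its low-bit (e.g. Hqq) form.
    x_q, s_x = quantize_activation_int8(x)
    # Fake-quantized path: dequantize the activation and use a normal matmul.
    # A real int8 kernel would instead accumulate x_q @ W_q in int32.
    return (x_q.float() * s_x) @ W_deq.t()

x = torch.randn(4, 64)
W_deq = torch.randn(32, 64)
y = int8_act_linear(x, W_deq)
```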
-
#### I am using Onediff - ControlNet, loading the model in float16.
#### In your introduction, you used onediff int8, which is very effective in accelerating the model. I want to know whether this is applicable…
-
Hi,
Thank you for the design code. I just want to know whether your design uses INT8 quantization and MAC operations, or whether everything happens in FP32.
Thanks
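For reference, the distinction being asked about can be shown with a small, purely illustrative sketch (not taken from the design in question): an INT8 MAC path multiplies int8 operands and accumulates into int32 before rescaling, whereas an FP32 path dequantizes first and does everything in floating point.
```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(-128, 127, size=(4, 8), dtype=np.int8)   # int8 activations
w = rng.integers(-128, 127, size=(8, 3), dtype=np.int8)   # int8 weights
s_a, s_w = 0.02, 0.01                                      # example scales

# INT8 MAC path: int8 x int8 products accumulated in int32, rescaled at the end.
acc_int32 = a.astype(np.int32) @ w.astype(np.int32)
y_int8_path = acc_int32 * (s_a * s_w)

# FP32 path: dequantize first, then do the whole matmul in float32.
y_fp32_path = (a.astype(np.float32) * s_a) @ (w.astype(np.float32) * s_w)

print(np.allclose(y_int8_path, y_fp32_path))  # same math, different arithmetic
```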
-
Hi! I'm trying to quantize MobileNetV3 with tflite, but the int8 model performs very poorly. I think it is because of linear quantization, which is too simple a method and not appropriate for every weights distrib…
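One thing that often helps here is full-integer post-training quantization with a representative dataset, so activation ranges are calibrated on real inputs rather than guessed. A minimal sketch using the TFLite converter (the `model` and `calibration_images` names are placeholders, not from the original report):
```python
import tensorflow as tf

def representative_dataset():
    # Yield a few hundred real preprocessed inputs so the converter can
    # calibrate activation ranges instead of relying on defaults.
    for image in calibration_images[:200]:
        yield [image[None, ...].astype("float32")]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
# Force full-integer quantization (weights and activations in int8).
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_int8_model = converter.convert()
with open("mobilenetv3_int8.tflite", "wb") as f:
    f.write(tflite_int8_model)
```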
-
### System Info
0.0.40 shipped the first version of embedding quant.
`--embedding-dtype int8`
This issue is looking for testers to verify the real-life performance of these features at real da…
-
Hi there,
I'm new to quantization. From my understanding, "8da4w" means that the weights are pre-quantized to 4 bits, and the activations are quantized to 8 bits at runtime. Following this, the GEM…
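That reading of "8da4w" matches the usual convention: 8-bit dynamic activation quantization with 4-bit weights. For illustration only (a sketch of the idea, not any particular library's kernel), the compute flow of such a linear layer might look like:
```python
import torch

def quantize_per_token_int8(x):
    # Dynamic (runtime) symmetric quantization, one scale per token/row.
    scale = x.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    x_q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
    return x_q, scale

def linear_8da4w(x, w_q4, w_scale):
    # w_q4: int4 weights stored in an int8 container, range [-8, 7],
    # quantized offline; w_scale: per-output-channel weight scales.
    x_q, x_scale = quantize_per_token_int8(x)
    # Integer GEMM with int32 accumulation, then rescale back to float.
    acc = x_q.to(torch.int32) @ w_q4.to(torch.int32).t()
    return acc.float() * x_scale * w_scale.t()

x = torch.randn(2, 16)
w_q4 = torch.randint(-8, 8, (8, 16), dtype=torch.int8)   # stand-in 4-bit weights
w_scale = torch.full((8, 1), 0.05)
y = linear_8da4w(x, w_q4, w_scale)
```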
-
INT8 quantization works fine, but INT4 does not work.
![Capture](https://github.com/pytorch-labs/gpt-fast/assets/106262476/ac10df53-860e-4da9-b51e-1ad17e3fe3c4)
-
Repro command:
```
python generate.py --compile --compile_prefill --checkpoint_path checkpoints/$MODEL_REPO/model_int8.pth
```
Errors:
```
(pt) [ybliang@devgpu002.ash8 ~/local/gpt-fast (main)]…
```
-
Hi,
This error occurred when I tried to quantize my onnx model.
```
Traceback (most recent call last):
  File "quant.py", line 4, in <module>
    quantize(
  File "/usr/local/lib/python3.8/dist-packages…
```
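The traceback is cut off before the actual error, but for comparison, a typical dynamic-quantization call with ONNX Runtime's quantization tooling looks like the sketch below (assuming that is the library in use here; the file names are placeholders):
```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Dynamic quantization: weights are converted to int8 offline,
# activations are quantized on the fly at inference time.
quantize_dynamic(
    "model_fp32.onnx",
    "model_int8.onnx",
    weight_type=QuantType.QInt8,
)
```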
-
### System Info
CPU Architecture: x86_64
CPU/Host memory size: 1024Gi (1.0Ti)
GPU properties:
  GPU name: NVIDIA GeForce RTX 4090
  GPU mem size: 24Gb…