-
### Proposal to improve performance
Improve bitsandbytes quantization inference speed
### Report of performance regression
I'm testing llama-3.2-1b on a toy dataset. For offline inference using the…
-
TensorRT-LLM has great potential for allowing people to run larger models efficiently with limited hardware resources. Unfortunately, the current quantization workflow requires significant computation…
-
### Description of the bug:
Cannot convert TinyLlama to a fully int8-quantized tflite model
### Actual vs expected behavior:
The compute platform only supports the int8 datatype; request for tflite full…
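Full-integer quantization of this kind comes down to mapping each float tensor onto the int8 range with a per-tensor scale and zero point. A minimal sketch of that mapping, in plain Python with illustrative values (real converters such as TFLite calibrate the float range from a representative dataset):

```python
# Sketch of asymmetric int8 quantization: map a float range [rmin, rmax]
# onto the int8 range [-128, 127] via a scale and a zero point.
# Illustrative only; not tied to any particular converter.

QMIN, QMAX = -128, 127

def quant_params(rmin: float, rmax: float):
    """Compute scale and zero point so that 0.0 is exactly representable."""
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)  # range must include zero
    scale = (rmax - rmin) / (QMAX - QMIN)
    zero_point = round(QMIN - rmin / scale)
    return scale, zero_point

def quantize(x: float, scale: float, zp: int) -> int:
    return max(QMIN, min(QMAX, round(x / scale) + zp))

def dequantize(q: int, scale: float, zp: int) -> float:
    return (q - zp) * scale

scale, zp = quant_params(-1.0, 3.0)   # hypothetical calibrated range
q = quantize(0.5, scale, zp)
x = dequantize(q, scale, zp)          # recovers 0.5 to within one step
```

The roundtrip error is bounded by `scale`, and values outside the calibrated range saturate to the int8 limits, which is why calibration data matters for full-int8 models.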
-
### Your current environment
`VLLM 0.6.1.post2`
### 🐛 Describe the bug
I used a model from a hub with AWQ quantization, so it's already quantized. I loaded it with a half data type, and it pe…
-
### 🚀 Feature request
Quantization is a widely used technique to accelerate models, particularly when using the [torch.compile](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.htm…
-
Hi everyone,
I’m working on a project that involves deploying a YOLOv10 model on a mobile/edge device. To improve inference speed and reduce the model size, I want to convert my YOLOv10 model to Te…
-
### Issues Policy acknowledgement
- [X] I have read and agree to submit bug reports in accordance with the [issues policy](https://www.github.com/mlflow/mlflow/blob/master/ISSUE_POLICY.md)
### W…
-
**Describe the bug**
This is a minor issue, but I think the quantization configuration in the file [`examples/quantization_24_sparse_w4a16/2:4_w4a16_group-128_recipe.yaml`](https://github.com/vllm-pr…
-
### Motivation
The model used in our business (Internvl 2-26B) outputs very few tokens (1–2) after prompt optimization, so inference is effectively prefill-only. Therefore, we hope to use W8A8 qu…
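W8A8 means both weights and activations are held in int8, so the matmuls run on integer units and only the accumulator stays wider. A toy sketch of that idea in plain Python, using symmetric per-tensor scales and made-up values (real W8A8 kernels use per-channel scales and fused dequantization):

```python
# Toy W8A8 dot product: symmetric int8 quantization for both weights
# and activations, integer accumulation, one dequantize at the end.
# Illustrative only; scales below are hypothetical.

def sym_quant(vec, scale):
    """Symmetric int8 quantization: q = clamp(round(x / scale), -127, 127)."""
    return [max(-127, min(127, round(x / scale))) for x in vec]

def w8a8_dot(acts, weights, a_scale, w_scale):
    qa = sym_quant(acts, a_scale)
    qw = sym_quant(weights, w_scale)
    acc = sum(a * w for a, w in zip(qa, qw))  # int32-style accumulator
    return acc * a_scale * w_scale            # dequantize once, at the end

acts = [0.5, -1.0, 0.25]
weights = [0.1, 0.2, -0.4]
out = w8a8_dot(acts, weights, a_scale=1.0 / 127, w_scale=0.4 / 127)
ref = sum(a * w for a, w in zip(acts, weights))  # float reference
```

Because the whole dot product runs in integers, prefill-heavy workloads like this one get the compute savings on every token of the prompt, which is where W8A8 pays off.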
-
### Jan version
v0.5.6
### Describe the Bug
I am experiencing an issue uploading image files to the multimodal model "llava-v1.5-13b-Q2_K.gguf". The model only accepts PDF documents for upload, pre…