-
### Your current environment
```
[root@localhost wangjianqiang]# python -m vllm.entrypoints.openai.api_server --model /root/wangjianqiang/deepseek-moe/deepseek-coder-33b-base/ --tensor-parallel-size 8 …
```
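Once the server is up it exposes an OpenAI-compatible endpoint; a minimal sketch of querying it, assuming the default port 8000 and the `openai` Python client (the model name must match the `--model` path above):

```python
# Minimal sketch: query a running vLLM OpenAI-compatible server.
# Assumes the default port 8000 and the `openai` client package.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible endpoint
    api_key="EMPTY",                      # vLLM ignores the key unless one is configured
)

completion = client.completions.create(
    model="/root/wangjianqiang/deepseek-moe/deepseek-coder-33b-base/",
    prompt="def quicksort(arr):",
    max_tokens=64,
)
print(completion.choices[0].text)
```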
-
Hi,
I tried QAT on a model and exported the encodings. Then I used qnn-onnx-converter with --quantization_overrides and --input_list, trying to pass the post-QAT min/max/scale values into the converte…
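For reference, a minimal sketch of what a `--quantization_overrides` file might contain, built from Python for convenience. The tensor/parameter names and encoding values are placeholders, and the `activation_encodings` / `param_encodings` schema is an assumption based on the AIMET-style encodings format; verify it against your QNN SDK version's documentation.

```python
# Hypothetical sketch: build a quantization-overrides JSON for qnn-onnx-converter.
# Tensor names and encoding values are placeholders; the schema below follows
# the AIMET-style encodings format -- verify against your QNN SDK docs.
import json

overrides = {
    "activation_encodings": {
        "conv1_output": [  # placeholder tensor name from your ONNX graph
            {"bitwidth": 8, "min": -1.0, "max": 1.0,
             "scale": 0.007843, "offset": -128, "is_symmetric": "False"}
        ]
    },
    "param_encodings": {
        "conv1.weight": [  # placeholder parameter name
            {"bitwidth": 8, "min": -0.5, "max": 0.5,
             "scale": 0.003922, "offset": -128, "is_symmetric": "True"}
        ]
    },
}

with open("overrides.json", "w") as f:
    json.dump(overrides, f, indent=2)
```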
-
Now I have my own PyTorch and ONNX model.
How can I quantize it using Glow's Python API, and how can I then run inference on it in Glow?
Is there any clear documentation?
Thanks.
-
### System Info
- GPU: RTX 4090 * 4
- TensorRT-LLM: v0.8.0
- CUDA Version: 12.3
- NVIDIA-SMI: 545.29.06
### Who can help?
_No response_
### Information
- [X] The official example scripts
…
-
### Describe the issue
ONNX Runtime transformers benchmarking is failing for int8 quantized inference; the same works fine with onnxruntime 1.16.3. I have added the error details below.
I found the b…
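For anyone reproducing this, the int8 model in such a benchmark is typically produced with ONNX Runtime's dynamic quantization; a minimal sketch, with placeholder model paths:

```python
# Minimal sketch: produce an int8 model with ONNX Runtime dynamic quantization.
# Model paths are placeholders; this is the usual way an int8 transformer
# model is prepared before benchmarking.
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="model_fp32.onnx",   # placeholder path
    model_output="model_int8.onnx",  # placeholder path
    weight_type=QuantType.QInt8,     # quantize weights to signed int8
)
```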
-
When running `examples/quantization/basic_usage_gpt_xl.py` an error occurs during the model packing:
```
2023-05-22 04:08:34 INFO [auto_gptq.quantization.gptq] duration: 0.16880011558532715
2023-…
```
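For context, the packing step where this fails runs at the end of AutoGPTQ's `quantize()`. A minimal sketch of the flow that example script exercises, with a placeholder model name and calibration example rather than the script's exact contents:

```python
# Minimal sketch of the AutoGPTQ quantize-then-pack flow; model name and
# calibration data are placeholders, not the exact example-script contents.
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

model_name = "gpt2-xl"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
examples = [tokenizer("auto-gptq is a quantization library.", return_tensors="pt")]

quantize_config = BaseQuantizeConfig(bits=4, group_size=128)
model = AutoGPTQForCausalLM.from_pretrained(model_name, quantize_config)
model.quantize(examples)              # GPTQ per layer, then packing (where the error above occurs)
model.save_quantized("gpt2-xl-4bit")
```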
-
I'm trying to save an int4 quantized model. When I try to save it, I get this error:
```
Traceback (most recent call last):
  File "C:\Users\AI-Perf\Varsha\ipex-llm\pytho…
```
-
Hi, I'm trying to run Llamaspeak following the instructions at https://www.jetson-ai-lab.com/tutorial_llamaspeak.html
Specs:
Jetson Orin NX (16GB) Developer Kit
JetPack 6.0 [L4T 36.3.0]
The RI…
-
Only q4_0_4_4 GGUF models run on my Poco X6 Pro phone. CPU-Z says it has Cortex-A510 and Cortex-A715 cores, which support both i8mm and SVE. When I try to run a GGUF that needs those features, this happens:
~/…
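As a sanity check, one way to confirm which of those features the kernel actually reports is to scan `/proc/cpuinfo` on the device (e.g. from Termux); a small sketch:

```python
# Quick sketch: check whether the kernel reports i8mm / sve by scanning
# the "Features" lines in /proc/cpuinfo on an ARM device.
with open("/proc/cpuinfo") as f:
    features = set()
    for line in f:
        if line.lower().startswith("features"):
            features.update(line.split(":", 1)[1].split())

for feat in ("i8mm", "sve"):
    print(f"{feat}: {'yes' if feat in features else 'no'}")
```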
-
I tried to quantize a Llama model (Llama 13B) with SmoothQuant, and found that if I only quantize `LlamaDecoderLayer` then the accuracy does not drop even when directly quantizing weights and activations, bu…
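For reference, the core of SmoothQuant is a per-channel scale that migrates activation outliers into the weights before quantization, s_j = max|X_j|^α / max|W_j|^(1−α). A minimal sketch of that smoothing step for one linear layer (α = 0.5; names are placeholders):

```python
# Minimal sketch of SmoothQuant's smoothing step for one linear layer.
# s_j = max|X_j|^alpha / max|W_j|^(1 - alpha); activations are divided by s
# at runtime and the weights are multiplied by s, so the product is unchanged
# while activation outliers shrink. Names are placeholders.
import torch

def smooth(act_absmax: torch.Tensor, weight: torch.Tensor, alpha: float = 0.5):
    """act_absmax: per-input-channel |activation| max, shape [in_features];
    weight: nn.Linear weight, shape [out_features, in_features]."""
    w_absmax = weight.abs().amax(dim=0)  # per-input-channel weight max
    scales = (act_absmax.pow(alpha) / w_absmax.pow(1 - alpha)).clamp(min=1e-5)
    smoothed_weight = weight * scales    # fold the scales into the weights
    return scales, smoothed_weight       # divide the activations by `scales`
```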