-
As mentioned in the README: "Note that due to the limitations of AutoGPTQ kernels, the real quantization of weight-only quantization can only lead to memory reduction, but with slower inference speed."
I'm …
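Below is a minimal sketch of what that trade-off looks like in practice, assuming the usual AutoGPTQ path through Hugging Face `transformers`; the checkpoint name is just an example, not taken from this issue:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example 4-bit GPTQ checkpoint; substitute the model actually being discussed.
model_id = "TheBloke/Llama-2-7B-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Weights are stored packed (e.g. 4-bit), so GPU memory use drops, but the
# AutoGPTQ kernels dequantize on the fly, which is why inference is typically
# slower than running the same model in fp16.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```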
-
### Description
When attempting to load a GitHub repo into long-term memory, after it reads the repo and saves it to collections, it doesn't get all the files; somewhere along the way it crashes.
Logs:
```
b" Runni…
-
Request: maybe add a way to select a 2048- or 4096-sample length for making open hi-hats?
-
https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct
https://github.com/vllm-project/llm-compressor/tree/main/examples/quantization_w8a8_fp8
https://github.com/vllm-project/llm-compressor/tre…
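For reference, the FP8 W8A8 flow in those llm-compressor examples is roughly the one-shot recipe below; the exact import paths, scheme name, and save arguments are from memory and may differ between versions, so treat this as a sketch rather than the canonical example:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

MODEL_ID = "meta-llama/Llama-3.1-8B-Instruct"

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Quantize weights and activations of all Linear layers to FP8, skipping lm_head.
recipe = QuantizationModifier(targets="Linear", scheme="FP8_DYNAMIC", ignore=["lm_head"])

# FP8 dynamic activation quantization is data-free, so no calibration set is needed.
oneshot(model=model, recipe=recipe)

save_dir = "Llama-3.1-8B-Instruct-FP8-Dynamic"
model.save_pretrained(save_dir, save_compressed=True)
tokenizer.save_pretrained(save_dir)
```
The resulting checkpoint can then be served directly with vLLM.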
-
Does MiniCPM-V 2.6 currently support int8/FP8 quantization?
Thanks!
-
### System Info
Ubuntu 20.04
NVIDIA A100
nvcr.io/nvidia/tritonserver:24.10-trtllm-python-py3 and 24.07
TensorRT-LLM v0.14.0 and v0.11.0
### Who can help?
@Tracin
### Information
- [x] The offici…
-
Hi everyone,
I'm trying to quantize the YOLOv5n model from [here](https://github.com/ultralytics/yolov5). I'm using the Vitis-AI v3.0 Docker image with the following code:
```
import pytorch_nndct
i…
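# ---------------------------------------------------------------------------
# The snippet above is truncated; for context, a typical pytorch_nndct
# post-training quantization flow looks roughly like the sketch below. This is
# a hedged reconstruction, not the reporter's code: the input shape, output
# directory, and calibration step are assumptions for illustration only.
# ---------------------------------------------------------------------------
import torch
from pytorch_nndct.apis import torch_quantizer

model = torch.hub.load("ultralytics/yolov5", "yolov5n", pretrained=True).eval()
dummy_input = torch.randn(1, 3, 640, 640)

# 'calib' mode inserts fake-quant nodes and collects activation statistics.
quantizer = torch_quantizer("calib", model, (dummy_input,), output_dir="quantize_result")
quant_model = quantizer.quant_model

# Run representative images through quant_model here to calibrate; a single
# dummy tensor is used only to keep the sketch self-contained.
quant_model(dummy_input)

# Export calibration results; re-run with quant_mode="test" to evaluate and
# then export the deployable xmodel.
quantizer.export_quant_config()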
-
Dear @kimishpatel @jerryzh168 @shewu-quic
I want to split a model (e.g., Llama-3.2-3B) into multiple layers and apply different quantization settings (qnn_8a8w, qnn_16a4w, ...) to each layer.
Has such…
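For what it's worth, here is a rough sketch of the general per-module mechanism in the PT2E quantization flow, using XNNPACKQuantizer only as a stand-in; whether the QNN quantizer's qnn_8a8w / qnn_16a4w presets can be attached per module the same way is exactly the open question here, so the class and method names below are assumptions, not a confirmed answer:
```python
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

class TwoLayer(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layer0 = torch.nn.Linear(16, 16)
        self.layer1 = torch.nn.Linear(16, 16)

    def forward(self, x):
        return self.layer1(self.layer0(x))

model = TwoLayer().eval()
example_inputs = (torch.randn(1, 16),)

quantizer = XNNPACKQuantizer()
# Default config for the whole model (analogous to using qnn_8a8w for most layers)...
quantizer.set_global(get_symmetric_quantization_config())
# ...and a different config for one named submodule (analogous to qnn_16a4w).
quantizer.set_module_name("layer1", get_symmetric_quantization_config(is_per_channel=True))

# The export API name varies with the PyTorch version (capture_pre_autograd_graph
# in older releases, export_for_training in newer ones).
exported = torch.export.export_for_training(model, example_inputs).module()
prepared = prepare_pt2e(exported, quantizer)
prepared(*example_inputs)  # calibration pass
quantized = convert_pt2e(prepared)
```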
-
### System Info
GPU: 4090
Tensorrt: 10.3
tensorrt-llm: 0.13.0.dev2024081300
### Who can help?
@Tracin Could you please have a look? Thank you very much.
### Information
- [ ] The official example sc…
-
### System Info
NVIDIA 4090
TensorRT-0.7.1
In nvidia-ammo, it appears these lines in `ammo/torch/export/layer_utils.py` fail unexpectedly for some Llama variants:
In particular, the…