-
I used the following steps to build the SmoothQuant (SQ) engine.
First, build the Docker image from the main branch:
```
git clone -b main https://github.com/triton-inference-server/tensorrtllm_backend.git
# Update the su…
```
-
### Please describe your question
- https://github.com/PaddlePaddle/PaddleNLP/tree/develop/llm#6-%E9%87%8F%E5%8C%96
When using llm for quantization, the docs call for the develop versions of PaddleSlim and PaddlePaddle, but after installing PaddlePaddle there is no paddle.fluid,
yet running the quantization scri…
-
**Describe the bug**
I implemented SmoothQuant INT8 inference for PyTorch with `CUTLASS` INT8 GEMM kernels, which are wrapped as PyTorch modules in [torch-int](https://github.com/Guangxuan-Xiao/torch…
-
I came across this error when building llama-2-7b-hf after converting it to the HF FasterTransformer format:
```
OSError: /llama/smooth_llama_7B/sq0.5/1-gpu does not appear to have a file named config.js…
```
-
Is AWQ 8-bit quantization supported?
-
### System Info
CentOS Linux release 7.9.2009
Nvidia A40 * 4
llama-2-13b-hf
TensorRT-LLM version: 0.11.0.dev2024061800
### Who can help?
_No response_
### Information
- [ ] The officia…
-
Hello,
after I couldn't use Ryzen AI on my Lenovo, I went back to my Minisforum UM790 Pro, where Ryzen AI is fortunately available on its 7940HS.
Your new examples are a great starting point. I al…
-
## Question
We are very interested in two post-training quantization papers from the HAN Lab!
SmoothQuant uses W8A8 for efficient GPU computation.
AWQ uses W4/3A16 for lower memory requirements and …
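For context, the core trick in SmoothQuant is a per-input-channel smoothing factor s_j = max|X_j|^α / max|W_j|^(1−α) that migrates activation outliers into the weights before both are quantized to INT8. Here is a minimal sketch of that scale computation; the function name and the α=0.5 default are illustrative, though the formula follows the paper.
```
import torch

def smooth_scales(act_absmax, weight, alpha=0.5):
    # act_absmax: per-input-channel activation abs-max from calibration, [in_features]
    # weight: linear weight, [out_features, in_features]
    w_absmax = weight.abs().amax(dim=0)                  # per input channel
    s = act_absmax.pow(alpha) / w_absmax.pow(1 - alpha)  # s_j = max|X_j|^a / max|W_j|^(1-a)
    return s.clamp(min=1e-5)

# Folding the scales keeps the math exact: (X / s) @ (W * s).T == X @ W.T,
# but X / s has flatter channel magnitudes and quantizes to int8 with less error.
```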
-
How would it perform if this were done per-channel? Would that remove the need for the reorder step?
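For reference, per-channel weight quantization keeps one scale per output channel, so rows with very different magnitudes no longer share a single scale, which is the property that could make a magnitude-based reorder unnecessary. A minimal sketch, with illustrative names:
```
import torch

def quantize_per_channel(weight):
    # One symmetric int8 scale per output channel (row) of the weight.
    scales = weight.abs().amax(dim=1, keepdim=True) / 127.0
    q = torch.clamp(torch.round(weight / scales), -128, 127).to(torch.int8)
    return q, scales.squeeze(1)

# Each row carries its own scale, so outlier rows no longer force a shared
# scale on the rest, which is the motivation for reordering in per-group schemes.
```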
-
Just did a very simple run with llama-7b-4bit. It... took a while. I had it running in a screen session. But it worked!
```
root@FriendlyWrt /s/o/llama.cpp (master)# time ./main --color -m models/ggml-model-q4…
```