-
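### Please ask your question
Quantizing unimo with paddleslim fails with the error: Operator (fusion_unified_decoding) is not registered.
After conversion to a static graph, the 'fusion_unified_decoding' operator does not seem to be supported. Is there a way to support this operator (for example, how can it be registered)?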
> Preparation stage, Run batch:| …
-
Hello,
It seems that currently INT8 weight-only and SmoothQuant quantization are supported for GPT models, but no quantization of any kind is supported for other autoregressive transformer models, suc…
-
I am trying to quantize a fine-tuned Llama 3 [model](https://huggingface.co/damerajee/Gaja-v1.00) and export it to a TensorRT engine. I am able to quantize the model, but I am unable to export t…
-
### Branch/Tag/Commit
main
### Docker Image Version
nvcr.io/nvidia/pytorch:22.09-py3
### GPU name
RTX3090, DGX1V
### CUDA Driver
470.182.03
### Reproduced Steps
#### Issue
…
-
within the docker (IMAGE: nvidia/cuda:12.1.0-devel-ubuntu22.04)
GPU: A100 40GB
TensorRT-LLM version: 0.10.0
flash-attn 2.5.9.post1
I quantize the phi3 model (phi-3-medium-128k-instrcut/), wi…
-
Are the code and parameters in this repository consistent with the parameters used in the experiments described in the paper? I conducted an experiment on an A100 using the provided command "bash 10_o…
-
![f654737ebc54932e591723efc3d1c02](https://user-images.githubusercontent.com/47971541/191495874-577ca7c6-9dc6-4d53-8ce3-8c6a1e3a4226.png)
-
See email from 7/9/2021 with the file name 5a1aH-chA_Na-PTQ_test.pdb attached.
-
Hi, thanks for the repo you published on GitHub. I tried to use the links [PTQ] and [√3-subdivision], and it seems they are broken. Could you please fix this?
Best