-
Within Docker (image: nvidia/cuda:12.1.0-devel-ubuntu22.04)
GPU: A100 40GB
TensorRT-LLM version: 0.10.0
flash-attn: 2.5.9.post1
I quantized the Phi-3 model (phi-3-medium-128k-instrcut/), wi…
-
![f654737ebc54932e591723efc3d1c02](https://user-images.githubusercontent.com/47971541/191495874-577ca7c6-9dc6-4d53-8ce3-8c6a1e3a4226.png)
-
### 🐛 Describe the bug
- I'm reporting errors related to `capture_pre_autograd_graph` and `torch.compile` in QAT (a minimal sketch of this flow follows the list).
- Note: apologies if there are any misunderstandings.
- Based on th…
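
For context, here is a minimal sketch of the PT2E QAT flow these errors involve, assuming PyTorch 2.3-era APIs; the toy model, shapes, and quantizer choice are illustrative, not the reporter's actual setup:

```python
import torch
from torch._export import capture_pre_autograd_graph
from torch.ao.quantization.quantize_pt2e import prepare_qat_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

# Toy stand-in for the reporter's model (placeholder).
model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.ReLU())
example_inputs = (torch.randn(1, 8),)

# 1) Capture the graph before autograd, the step named in this issue.
exported = capture_pre_autograd_graph(model, example_inputs)

# 2) Insert fake-quant observers for QAT.
quantizer = XNNPACKQuantizer().set_global(
    get_symmetric_quantization_config(is_qat=True)
)
prepared = prepare_qat_pt2e(exported, quantizer)

# ... run the QAT fine-tuning loop on `prepared` here ...

# 3) Convert to a quantized model; torch.compile is applied afterwards.
quantized = convert_pt2e(prepared)
compiled = torch.compile(quantized)
```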
-
TinyNN can train the ViT model from Hugging Face Transformers,
but when converting it to a TFLite model, an error appears that I can't resolve.
The following are the TinyNN settings and the error…
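
For reference, a minimal TinyNN conversion sketch, assuming the standard `TFLiteConverter` API; the checkpoint name, input shape, and output path are illustrative placeholders:

```python
import torch
from transformers import ViTForImageClassification
from tinynn.converter import TFLiteConverter

model = ViTForImageClassification.from_pretrained("google/vit-base-patch16-224")
model.config.return_dict = False  # tracing needs tuple outputs, not ModelOutput dicts
model.eval()

dummy_input = torch.randn(1, 3, 224, 224)
converter = TFLiteConverter(model, dummy_input, tflite_path="vit.tflite")
converter.convert()
```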
-
My use case:
Apply post-training quantization to a .pth model and convert it to TFLite. The generated TFLite model fails the benchmark test with the following error message (a hedged sketch of this flow appears after the log):
STARTING!
Log parameter val…
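
The report doesn't name its toolchain, so purely as a hedged sketch of one common .pth-to-TFLite PTQ path (TinyNN's `PostQuantizer`); the model file, input shape, and calibration loop are placeholders:

```python
import torch
from tinynn.graph.quantization.quantizer import PostQuantizer
from tinynn.converter import TFLiteConverter

model = torch.load("model.pth")  # placeholder float model
model.eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Rewrite the model with observers for post-training quantization.
quantizer = PostQuantizer(model, dummy_input, work_dir="out")
ptq_model = quantizer.quantize()

# Calibrate with representative data before converting.
with torch.no_grad():
    for _ in range(16):
        ptq_model(torch.randn(1, 3, 224, 224))

torch.backends.quantized.engine = "qnnpack"  # TinyNN's default backend
ptq_model = torch.quantization.convert(ptq_model)

converter = TFLiteConverter(ptq_model, dummy_input, tflite_path="out/model.tflite")
converter.convert()
```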
-
Hi, thanks for the repo you published on GitHub. I tried to use the [PTQ] and [√3-subdivision] links, and it seems they are broken. Could you please fix this?
Best
-
### Please describe your question
When quantizing UNIMO with PaddleSlim, I get the error: Operator (fusion_unified_decoding) is not registered.
After converting to a static graph, the 'fusion_unified_decoding' operator does not seem to be supported. Is there a way to support this operator (e.g., how to register it)? A registration sketch follows the log below.
> Preparation stage, Run batch:| …
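
For what it's worth, `fusion_unified_decoding` appears to ship with PaddleNLP's compiled FasterTransformer custom ops, so the likely fix is building and loading that library rather than writing a new op. As a hedged illustration of Paddle's generic custom-op registration mechanism (the source file names below are placeholders):

```python
from paddle.utils.cpp_extension import load

# JIT-compile a custom operator library and register its ops
# into the current process.
custom_ops = load(
    name="custom_jit_ops",
    sources=["fusion_unified_decoding_op.cc", "fusion_unified_decoding_op.cu"],
)
```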
-
I have used PTQ for INT8 export from a PyTorch model, and despite attempts at calibration there is a significant drop in detection accuracy.
I am moving to quantization-aware training to improve the…
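
For reference, a minimal eager-mode QAT sketch in PyTorch; the tiny network, backend, and shapes below are placeholders, not the reporter's detector:

```python
import torch
from torch.ao.quantization import (
    QuantStub,
    DeQuantStub,
    get_default_qat_qconfig,
    prepare_qat,
    convert,
)

class Net(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # float -> quantized boundary
        self.conv = torch.nn.Conv2d(3, 16, 3)
        self.relu = torch.nn.ReLU()
        self.dequant = DeQuantStub()  # quantized -> float boundary

    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))

model = Net()
model.train()
model.qconfig = get_default_qat_qconfig("fbgemm")
prepared = prepare_qat(model)

# ... fine-tune `prepared` so the weights adapt to fake-quant noise ...

prepared.eval()
quantized = convert(prepared)
out = quantized(torch.randn(1, 3, 32, 32))  # runs int8 kernels
```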
-
Would love to use this as a PTQ layer with TensorRT. Are there any plans to support that in the future?
-
I am trying to quantize a fine-tuned Llama 3 [model](https://huggingface.co/damerajee/Gaja-v1.00) and export it to a TensorRT engine. I am able to quantize the model, but I am unable to export t…