-
We plan to add QAT for LLMs to torchao (as mentioned in the original RFC: https://github.com/pytorch-labs/ao/issues/47).
For this to run efficiently on the GPU, we'd need kernel support for W4A8…
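For context, W4A8 QAT simulates 4-bit weights and 8-bit activations during training while all arithmetic stays in floating point; only a dedicated W4A8 kernel makes the resulting model fast at inference. A minimal fake-quantization sketch in plain PyTorch (the `fake_quant` helper and `FakeQuantLinear` layer are hypothetical illustrations, not the torchao API):

```python
import torch
import torch.nn as nn


def fake_quant(x: torch.Tensor, bits: int) -> torch.Tensor:
    # Symmetric per-tensor fake quantization: snap values to an n-bit grid,
    # then dequantize. The straight-through estimator below lets gradients
    # flow through the non-differentiable round().
    qmax = 2 ** (bits - 1) - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    q = (x / scale).round().clamp(-qmax - 1, qmax) * scale
    return x + (q - x).detach()


class FakeQuantLinear(nn.Linear):
    # Hypothetical W4A8 QAT layer: 4-bit weights, 8-bit activations,
    # with the actual matmul still carried out in fp32.
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return nn.functional.linear(
            fake_quant(x, bits=8), fake_quant(self.weight, bits=4), self.bias
        )


layer = FakeQuantLinear(16, 16)
layer(torch.randn(2, 16)).sum().backward()  # gradients reach the fp32 master weights
```

The straight-through estimator is what lets training proceed despite the non-differentiable rounding step; the speed benefit only materializes once a real W4A8 kernel replaces the simulated one.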
-
### 💡 Your Question
Hi,
I am just checking: I see in the provided results that Yolo-NAS-L does not suffer much of a performance drop going to Yolo-NAS-INT8-L. Can I check what exactly is meant …
-
### System Info
- GPU: 2xA100-40G
- TensorRT-LLM v0.8.0
### Who can help?
@Tracin
### Information
- [X] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] An officia…
-
I have a question: can the Vitis AI quantizer be used with formats other than INT8 on the **ZCU104**? Also, after quantization, is the computation actually performed in INT8, or are the values just stored as INT8? If…
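The stored-vs-computed distinction behind this question is worth illustrating. In weight-only schemes the weights are stored as INT8 but dequantized back to float before the matmul; true integer execution (what DPU-style accelerators do) multiplies INT8 operands and accumulates in INT32. A toy sketch in plain PyTorch, not Vitis AI code:

```python
import torch

torch.manual_seed(0)
w = torch.randn(64, 64)
x = torch.randn(1, 64)

# Symmetric per-tensor INT8 parameters.
w_scale = w.abs().max() / 127
x_scale = x.abs().max() / 127
w_int8 = (w / w_scale).round().clamp(-128, 127).to(torch.int8)
x_int8 = (x / x_scale).round().clamp(-128, 127).to(torch.int8)

# Case 1: weights merely *stored* as INT8; the matmul still runs in fp32.
y_weight_only = x @ (w_int8.float() * w_scale)

# Case 2: true integer execution; INT8 operands, INT32 accumulation,
# rescaled back to float only at the end.
y_int32 = x_int8.to(torch.int32) @ w_int8.to(torch.int32)
y_integer = y_int32.float() * (x_scale * w_scale)

print((y_weight_only - y_integer).abs().max())  # differ only by activation-quant error
```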
-
Hello,
I measured the time taken by the BitBLAS matmul versus a normal torch.matmul in your QuickStart code, but there appears to be no speedup. Am I missing something?
```python
import bitblas
import torch
…
```
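A frequent cause of "no speedup" in naive timing loops is that CUDA kernel launches are asynchronous, so wall-clock timing without warmup and `torch.cuda.synchronize()` mostly measures launch overhead (and, for BitBLAS, first-call tuning). A hedged timing harness in plain PyTorch; it takes any callable, so the BitBLAS matmul can be dropped in (the commented `bitblas_matmul` line is a placeholder, not a symbol from the QuickStart):

```python
import torch


def benchmark(fn, *args, warmup: int = 10, iters: int = 100) -> float:
    # Average latency in milliseconds, measured with CUDA events.
    for _ in range(warmup):  # warmup triggers tuning and cache effects
        fn(*args)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()  # wait until every queued kernel has finished
    return start.elapsed_time(end) / iters


if torch.cuda.is_available():
    a = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
    b = torch.randn(1024, 1024, device="cuda", dtype=torch.float16)
    print(f"torch.matmul: {benchmark(torch.matmul, a, b):.3f} ms")
    # print(f"BitBLAS: {benchmark(bitblas_matmul, aq, bq):.3f} ms")  # placeholder callable
```

Also note that low-bit kernels tend to shine at the skinny, memory-bound shapes typical of LLM decoding; at large square shapes a well-tuned fp16 GEMM can be just as fast.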
-
- Running with the code below does not work properly
- Running with the recipe works properly
```python
from datasets import load_dataset
from transformers import AutoTokenizer
from llmc…
```
-
Does TensorRT support QAT and PTQ INT8 quantization of CLIP/ViT models? Could you please provide any relevant quantization examples and accuracy/latency benchmarks?
-
### 1. System information
- Occurs in Google Colab with TF 2.14
- Also verified with TF 2.7 (Anaconda) on Windows 10
### 2. Code
[Colab to reproduce issue](https://colab.research.google.com…
-
Hi all, this issue will track the feature requests you've made to TensorRT-LLM & provide a place to see what TRT-LLM is currently working on.
Last update: `Jan 14th, 2024`
🚀 = in development
#…
-
Hi,
we are trying to quantize our ONNX models to INT8 to run on CPU, using https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html#quantization-on-gpu
We are using dynamic …
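For reference, dynamic quantization in ONNX Runtime is a single call: weights are converted to INT8 offline, while activation scales are computed at runtime. A minimal sketch with placeholder file paths:

```python
from onnxruntime.quantization import QuantType, quantize_dynamic

# Paths are placeholders; point them at your own model.
quantize_dynamic(
    model_input="model.onnx",        # fp32 source model
    model_output="model.int8.onnx",  # quantized output
    weight_type=QuantType.QInt8,     # store weights as signed INT8
)
```

Dynamic quantization tends to help transformer-style workloads on CPU; for convolutional models the ONNX Runtime docs recommend static quantization with a calibration dataset instead.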