-
I ran into the following error and don't know why; the data coming out of the dataloader should be correct.
Mon Apr 08 21:36:18-INFO: Collect quantized variable names ...
Sampling stage, Run batch:| | 0/100
Traceback (most r…
-
### 🚀 Feature request
Quantization is a widely used technique to accelerate models, particularly when using the [torch.compile](https://pytorch.org/tutorials/intermediate/torch_compile_tutorial.htm…
-
Hey everyone,
I want to train the model on CPU so that I can apply PyTorch post-training dynamic quantization afterwards, and then run eval.py on CUDA to speed up inference.
I am having trouble running the m…
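For context, post-training dynamic quantization converts weights to int8 ahead of time and computes activation scales at runtime (in PyTorch this is done via `torch.ao.quantization.quantize_dynamic`). The underlying symmetric int8 weight mapping can be sketched in plain Python; the helper names below are illustrative, not the PyTorch API:

```python
# Minimal sketch of symmetric per-tensor int8 quantization, the scheme
# dynamic quantization typically applies to linear-layer weights.
# Helper names are illustrative, not a real library API.

def quantize_symmetric_int8(weights):
    """Map float weights to int8 codes with a single per-tensor scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # round to nearest code, saturating to the int8 range
    return [max(-128, min(127, round(w / scale))) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate float values from int8 codes."""
    return [qi * scale for qi in q]

w = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_symmetric_int8(w)
w_hat = dequantize(q, scale)  # close to w, within one quantization step
```

The round trip error is bounded by half a quantization step (`scale / 2`) per element, which is why well-scaled weights tolerate int8 much better than outlier-heavy activations.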
-
### System Info
CPU-X86
GPU-H100
Server XE9640
Code: TensorRT-LLM 0.8.0 release
### Who can help?
@Tracin @juney-nvidia
Regarding the [FP8 Post Quantization](https://github.com/NVIDIA/Tenso…
-
I am wondering if PGB supports post-training quantization, for instance like what we have for fastText: https://flavioclesio.com/2019/03/22/post-training-quantization-in-fasttext-or-how-to-shrink-your-fas…
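For reference, the fastText quantization described in that post is based on product quantization: each embedding vector is split into sub-vectors, and each sub-vector is replaced by the index of its nearest centroid in a small per-subspace codebook. A toy sketch of the encode/decode step (codebooks are hard-coded here for illustration; fastText learns them with k-means):

```python
# Toy product-quantization encode/decode, the idea behind fastText's
# quantize step. Real implementations learn the codebooks with k-means;
# they are hard-coded here for illustration.

def pq_encode(vec, codebooks):
    """Split vec into len(codebooks) sub-vectors; keep nearest-centroid indices."""
    d = len(vec) // len(codebooks)
    codes = []
    for i, book in enumerate(codebooks):
        sub = vec[i * d:(i + 1) * d]
        # nearest centroid by squared Euclidean distance
        codes.append(min(range(len(book)),
                         key=lambda c: sum((a - b) ** 2 for a, b in zip(sub, book[c]))))
    return codes

def pq_decode(codes, codebooks):
    """Reconstruct an approximate vector from stored centroid indices."""
    out = []
    for code, book in zip(codes, codebooks):
        out.extend(book[code])
    return out

codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],   # centroids for dims 0-1
    [(0.0, 1.0), (1.0, 0.0)],   # centroids for dims 2-3
]
codes = pq_encode([0.9, 1.1, 0.1, 0.8], codebooks)
approx = pq_decode(codes, codebooks)
```

Storage shrinks because each sub-vector is replaced by one small integer index instead of several floats, at the cost of reconstruction error.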
-
From the issue "https://developer.apple.com/forums/thread/740518 how do we use the computational power of A17 Pro Neural Engine?"
I learned that if I want to run inference with my mlmodel on my iPad Pro with …
-
First of all, thanks to all of you for this great project!
Currently, the model does not seem to support int8 quantization. Are there any plans for it?
-
The survey discusses the sensitivity of activation quantization and the tolerance of KV cache quantization in the context of post-training quantization (PTQ) for large language models (LLMs). It makes…
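The activation sensitivity the survey refers to is largely an outlier effect: one large activation stretches a per-tensor int8 scale so much that the small values collapse onto a handful of codes. A toy illustration in plain Python (the helper is illustrative, not from any library):

```python
# Toy illustration of why activation outliers hurt per-tensor int8
# quantization: a single large value stretches the scale, so the small
# values all round to code 0 and lose their information.

def int8_roundtrip(xs):
    """Quantize to symmetric int8 and dequantize with one per-tensor scale."""
    scale = max(abs(x) for x in xs) / 127.0
    return [round(x / scale) * scale for x in xs]

acts = [0.01, 0.02, -0.015, 60.0]   # 60.0 is an outlier
recon = int8_roundtrip(acts)
# scale ~ 0.47, so the three small activations all come back as 0.0
```

This is one reason per-channel or per-group scales, and outlier-aware schemes, are common for activations, while weight and KV-cache tensors with a tighter range quantize more gracefully.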
-
### Describe the issue
After quantization, the output ONNX model had faster inference speed and smaller model size, but why are the input and output tensors still float32?
I thought it should be u…
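Float32 graph inputs and outputs are expected for QDQ-format models: the quantizer wraps each quantized op in QuantizeLinear/DequantizeLinear nodes, so only the internal tensors are int8 while the model boundary stays float32. The semantics of that node pair, sketched in plain Python (per-tensor uint8, simplified):

```python
# Sketch of ONNX QuantizeLinear / DequantizeLinear semantics
# (per-tensor, uint8, simplified). In a QDQ-format model these nodes
# sit at the boundary of each quantized op, which is why the graph's
# own inputs and outputs remain float32.

def quantize_linear(x, scale, zero_point):
    """float32 -> uint8 codes, saturating to [0, 255]."""
    return [max(0, min(255, round(v / scale) + zero_point)) for v in x]

def dequantize_linear(q, scale, zero_point):
    """uint8 codes -> float32 approximation."""
    return [(qi - zero_point) * scale for qi in q]

x = [-0.5, 0.0, 0.5, 1.0]
scale, zp = 1.5 / 255, 85          # covers roughly [-0.5, 1.0]
q = quantize_linear(x, scale, zp)
x_hat = dequantize_linear(q, scale, zp)
```

If true int8 graph I/O is needed, the quantize/dequantize pair at the boundary has to be stripped or folded explicitly; by default the exported model keeps the float32 interface.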
-
Hi,
I ran into an issue when I tried to run **6.1 normal inference** and **6.2 inference with mixed precision** following your instructions. Something went wrong:
**For 6.1 normal inference:**
(viditq…