-
I am trying to deploy a Baichuan2-7B model on a machine with 2 Tesla V100 GPUs. Unfortunately, each V100 has only 16 GB of memory.
I have applied INT8 weight-only quantization, so the size of the engine I…
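As a rough, back-of-envelope check (my own arithmetic, not from the issue): INT8 weight-only storage for ~7B parameters is about 7 GB, so the weights alone split comfortably across two 16 GB V100s under tensor parallelism; the KV cache and activations are what remain to budget for.

```python
# Back-of-envelope weight footprint for INT8 weight-only Baichuan2-7B.
# Rough sketch only: the real engine also holds KV cache and activations.
params = 7e9                  # ~7B parameters
weight_bytes = params * 1     # INT8 weight-only: 1 byte per weight
per_gpu = weight_bytes / 2    # tensor parallelism across 2 V100s
print(f"weights per GPU: {per_gpu / 2**30:.1f} GiB of 16 GiB")  # ~3.3 GiB
```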
-
Hi all, this issue will track the feature requests you've made to TensorRT-LLM & provide a place to see what TRT-LLM is currently working on.
Last update: `Jan 14th, 2024`
🚀 = in development
#…
-
### 1. System information
- Occurs in Google Colab with TF 2.14
- Also verified with TF 2.7 (Anaconda) on Windows 10
### 2. Code
[Colab to reproduce issue](https://colab.research.google.com…
-
Where can I download bloom-7b?
I noticed that int8 quantization is available, but is there an option for int4 quantization?
What is the memory overhead for int4 and int8 when using LoRA or PTuning f…
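On the int4-vs-int8 part, a rough weight-only footprint comparison for a 7B-parameter model (my own arithmetic; it ignores KV cache, activations, and any LoRA/P-Tuning adapter weights):

```python
# Approximate weight-only memory for ~7B parameters at different precisions.
params = 7e9
for name, bits in [("fp16", 16), ("int8", 8), ("int4", 4)]:
    gib = params * bits / 8 / 2**30
    print(f"{name}: {gib:4.1f} GiB")  # fp16 ~13.0, int8 ~6.5, int4 ~3.3
```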
-
I made some changes in yolov4_416x416_qtz.json and accuracy_checker\adapters\yolo.py as follows:

    "type": "yolo_v3",
    "anchors": "10.0, 14.0, 23.0, 27.0, 37.0, 58.0, 81.0, 82.0, 1…
-
I aim to evaluate an 8-bit quantized model. For some reason, lighteval asks me to provide data for quantization:
ValueError: You need to pass `dataset` in order to quantize your model
I started wi…
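For context, that error string usually comes from a GPTQ-style quantization path, which needs a calibration dataset to quantize from scratch. If the checkpoint is already quantized, or if on-the-fly 8-bit loading is acceptable, bitsandbytes needs no dataset. A minimal sketch in plain transformers (not lighteval's CLI; the model name is a placeholder):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load for 8-bit evaluation via bitsandbytes; no calibration dataset required.
model = AutoModelForCausalLM.from_pretrained(
    "my-org/my-model",  # placeholder checkpoint
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
)
```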
-
I tried to get full INT8 quantization by running convert_tflite.py and setting the flag --quantize_mode full_int8. However, I got the following error:
RuntimeError: Quantization not yet support…
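For reference, a minimal full-integer conversion sketch with the standard TFLite converter API (the saved-model path and the 416×416 input shape are assumptions); full-INT8 conversion needs a representative dataset to calibrate activation ranges:

```python
import numpy as np
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model("yolov4_saved_model")  # assumed path
converter.optimizations = [tf.lite.Optimize.DEFAULT]

def representative_dataset():
    # Calibration samples; real preprocessed images should be used, not random data.
    for _ in range(100):
        yield [np.random.rand(1, 416, 416, 3).astype(np.float32)]

converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8   # or tf.int8
converter.inference_output_type = tf.uint8
tflite_model = converter.convert()
open("yolov4_int8.tflite", "wb").write(tflite_model)
```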
-
TFLite uses int8 per-channel weight quantization for transposed convolutions.
While XNNPACK includes a fast transposed convolution operation, it only supports per-tensor weight quantization (i.e. a si…
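To make the distinction concrete, a small numpy sketch (my own illustration, not TFLite's or XNNPACK's actual kernels): per-tensor quantization uses a single scale for the whole weight tensor, while per-channel quantization picks one scale per output channel, which preserves accuracy when channel magnitudes differ.

```python
import numpy as np

w = np.random.randn(8, 3, 3, 4).astype(np.float32)  # hypothetical transposed-conv weights

# Per-tensor: one scale for the entire tensor.
scale_pt = np.abs(w).max() / 127.0
w_pt = np.clip(np.round(w / scale_pt), -127, 127).astype(np.int8)

# Per-channel: one scale per output channel (axis 0 here, by assumption).
scales_pc = np.abs(w.reshape(w.shape[0], -1)).max(axis=1) / 127.0
w_pc = np.clip(np.round(w / scales_pc[:, None, None, None]), -127, 127).astype(np.int8)
```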
-
Hi,
We are trying to quantise our ONNX models to INT8 to run on CPU using: https://onnxruntime.ai/docs/performance/model-optimizations/quantization.html#quantization-on-gpu
We are using dynamic …
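For reference, the dynamic path in that guide comes down to a single call; a minimal sketch with onnxruntime's Python quantization API (the file paths are placeholders):

```python
from onnxruntime.quantization import quantize_dynamic, QuantType

# Dynamic quantization: weights are quantized to INT8 offline,
# activations are quantized on the fly at CPU inference time.
quantize_dynamic(
    model_input="model.onnx",        # placeholder input path
    model_output="model.int8.onnx",  # placeholder output path
    weight_type=QuantType.QInt8,
)
```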
-
1. Environment
OS: Ubuntu
OS Version: linux
2. GitHub version
branch: master
commit (optional): 6dc2d93
3. Describe the bug
I found the problem when I use a pretrained model…