-
Fp8 or AWQ quant
-
I have a question about the following line of code in the notebook:
`model_name = "TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T"`
Question: could you let me know why this model is used for…
-
### 1. System information
- OS Platform and Distribution: Ubuntu 22.04
- TensorFlow installation (pip package or built from source): pip
- TensorFlow library (version, if pip package or github SH…
-
pytorch_quantization supports 4-bit and ONNX supports 4-bit, but torch.onnx.export does not support 4-bit. How can I export a 4-bit pytorch_quantization .pt model to a .engine model?
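Since torch.onnx.export has no native 4-bit dtype, one common workaround is to pack pairs of 4-bit codes into uint8 tensors before export and unpack them on the runtime side (e.g., in a dequantize step or custom plugin). The sketch below is not a TensorRT or pytorch_quantization API, just an illustration of the packing idea in NumPy, assuming the weights are already quantized to unsigned 4-bit codes in [0, 15]:

```python
# Sketch only: pack two unsigned 4-bit codes per uint8 so the tensor can be
# exported through torch.onnx.export as a standard uint8 tensor.
import numpy as np

def pack_int4(values: np.ndarray) -> np.ndarray:
    """Pack an even-length 1-D array of codes in [0, 15] into uint8 pairs."""
    assert values.ndim == 1 and values.size % 2 == 0
    v = values.astype(np.uint8)
    # High nibble holds the even-indexed code, low nibble the odd-indexed one.
    return (v[0::2] << 4) | v[1::2]

def unpack_int4(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_int4: recover the original 4-bit codes."""
    out = np.empty(packed.size * 2, dtype=np.uint8)
    out[0::2] = packed >> 4
    out[1::2] = packed & 0x0F
    return out

weights = np.array([1, 15, 0, 7, 3, 12], dtype=np.uint8)
packed = pack_int4(weights)            # 3 bytes instead of 6
restored = unpack_int4(packed)
assert np.array_equal(weights, restored)
```

The packed tensor halves the weight storage; the consumer (here, the TensorRT side) must know the packing layout to dequantize correctly.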
-
### 1. System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 22.04.3 LTS
- TensorFlow installation (pip package or built from source): pip
- TensorFlow library (versi…
-
### Issue type
Bug
### Have you reproduced the bug with TensorFlow Nightly?
No
### Source
source
### TensorFlow version
2.12
### Custom code
Yes
### OS platform and distr…
-
Hi all,
We recently developed a fully open-source quantization method called VPTQ (Vector Post-Training Quantization) [https://github.com/microsoft/VPTQ](https://github.com/microsoft/VPTQ) which en…
-
### Description of the bug:
I'm trying to convert the following (quantized) model:
```python
# Disable GPU for model conversion to tflite.
# Fix for https://github.com/google-ai-edge/ai-edge…
-
**Describe the bug**
I'm compressing a qwen2.5_7b model using `examples/quantization_2of4_sparse_w4a16/llama7b_sparse_w4a16.py`, but I failed to load the stage_sparsity model. The error is shown belo…
-
I fine-tuned a Whisper large-v3 model via the [speechbrain](https://github.com/speechbrain/speechbrain) framework. I want to convert it to a `faster-whisper` model and run inference on it via `faster-whispe…