-
```
# Quantize the model
model_prepared = tq.prepare(model_fused)
model_quantized = tq.convert(model_prepared)
# Define the quantization configuration
quant_config = tq.get_default_qconfig('fbge…
```
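For reference, a minimal sketch of the standard eager-mode post-training static quantization flow this snippet appears to follow, assuming `tq` is `torch.ao.quantization`, `model_fused` is an already-fused eval-mode model containing QuantStub/DeQuantStub, and `calibration_loader` is a representative data loader (`calibration_loader` is an assumption, not from the snippet); note that the qconfig is attached to the model before `prepare` is called:
```
import torch
import torch.ao.quantization as tq  # older releases expose this as torch.quantization

model_fused.eval()

# Attach the quantization configuration before preparing the model
# ("fbgemm" targets x86 servers; "qnnpack" targets ARM)
model_fused.qconfig = tq.get_default_qconfig("fbgemm")

# Insert observers that record activation statistics
model_prepared = tq.prepare(model_fused)

# Calibrate with a few representative batches (calibration_loader is assumed)
with torch.no_grad():
    for images, _ in calibration_loader:
        model_prepared(images)

# Replace observed modules with quantized implementations
model_quantized = tq.convert(model_prepared)
```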
-
Do you have examples for working with a quantized llama3?
I'm trying with
```
from transformers import BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(
load_in_8bit=True,
b…
```
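Not from the original post, but a minimal self-contained sketch of loading a Llama 3 checkpoint in 8-bit with bitsandbytes through transformers; the model ID, prompt, and generation settings below are assumptions:
```
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # hypothetical example checkpoint
quantization_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto",  # requires the accelerate package
)

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```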
-
As far as I can see, the quantization method itself is not provided in this project.
All the examples shown here cover how to run inference with VPTQ models, rather than tutorials on how to quantize a model.
Or I might…
-
### Your current environment
Collecting environment information...
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Debia…
-
Hello, I encountered some problems when loading the Llama3.2-90B-Vision-Instruct model with FP8. Can you help me take a look?
Version of llama_stack and llama_models:
```
llama_models == 0.0.41
…
```
-
I'm using torchtune for model quantization with QAT. I am currently working through https://pytorch.org/torchtune/main/tutorials/qat_finetune.html, but the results of the prepared_model I printed a…
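For comparison, a minimal sketch of the prepare/convert flow that tutorial is built around, assuming torchao's `Int8DynActInt4WeightQATQuantizer` (the exact import path varies across torchao/torchtune versions) and an already-instantiated `model`:
```
from torchao.quantization.prototype.qat import Int8DynActInt4WeightQATQuantizer

# Swap nn.Linear layers for fake-quantized equivalents; printing the model at
# this point shows the QAT wrapper modules instead of plain Linear layers
quantizer = Int8DynActInt4WeightQATQuantizer(groupsize=256)
prepared_model = quantizer.prepare(model)

# ... run the fine-tuning loop on prepared_model as usual ...

# Replace the fake-quantized layers with actually quantized ones
quantized_model = quantizer.convert(prepared_model)
```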
-
**Describe the bug**
I cannot quantize MobileNetV3 from Keras 2 because the hard-swish activation function is implemented as a TFOpLambda layer.
**System information**
tensorflow version: 2.17
tf_ke…
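A hypothetical minimal repro along these lines, assuming the legacy `tf_keras` (Keras 2) package with `TF_USE_LEGACY_KERAS=1` set and `tensorflow_model_optimization` installed; the model variant and arguments are placeholders:
```
import tf_keras as keras
import tensorflow_model_optimization as tfmot

# MobileNetV3 implements its hard-swish activation via TFOpLambda layers
base_model = keras.applications.MobileNetV3Small(weights=None)

# Expected to fail here: quantize_model cannot annotate the TFOpLambda layers
quantized_model = tfmot.quantization.keras.quantize_model(base_model)
```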
-
### What is the issue?
taozhiyu@Mac ~ % ollama run hf-mirror.com/Orenguteng/Llama-3.1-8B-Lexi-Uncensored-V2-GGUF:Q8
pulling manifest
Error: pull model manifest: 400: The specified tag is not a v…
-
### Search before asking
- [X] I have searched the YOLOv8 [issues](https://github.com/ultralytics/ultralytics/issues) and [discussions](https://github.com/ultralytics/ultralytics/discussions) and f…
-
### 1. System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 22.04
- TensorFlow installation (pip package or built from source): pip package
- TensorFlow library (v…