-
### System Info
```Shell
- `Accelerate` version: 0.31.0
- Platform: Linux-5.15.0-79-generic-x86_64-with-glibc2.35
- `accelerate` bash location: /home/lstein/test_ckpts/SD3/.venv/bin/accelerate
…
```
-
I see that there is full int8 support (both weights and activations) for BERT, but it's not clear to me what is supported for GPT models ([here](https://github.com/NVIDIA/FasterTransformer/blob/main/exampl…
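For context on the terminology: "full int8" here means both the weights and the activations are quantized, with matmuls accumulating in int32. A toy per-tensor symmetric sketch in NumPy (purely illustrative, not FasterTransformer's actual kernels):

```python
import numpy as np

def int8_quantize(x):
    """Per-tensor symmetric int8 quantization: returns (q, scale)."""
    amax = np.abs(x).max()
    scale = amax / 127.0 if amax > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_matmul(a, w):
    """Quantize activations and weights, accumulate in int32, dequantize."""
    qa, sa = int8_quantize(a)
    qw, sw = int8_quantize(w)
    acc = qa.astype(np.int32) @ qw.astype(np.int32)
    return acc.astype(np.float32) * (sa * sw)

np.random.seed(0)
a = np.random.randn(4, 16).astype(np.float32)
w = np.random.randn(16, 8).astype(np.float32)
err = np.abs(int8_matmul(a, w) - a @ w).max()
```

The int32 accumulator is the important part: it is what keeps the sum of many int8 products from overflowing.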
-
Hi, this is the INC team from Intel. Thank you for developing this amazing project.
### Motivation
Our team has developed a new weight-only quantization algorithm called Auto-Round. It has achie…
-
As many lovers of local LLMs know, their raw (fp16) weights are hard to set up on a consumer PC. Luckily, there are techniques that make it possible to quantize the weights to 4 bits or even lower, making the…
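For the curious, the core of weight-only 4-bit quantization is quite small. A minimal group-wise sketch in NumPy (the group size and asymmetric min/max scheme are chosen for illustration; real methods such as GPTQ or AWQ are considerably more involved):

```python
import numpy as np

def quantize_4bit(w, group_size=32):
    """Asymmetric 4-bit quantization, one (scale, min) pair per group."""
    flat = w.reshape(-1, group_size)
    wmin = flat.min(axis=1, keepdims=True)
    scale = (flat.max(axis=1, keepdims=True) - wmin) / 15.0  # 16 levels
    scale[scale == 0] = 1.0  # avoid dividing by zero on constant groups
    q = np.clip(np.round((flat - wmin) / scale), 0, 15).astype(np.uint8)
    return q, scale, wmin

def dequantize_4bit(q, scale, wmin, shape):
    """Map the 4-bit codes back to approximate fp32 weights."""
    return (q.astype(np.float32) * scale + wmin).reshape(shape)

np.random.seed(0)
w = np.random.randn(8, 32).astype(np.float32)
q, scale, wmin = quantize_4bit(w)
w_hat = dequantize_4bit(q, scale, wmin, w.shape)
```

Per weight, the round-trip error is bounded by half the group's scale, which is why small groups (at the cost of more per-group metadata) give better accuracy.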
-
Hi,
Without using transformers / accelerate and the like, what are the constraints on a model for it to be tensor-parallelizable?
Does it need to be an nn.Sequential? Do the input dimensions need to be alwa…
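Not an answer, but to make the question concrete: the basic requirement for (column-wise) tensor parallelism is just that a layer's output can be computed shard-by-shard and reassembled. A hand-rolled NumPy toy with two pretend "devices" (no framework assumed):

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(2, 8)   # activations: batch of 2
W = np.random.randn(8, 6)   # full weight matrix of one linear layer

# Column parallelism: split the output dimension across 2 devices.
W_shards = np.split(W, 2, axis=1)
# Each device multiplies against its own shard independently...
partials = [x @ shard for shard in W_shards]
# ...and the results are concatenated (an all-gather in practice).
y_parallel = np.concatenate(partials, axis=1)
y_full = x @ W
```

So the layer does not need to be an nn.Sequential; what matters is that each shard's computation is independent and that the combine step (concat here, a sum for row parallelism) is cheap.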
-
### System Info
- `transformers` version: 4.44.0
- Platform: Linux-6.5.0-44-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.24.5
- Safetensors version: 0.4.…
-
### System Info
```
transformers==4.42.3
torch==2.3.0
numpy==1.26.4
gguf==0.6.0
```
### Who can help?
@SunMarc
### Information
- [ ] The official example scripts
- [X] My own mod…
-
Hi.
I'd like to quantize [SSD](https://github.com/YutaroOgawa/pytorch_advanced/blob/master/2_objectdetection/utils/ssd_model.py) by using [xilinx/vitis-ai-pytorch-cpu:ubuntu2004-3.0.0.106](https://hu…
-
# Design Strategy for Quantization State Persistence
## Introduction
This document outlines the design for storing the quantization state. It includes essential information on the thresholds require…
-
Hey,
I followed the readme, did the install (into a venv) and a make, but upon starting the webui, execution stops and drops back to a Python interpreter prompt without reaching Gradio or loading …