-
I am getting a "float division by zero" error whenever I try to quantize Mixtral-related models with AutoGPTQ; here is my code.
```
from transformers import AutoTokenizer, TextGenerationPipeli…
```
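For reference, a minimal sketch of the usual AutoGPTQ quantization flow is below; the Mixtral model id, output directory, and calibration text are placeholders, not taken from the truncated code above. With MoE models, one commonly reported cause of this error is a calibration set so small that some experts never receive any tokens, so the `examples` list is the first thing to check.
```python
# Minimal AutoGPTQ quantization sketch; model id, output dir, and
# calibration text are placeholders.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_model_dir = "mistralai/Mixtral-8x7B-v0.1"   # placeholder
quantized_model_dir = "mixtral-8x7b-gptq-4bit"         # placeholder

tokenizer = AutoTokenizer.from_pretrained(pretrained_model_dir, use_fast=True)

# Calibration examples: for an MoE model these should be numerous and varied
# enough that every expert sees some tokens during calibration.
examples = [
    tokenizer("auto-gptq is an easy-to-use model quantization library.")
]

quantize_config = BaseQuantizeConfig(
    bits=4,          # quantize weights to 4-bit
    group_size=128,  # per-group scales
    desc_act=False,
)

model = AutoGPTQForCausalLM.from_pretrained(pretrained_model_dir, quantize_config)
model.quantize(examples)
model.save_quantized(quantized_model_dir)
```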
-
I want to speed up inference of the `codeformer.pth` model. How can I optimize and quantize it?
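One low-effort option is PyTorch dynamic quantization of the Linear (transformer) layers; a minimal sketch is below. The `basicsr.archs.codeformer_arch.CodeFormer` import and the `params_ema` checkpoint key are assumptions based on the CodeFormer repo layout, and the convolutional layers stay in float (static PTQ or an ONNX/TensorRT export would be needed to speed those up).
```python
# Sketch: load the checkpoint, then apply dynamic int8 quantization to the
# nn.Linear layers only. Import path and checkpoint key are assumptions.
import torch
import torch.nn as nn

from basicsr.archs.codeformer_arch import CodeFormer  # assumed repo layout

model = CodeFormer()  # constructor defaults assumed; see the repo's inference script
state = torch.load("codeformer.pth", map_location="cpu")
model.load_state_dict(state.get("params_ema", state))  # key assumed
model.eval()

# Dynamic quantization: weights stored as int8, activations quantized on the
# fly. Only the Linear layers (the transformer blocks) are affected.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

print(quantized)  # Linear layers should now show up as DynamicQuantizedLinear
```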
-
Hi,
I am trying to apply the generate recipe on a quantized Llama 3.1 8B model but run into the following error:
```
...
File "/home/mreso/torchtune/torchtune/modules/attention.py", line 211, …
-
I'm wondering if I can get an easier pipeline by loading the AWQ weights with vLLM:
```
from vllm import LLM, SamplingParams
prompts = [
"Hello, my name is",
"The president of the Uni…
-
I dumped a quantized Llama-3-8B model from LMQuant using QoQ; the command is as follows, as written in
[lmquant](https://github.com/mit-han-lab/lmquant/tree/main)/[projects](https://github.com/mit-han-l…
-
Traceback (most recent call last):
File "train_mobilenetv2_quantization.py", line 368, in
base_model = quantize_model(base_model)
File "/home/huangfei/anaconda3/envs/ImageSearch2/lib/pytho…
-
Hi, when I try Quantization-Aware Training on my model, I get the following error in my `CustomLayerMaxPooling1D`:
---------------------------------------------------------------------------
…
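In case this is Keras QAT via tensorflow_model_optimization (an assumption; the truncated post doesn't say which framework), custom layers usually need to be wrapped with `quantize_annotate_layer` and a `QuantizeConfig`, and deserialized inside `quantize_scope`. A minimal sketch with a stand-in pooling layer follows; the layer body and model are illustrative only.
```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

quantize_annotate_layer = tfmot.quantization.keras.quantize_annotate_layer
quantize_annotate_model = tfmot.quantization.keras.quantize_annotate_model
quantize_apply = tfmot.quantization.keras.quantize_apply
quantize_scope = tfmot.quantization.keras.quantize_scope


class NoOpQuantizeConfig(tfmot.quantization.keras.QuantizeConfig):
    """Pass-through config: quantize nothing inside the custom layer itself."""

    def get_weights_and_quantizers(self, layer):
        return []

    def get_activations_and_quantizers(self, layer):
        return []

    def set_quantize_weights(self, layer, quantize_weights):
        pass

    def set_quantize_activations(self, layer, quantize_activations):
        pass

    def get_output_quantizers(self, layer):
        return []

    def get_config(self):
        return {}


# Hypothetical stand-in for the post's CustomLayerMaxPooling1D.
class CustomLayerMaxPooling1D(tf.keras.layers.MaxPooling1D):
    pass


model = quantize_annotate_model(tf.keras.Sequential([
    tf.keras.layers.Conv1D(16, 3, activation="relu", input_shape=(64, 1)),
    # Annotate the custom layer so quantize_apply knows how to handle it
    # instead of failing on an unrecognized layer type.
    quantize_annotate_layer(CustomLayerMaxPooling1D(pool_size=2),
                            quantize_config=NoOpQuantizeConfig()),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
]))

with quantize_scope({"CustomLayerMaxPooling1D": CustomLayerMaxPooling1D,
                     "NoOpQuantizeConfig": NoOpQuantizeConfig}):
    qat_model = quantize_apply(model)
```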
-
Hi! When I run `python quant.py --quant_mode test --subset_len 1 --batch_size 1 --deploy`, I get this error:
[VAIQ_NOTE]: =>Quantizable module is generated.(quantize_result/Model.py)
[VAIQ_NOTE]:…
-
### 🐛 Describe the bug
When a user tries to use `convert_fx` on a model that is on CUDA, the error message doesn't make sense. We should either throw an error message which asks the user to move th…
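For anyone hitting this in the meantime, the usual workaround is to move the model to CPU before running FX graph mode quantization; a minimal sketch with a toy model (not the reporter's) is below.
```python
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

# Toy float model standing in for the user's CUDA model.
model = torch.nn.Sequential(torch.nn.Linear(16, 16), torch.nn.ReLU()).eval()
if torch.cuda.is_available():
    model = model.cuda()

example_inputs = (torch.randn(1, 16),)

# Workaround: FX graph mode quantization expects a CPU model, so move it back
# to CPU before prepare_fx/convert_fx instead of converting the CUDA model.
model = model.cpu()
qconfig_mapping = get_default_qconfig_mapping("fbgemm")
prepared = prepare_fx(model, qconfig_mapping, example_inputs)

# Calibrate with representative data (random here, for illustration only).
with torch.no_grad():
    prepared(*example_inputs)

quantized = convert_fx(prepared)
```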
-
I'm on an Apple Silicon Mac trying to convert a CoreML model for `large-v3-turbo-q5_0`.
What is needed in order to convert this model?
```
./models/generate-coreml-model.sh large-v3-turbo-q5_0
…
```