-
My use case and GPU:
model: Qwen2-72B-Instruct
max_token_len (input+output): 20000
gpus: 4xA100
When I use the code from https://github.com/casper-hansen/AutoAWQ/blob/main/docs/example…
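For context on whether this setup fits in memory at all, here is a rough back-of-envelope sketch (assuming 80 GB A100s — the exact variant isn't stated above — and counting only the weight footprint):

```
# Back-of-envelope memory check for Qwen2-72B-Instruct with 4-bit AWQ
# on 4x A100. Assumes 80 GB cards; halve total_gb for the 40 GB variant.
params = 72e9                    # ~72B parameters
weight_gb = params * 0.5 / 1e9   # 4-bit weights ~= 0.5 bytes/param
total_gb = 4 * 80                # aggregate GPU memory
print(f"quantized weights ~= {weight_gb:.0f} GB of {total_gb} GB aggregate")
```

Note this counts weights only; the KV cache for a 20000-token context, activations, and framework overhead come on top.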
-
--chip bm1684x passes, but cv183x reports an error
==---------------------------==
GmemAllocator use FitFirstAssign
[Success]: tpuc-opt yolov5s_cv183x_f16_tpu.mlir --mlir-disable-threading --strip-io-quant="quant_inpu…
-
Hello,
While implementing a BitBlas `Linear` layer, I noticed some odd behavior:
```
import bitblas
from bitblas.cache import global_operator_cache, get_database_path
from bitblas import auto_detect…
-
Hi @casper-hansen, I keep getting this error when trying to quantize my custom LLaVA model:
```
Traceback (most recent call last):
  File "/mainfs/lyceum/kzl1m20/LLaVA/quant.py", line 9, in <module>
m…
-
Dear maintainers, greetings from [CommandDash](https://commanddash.io)!
We are a tool that turns the docs and examples of your library into a code-generation AI agent which **helps devs directly generate…
-
Thanks for sharing your work on LLM quantization & ONNX export.
I followed the script in the '[Convert to onnx model](https://github.com/wejoncy/QLLM?tab=readme-ov-file#convert-to-onnx-model)' section, and g…
-
@nbasyl Sorry, when I use the following scripts, quantization takes over ten days:
```
MODEL_ADDR=huggyllama/llama-7b
export HF_ENDPOINT=https://hf-mirror.com
export CUDA_VISIBLE_DEVICES=0,1,2,3
…
-
```
save:
    save_trans: True
    save_lightllm: False
    save_fake: False
    save_path: /extra_data/mali36/llmc/models/
```
When I use the above config, I get a 16 GB model;
when I use …
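One plausible explanation (assuming an ~8B-parameter model — a guess, not stated above): `save_trans` writes the weight-transformed model still in its original 16-bit precision, so the saved size matches the unquantized checkpoint rather than a compressed one:

```
# Why save_trans can yield a ~16 GB checkpoint: transformed weights
# are still stored at 16 bits (assumed ~8B-parameter model).
params = 8e9
fp16_gb = params * 2 / 1e9   # 2 bytes per parameter
print(f"fp16 checkpoint ~= {fp16_gb:.0f} GB")
```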
-
AWQ config:
```
base:
    seed: &seed 42
model:
    type: Qwen2
    path: /models/Qwen2-7B-Instruct
    tokenizer_mode: slow
    torch_dtype: auto
calib:
    name: pileval
    download: Fals…
-
Focus - benchmarking, documentation, tutorials, prototype to beta
Due date: June 13 2024
### Spillover [from 0.2.0](https://github.com/pytorch/ao/issues/132)
- [x] Consolidating workflows to …