-
### System Info
accelerate 0.33.0
aiofiles 23.2.1
annotated-types 0.7.0
anyio …
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
- [X] 3. Please note that if the bug-related issue y…
-
`CUDA_VISIBLE_DEVICES=0 swift sft --model_type glm4v-9b-chat --model_id_or_path /content/glm-4v-9b-4-bits --dataset /content/drive/MyDrive/glm/training_data.jsonl --output_dir /content/drive/MyDrive/g…
-
### Your current environment
```text
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC …
```
-
### System Info
```Shell
- `Accelerate` version: 0.30.0
- Platform: Linux-6.5.0-27-generic-x86_64-with-glibc2.35
- `accelerate` bash location: /data/envs/tt/bin/accelerate
- Python version: 3.10.1…
```
-
**Describe the bug**
Hello vLLM team, thank you for your outstanding work. I think llm-compressor is really filling a need: one simple, unified quantization framework for vLLM.
So the bug I am enc…
-
# RFC: Float8 Inference
- status: draft
## Objective
We want to provide an easy mechanism to utilize FP8 in inference, and see both decreased memory usage and performance gains on hardware that…
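The scaling step at the heart of most FP8 (e4m3) inference schemes can be sketched in plain Python. This is an illustrative sketch only: the helper names are hypothetical, and the actual rounding to e4m3 values (done by a dtype cast or hardware in real implementations) is deliberately omitted.

```python
# Hypothetical sketch of per-tensor FP8 (e4m3) scaling. Real FP8 inference
# paths cast the scaled values to a float8 dtype; here we only model the
# scale / clamp / dequantize arithmetic around that cast.

E4M3_MAX = 448.0  # largest finite value representable in float8_e4m3fn

def compute_scale(values):
    """Per-tensor scale so the largest magnitude maps onto E4M3_MAX."""
    amax = max(abs(v) for v in values)
    return amax / E4M3_MAX if amax > 0 else 1.0

def quantize(values, scale):
    """Scale into the e4m3 range and clamp (e4m3 rounding omitted)."""
    return [max(-E4M3_MAX, min(E4M3_MAX, v / scale)) for v in values]

def dequantize(qvalues, scale):
    """Recover approximate original values from scaled representation."""
    return [q * scale for q in qvalues]

weights = [0.5, -1.25, 3.0, -4.48]
scale = compute_scale(weights)       # 4.48 / 448.0 == 0.01
q = quantize(weights, scale)         # largest magnitude lands on 448.0
recovered = dequantize(q, scale)
```

Storing `q` in one byte per element instead of two (FP16) or four (FP32) is where the memory savings come from; the performance gains then depend on hardware with native FP8 matmul support.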
-
**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
**Describe the solution you'd like**…
-
Hi!
I quantized DeepSeek-Coder-V2-Lite-Instruct to FP8 using AutoFP8, but when I try to run it with vLLM I get the following error:
**RuntimeError: "cat_cuda" not implemented for 'Float8_e4m3fn…
-
```text
Traceback (most recent call last):
  File "/home/admin/workspace/aop_lab/app_source/run_gptq.py", line 89, in
    model = AutoGPTQForCausalLM.from_pretrained(args.model_name_or_path, quantize_confi…
```