-
FP8 or AWQ quant
-
### From the new version, I built it but I can't import awq
- transformers 4.43.3
- torch 2.3.1
- torchaudio 2.4.0
- torchvision 0.19.0
…
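The report is truncated, but a minimal import check may help isolate the failure. This sketch assumes the missing `awq` module is meant to come from the AutoAWQ project (PyPI package name `autoawq`), which is installed separately from the packages listed above:

```python
# Sketch: verify AutoAWQ is importable in the current environment.
# Assumption: `awq` refers to the AutoAWQ package (pip install autoawq),
# not a module shipped by the locally built wheel.
try:
    from awq import AutoAWQForCausalLM
    print("AutoAWQ import OK")
except ModuleNotFoundError as err:
    print(f"AutoAWQ is not installed in this environment: {err}")
```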
-
Why does speed not increase with AWQ? I have the model Gemma 2 9B on one A100.
With float16 the benchmark is 4267.62 tokens per second.
With AWQ 4-bit the benchmark is 4963.73 tokens per second.
I ex…
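The report is cut off, but a throughput comparison like the one above might be run roughly as follows; the checkpoint name, batch size, and sampling settings are illustrative assumptions. One common explanation for a modest gain is that AWQ is weight-only quantization: it mainly helps when decoding is memory-bandwidth-bound (small batches), while large-batch benchmarks are compute-bound and pay a dequantization overhead.

```python
import time
from vllm import LLM, SamplingParams

# Illustrative checkpoint name; substitute the actual AWQ-quantized Gemma 2 9B.
# Pass quantization="awq" for the AWQ checkpoint; omit it for the float16 run.
llm = LLM(model="gemma-2-9b-it-awq", quantization="awq")

params = SamplingParams(temperature=0.0, max_tokens=256)
prompts = ["Summarize the theory of relativity."] * 64  # batch large enough to saturate the GPU

start = time.time()
outputs = llm.generate(prompts, params)
elapsed = time.time() - start
generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated / elapsed:.2f} generated tokens per second")
```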
-
### Your current environment
Collecting environment information...
WARNING 11-12 05:39:35 _custom_ops.py:19] Failed to import from vllm._C with ModuleNotFoundError("No module named 'vllm._C'")
Warn…
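As a hedged diagnostic, `vllm._C` is vLLM's compiled native extension; when it cannot be imported, the package was most likely installed without the CUDA ops being built. A quick check using only the standard library:

```python
# Check whether vLLM's compiled extension module is present on disk.
# If it is missing, reinstalling vLLM (or rebuilding from source so the
# native extension actually compiles) is the usual fix.
import importlib.util

spec = importlib.util.find_spec("vllm._C")
if spec is None:
    print("vllm._C is missing; the native extension was not built/installed")
else:
    print("compiled extension found at:", spec.origin)
```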
-
### Your current environment
The output of `python collect_env.py`
```text
Your output of `python collect_env.py` here
```
### Model Input Dumps
from awq import AutoAWQForCausalLM
fro…
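The imports above are cut off; a typical AutoAWQ quantization flow using `AutoAWQForCausalLM` looks roughly like this, with the paths and quantization settings being illustrative assumptions rather than values from the original report:

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "path/to/base-model"       # hypothetical input checkpoint
quant_path = "path/to/base-model-awq"   # hypothetical output directory
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the full-precision model and its tokenizer.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run AWQ calibration/quantization and save the 4-bit checkpoint.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```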
-
### System Info
Ubuntu 20.04
tensorrt 10.0.1
tensorrt-cu12 10.0.1
tensorrt-cu12-bindings 10.0.1
tensorrt-cu12-libs 10.0.1
tensorrt-llm 0.11.0.dev2024052100
NVIDIA L40S
### Who can help?
…
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
- [X] 3. Please note that if the bug-related iss…
-
### System Info
```shell
Transformers fails with the following error when trying to use AWQ with TGI, the neural compression engine, or Optimum Habana
ValueError: AWQ is only available on GPU
```
#…
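Transformers raises this `ValueError` when no CUDA device is visible at load time; a minimal guard, assuming a plain Transformers load path (the checkpoint name is illustrative):

```python
import torch
from transformers import AutoModelForCausalLM

# Transformers' AWQ integration requires a CUDA GPU, so fail early and
# clearly instead of hitting "AWQ is only available on GPU" during loading.
if not torch.cuda.is_available():
    raise SystemExit("No CUDA device visible; AWQ checkpoints need a GPU here")

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-v0.1-AWQ",  # illustrative AWQ checkpoint
    device_map="cuda:0",
)
```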
-
### System Info
x86_64, Debian 11, L4 GPU
### Who can help?
_No response_
### Information
- [x] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] An officially supporte…
-
File "/root/ld/ld_project/pull_request/MiniCPM-V/web_demo_2.6.py", line 44, in
model = AutoModel.from_pretrained(model_path, trust_remote_code=True)
File "/root/ld/conda/envs/minicpm/lib/py…