-
Every time I run `python -m mlx_lm.fuse --model models/Qwen1.5-32B-Chat --save-path models/Qwen1.5-32B-Chat-FT --adapter-path models/Qwen1.5-32B-Chat-Adapters`, I get the following error:
```
Loading pretrained model
[1] segmen…
```
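Not part of the original report, but a minimal triage sketch: a 32B model under heavy memory pressure can die with a segfault rather than a clean OOM, so it may help to confirm the base model alone loads in MLX before blaming the fuse step. The path is reused from the command above; `load`/`generate` are mlx_lm's Python entry points.
```python
# Hypothetical triage step: load only the base model (no adapters, no fuse)
# to check whether the crash is memory pressure rather than a fuse bug.
from mlx_lm import load, generate

model, tokenizer = load("models/Qwen1.5-32B-Chat")  # path from the report
print(generate(model, tokenizer, prompt="Hello", max_tokens=8))
```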
-
### Describe the bug
After upgrading the image to 0.12.1, running Qwen1.5-14B-Chat-GPTQ-Int4 is much slower than on 0.11.0.
### To Reproduce
The Docker image has been upgraded to 0.12.1, which is much slower…
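To make "much slower" concrete, a rough throughput probe can be run against both image versions. This is a sketch only: the URL, model id, and token counts are assumptions, and it presumes the container exposes an OpenAI-compatible `/v1/completions` endpoint.
```python
# Hypothetical throughput probe; run once against 0.11.0 and once against 0.12.1.
import time
import requests

payload = {
    "model": "Qwen1.5-14B-Chat-GPTQ-Int4",
    "prompt": "Explain the difference between TCP and UDP.",
    "max_tokens": 256,
}
t0 = time.time()
r = requests.post("http://localhost:8000/v1/completions", json=payload, timeout=600)
elapsed = time.time() - t0
tokens = r.json()["usage"]["completion_tokens"]
print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} tok/s")
```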
-
Running `python run_server.py --llm Qwen1.5-7B-Chat --model_server http://localhost:8000/v1 --api_key EMPTY` raises an error:
python run_server.py --llm Qwen1.5-7B-Chat --model_server http://localhost:8000/v1 --api_key E…
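A quick sanity check before starting run_server.py is to confirm the backend at `--model_server` actually answers; this sketch assumes the server speaks the OpenAI-compatible API implied by the flags above.
```python
# Hypothetical sanity check: verify the backend is up and lists the model
# before launching run_server.py against it.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
for m in client.models.list().data:
    print(m.id)  # Qwen1.5-7B-Chat should appear here
```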
-
Training qwen1.5-32b on long sequences with xtuner on 32 A100 GPUs, I have a few questions:
1. qwen1.5 does not seem to support the use_varlen_attn=True parameter; is there a plan to add it later?
2. For a given model, is there a noticeable difference between use_varlen_attn=True and use_varlen_attn=False?
3. When the scheduler is set to update per step, is there a formula to… (a sketch of the usual step arithmetic follows below)
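Since question 3 is cut off, here is a hedged guess at the arithmetic it is likely asking about: the standard count of optimizer steps per epoch under data parallelism and gradient accumulation. Variable names are illustrative, not xtuner API; only `world_size = 32` comes from the report.
```python
import math

# Hypothetical step-count arithmetic for a per-step LR scheduler: one
# optimizer step consumes batch_size * world_size * accumulative_counts samples.
num_samples = 100_000          # training set size (illustrative)
batch_size = 1                 # per-GPU micro-batch (illustrative)
world_size = 32                # 32 x A100, from the report
accumulative_counts = 4        # gradient accumulation (illustrative)
max_epochs = 3

steps_per_epoch = math.ceil(num_samples / (batch_size * world_size * accumulative_counts))
total_steps = steps_per_epoch * max_epochs
print(steps_per_epoch, total_steps)
```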
-
Running a vLLM serving test on ARC hits the issue below:
INFO 07-04 19:10:08 async_llm_engine.py:152] Aborted request cmpl-e5fb5cad96e9402dabbbece3611ae22f-0.
INFO: 127.0.0.1:41772 - "POST /v1/completions …
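For context, `Aborted request` lines from async_llm_engine typically mean the client side gave up on the request. A hedged way to rule that out is to resend the completion with a generous client-side timeout; the endpoint, model name, and token counts below are assumptions, not taken from the log.
```python
# Hypothetical repro with a long read timeout, to rule out the client
# aborting the request before the ARC backend finishes prefill.
import requests

payload = {"model": "Qwen1.5-7B-Chat", "prompt": "Hello", "max_tokens": 128}
r = requests.post(
    "http://127.0.0.1:8000/v1/completions",
    json=payload,
    timeout=(10, 1800),  # (connect, read) seconds
)
print(r.status_code, r.json()["choices"][0]["text"][:80])
```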
-
### Prerequisite
- [X] I have searched [Issues](https://github.com/open-compass/opencompass/issues/) and [Discussions](https://github.com/open-compass/opencompass/discussions) but cannot get the ex…
-
`git clone --depth 1 --single-branch https://huggingface.co/Qwen/Qwen1.5-4B-Chat-GPTQ-Int4`
```
INFO:hf-to-gguf:Loading model: Qwen1.5-4B-Chat-GPTQ-Int4
INFO:gguf.gguf_writer:gguf: This GGUF f…
```
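One likely cause worth noting: llama.cpp's convert-hf-to-gguf.py generally expects full-precision HF weights, not a GPTQ-packed checkpoint, so if the conversion ultimately fails on the GPTQ tensors, a common workaround is to convert the unquantized model and quantize on the GGUF side instead. The model id below is the obvious non-GPTQ counterpart, stated as an assumption.
```python
# Hypothetical workaround sketch: fetch the full-precision checkpoint
# (assumed counterpart of the GPTQ repo) and point convert-hf-to-gguf.py at it.
from huggingface_hub import snapshot_download

path = snapshot_download("Qwen/Qwen1.5-4B-Chat")
print(path)  # then: python convert-hf-to-gguf.py <path> --outfile qwen1.5-4b-chat.gguf
```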
-
Launching the qwen1.5-7b model across multiple GPUs with `xinference launch -u qwen1.5-7b-local -n qwen1.5-7b-local -s 7 -f pytorch --n-gpu 2 --gpu_memory_utilization 0.6` raises an error
(xinference) skytech@skymachine:~/llm/llm-chat/xinference…
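For comparison, the same launch can be attempted through Xinference's Python client, which sometimes surfaces a clearer error than the CLI. The endpoint port and keyword names mirror the CLI flags above but should be treated as assumptions.
```python
# Hypothetical programmatic launch mirroring the CLI invocation above.
from xinference.client import Client

client = Client("http://localhost:9997")  # default supervisor endpoint (assumption)
uid = client.launch_model(
    model_name="qwen1.5-7b-local",
    model_size_in_billions=7,
    model_format="pytorch",
    n_gpu=2,
)
print(uid)
```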
-
```
>>> mii.pipeline("Qwen/Qwen1.5-14B-Chat", quantization_mode='wf6af16')
Fetching 14 files: 100%|███████████████████████████████████████████████████████████████████████████| 14/14 [00:00
```
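For reference, a complete minimal invocation of the FP6 path looks like the sketch below; the call shape follows DeepSpeed-MII's pipeline API, while the prompt and token count are illustrative.
```python
# Minimal sketch of the wf6af16 (FP6 weights / FP16 activations) pipeline.
import mii

pipe = mii.pipeline("Qwen/Qwen1.5-14B-Chat", quantization_mode="wf6af16")
response = pipe(["Give me a one-line summary of FP6 quantization."], max_new_tokens=64)
print(response)
```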
-
Faced OOM on Arc with 6k input / 512 output tokens under vLLM serving. Models: ChatGLM3-6B and Qwen1.5-32B on 4 ARC GPUs.
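A common mitigation sketch for this class of OOM is to cap the context length and KV-cache budget when constructing the engine. The numbers below are illustrative, not a verified fix for ARC; only `tensor_parallel_size=4` follows from "4 ARC" above.
```python
# Hypothetical OOM mitigation: bound max_model_len and KV-cache memory.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen1.5-32B-Chat",
    tensor_parallel_size=4,       # matches the 4-card setup in the report
    max_model_len=8192,           # 6k input + 512 output fits under this cap
    gpu_memory_utilization=0.85,  # leave headroom for activations (illustrative)
)
```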