-
## version
`05/09 21:16:21 - mmengine - INFO - 0.1.18`
## how to reproduce
`CUDA_VISIBLE_DEVICES=4,5,6,7 NPROC_PER_NODE=4 xtuner train qwen1_5_0_5b_chat_qlora_alpaca_e3`
## log
I only change the…
-
https://github.com/QwenLM/Qwen2/issues/259
The problem observed with qwen1.5 still exists in qwen2; the affected models all appear to use GQA.
-
### 是否已有关于该错误的issue或讨论? | Is there an existing issue / discussion for this?
- [X] 我已经搜索过已有的issues和讨论 | I have searched the existing issues / discussions
### 该问题是否在FAQ中有解答? | Is there an existing ans…
-
Running Qwen1.5-72B-Chat-GPTQ-Int4 with the transformers package is much slower than Qwen1.5-72B-Chat.
The quantized model needs to be loaded with auto_gptq.
https://github.com/QwenLM/Qwen/blob/main/README_CN.md#%E6%8E%A8%…
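For reference, loading a GPTQ checkpoint via auto_gptq looks roughly like the following sketch (the model directory and device are placeholders, and the auto-gptq package must be installed; this is not taken from the issue itself):

```python
def load_quantized(model_dir: str, device: str = "cuda:0"):
    """Load a GPTQ-quantized checkpoint with auto_gptq (a sketch;
    assumes the auto-gptq package is installed)."""
    # Imported lazily so the function can be defined without auto-gptq present.
    from auto_gptq import AutoGPTQForCausalLM
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    # from_quantized reads the quantize_config.json saved with the checkpoint.
    model = AutoGPTQForCausalLM.from_quantized(
        model_dir,
        device=device,
        use_safetensors=True,
    )
    return model, tokenizer
```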
-
Environment configuration:
Environment A: cuda12.1, v0.2.0
Environment B: cuda11.8, v0.1.13
Hardware:
Single A800 GPU
Model: qwen14B
Loaded on a single GPU with int8 inference; environment variables configured as follows:
export CUDA_VISIBLE_DEVICES=1
export MODEL_TYPE=qwen_2
export ACT_TYPE=BF16
export WEIGHT_TYPE=…
-
### The model to consider.
https://modelscope.cn/models/qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4/
### The closest model vllm already supports.
https://modelscope.cn/models/qwen/Qwen1.5-MoE-A2.7B-Chat/…
-
Could you please provide a minimal example of full-parameter SFT for Qwen1.5?
-
### Discussed in https://github.com/Mozilla-Ocho/llamafile/discussions/418
Originally posted by **fabiomatricardi** May 15, 2024
Ciao,
I tried to ask in the Discord channel but I get no repli…
-
The input parameters are:
```
{
"model": "Qwen1_5_72B_Chat",
"messages": [{"role": "user","content": "请给出一篇500字的中学作文,讲述海边游玩的经历"}],
"max_tokens": 2000,
"stop": []
}
```
I also tried different parameters, but the output gets truncated no matter how long or short it is, causing…
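For reference, a minimal sketch of building the same request with the Python standard library (the endpoint URL and port are assumptions for a local OpenAI-compatible server, not taken from the issue):

```python
import json
import urllib.request

# The same request body as above; field names follow the OpenAI chat API.
payload = {
    "model": "Qwen1_5_72B_Chat",
    "messages": [
        {"role": "user",
         "content": "请给出一篇500字的中学作文,讲述海边游玩的经历"}
    ],
    "max_tokens": 2000,   # upper bound on generated tokens
    "stop": [],           # no extra stop strings
}

def build_request(url: str) -> urllib.request.Request:
    """Wrap the payload in a JSON POST request (the URL is an assumption)."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_request("http://localhost:8000/v1/chat/completions")
```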
-
I used auto_gptq 0.7.1 and ran this code:
python quant_with_alpaca.py --pretrained_model_dir Qwen1.5-14B-Chat --quantized_model_dir Qwen1.5-14B-Chat_4bit --use_triton --save_and_reload --trust_remote…
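The quant_with_alpaca.py run above boils down to roughly the following auto_gptq flow (a sketch only; the bit width matches the `_4bit` output directory, but the group size and calibration examples are assumptions, and the auto-gptq package must be installed):

```python
def quantize_model(pretrained_dir: str, quantized_dir: str):
    """Quantize a model to 4-bit GPTQ and save it (a sketch;
    assumes the auto-gptq package is installed)."""
    # Imported lazily so the function can be defined without auto-gptq present.
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        pretrained_dir, trust_remote_code=True
    )
    quantize_config = BaseQuantizeConfig(
        bits=4,          # 4-bit weights, matching the _4bit output dir
        group_size=128,  # a common default; an assumption here
    )
    model = AutoGPTQForCausalLM.from_pretrained(
        pretrained_dir, quantize_config, trust_remote_code=True
    )
    # Calibration examples: tokenized prompts (placeholders, not the
    # alpaca data the script actually uses).
    examples = [tokenizer("auto-gptq is an easy-to-use quantization library.")]
    model.quantize(examples)
    model.save_quantized(quantized_dir, use_safetensors=True)
```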