-
### Your current environment
```text
The output of `python collect_env.py`
```
### How would you like to use vllm
I use `xinference` to launch the model `Qwen1.5-chat`; it uses `vllm` in its origin …
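For reference, a minimal sketch of what that launch looks like through the xinference Python client. This assumes an xinference server is already running on the default endpoint (http://localhost:9997); the `model_engine`, `model_format`, and size arguments follow the docs at the time of writing and may differ across xinference versions.

```python
# Sketch only: launching Qwen1.5-chat via the xinference client with the vLLM backend.
# Endpoint and parameter names are assumptions and may vary between xinference versions.
from xinference.client import Client

client = Client("http://localhost:9997")
model_uid = client.launch_model(
    model_name="qwen1.5-chat",      # built-in model name in xinference
    model_engine="vllm",            # ask xinference to run it on its vLLM backend
    model_format="pytorch",
    model_size_in_billions=7,       # pick the 7B variant (adjust as needed)
    quantization="none",
)

model = client.get_model(model_uid)
# The chat() signature has changed across releases; older clients accept a plain prompt.
print(model.chat("Hello, who are you?"))
```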
-
This is my model, Qwen1.5-0.5B; the model's website is [https://huggingface.co/Qwen/Qwen1.5-0.5B](https://huggingface.co/Qwen/Qwen1.5-0.5B)
![image](https://github.com/airockchip/rknn-llm/assets/85230603/cbfc4e7f-302f-4c67-b0c8-c6c8b5e17ea…
-
On RK3588, the quantized Qwen1.5-0.5B model Qwen1.5-0.5B-a8w8.rkllm gives inconsistent answers across two consecutive runs, but without quantization the answers are the same across multiple runs. Given the RKLLMParam parameter configuration, shouldn't the result be the same every time?
![3](https://github.com/airockchip/rknn-llm/assets/61830993/b6092824-28c7-440c-b0a1-6d0…
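Not knowing the exact RKLLMParam fields in use, here is an illustrative sketch with Hugging Face transformers (not the rknn-llm runtime itself) showing why sampled decoding legitimately varies between runs while greedy decoding does not; the assumption is that RKLLMParam exposes analogous top_k / top_p / temperature knobs, and that quantization shifts the logits enough for sampling to pick different tokens.

```python
# Illustration only: greedy decoding is deterministic, sampled decoding is not.
# This loads the Hugging Face Qwen1.5-0.5B checkpoint, not the .rkllm file;
# RKLLMParam's top_k/top_p/temperature are assumed to behave analogously.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-0.5B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-0.5B")
inputs = tok("What is the capital of France?", return_tensors="pt")

# Greedy: the same output on every run.
greedy = model.generate(**inputs, do_sample=False, max_new_tokens=32)

# Sampling: top_k/top_p/temperature introduce randomness, so repeated runs
# can give different answers even with an identical configuration.
sampled = model.generate(**inputs, do_sample=True, top_k=40, top_p=0.9,
                         temperature=0.8, max_new_tokens=32)

print(tok.decode(greedy[0], skip_special_tokens=True))
print(tok.decode(sampled[0], skip_special_tokens=True))
```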
-
### Description
The model is qwen1_5-7b-chat-q4_0.gguf, launched with the code on this project's homepage; there is a high probability that the reply starts with an endless stream of blank lines.
-
### What is the issue?
Qwen1.5-MoE-A2.7B-Chat was converted with convert-hf-to-gguf.py according to the documented process. After 4-bit quantization, an ollama Modelfile was created, but the model is not supported when loadin…
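For reference, a minimal Modelfile along the following lines is what `ollama create` expects for a ChatML-style Qwen checkpoint; the gguf file name here is an assumption, and whether the Qwen1.5-MoE architecture loads at all also depends on the llama.cpp version bundled with the installed ollama.

```
FROM ./Qwen1.5-MoE-A2.7B-Chat-q4_0.gguf

TEMPLATE """<|im_start|>system
{{ .System }}<|im_end|>
<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER stop "<|im_start|>"
PARAMETER stop "<|im_end|>"
```

It would then be registered with `ollama create qwen1.5-moe -f Modelfile` before running it.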
-
### Your current environment
Launch command: python -m vllm.entrypoints.openai.api_server --model /opt/llm_models/Qwen1.5-32B-Chat-GPTQ-Int4 --quantization gptq --max-model-len 16384 --port 8888 --gpu-memory-ut…
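For reference, a minimal sketch of querying a server started this way through its OpenAI-compatible endpoint, assuming it is reachable on localhost:8888 and no API key has been configured:

```python
# Sketch: call the vLLM OpenAI-compatible server started above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8888/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="/opt/llm_models/Qwen1.5-32B-Chat-GPTQ-Int4",  # must match --model
    messages=[{"role": "user", "content": "Hello"}],
    max_tokens=128,
)
print(resp.choices[0].message.content)
```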
-
### Your current environment
```text
vLLM v0.4.2 (cuda 12.2)
```
### 🐛 Describe the bug
I'm training a Qwen1.5 model with unsloth, and it seems that inference does not work as expected and …
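As a point of comparison, a minimal sketch of loading a fine-tuned checkpoint back through unsloth for inference; the checkpoint path and sequence length below are assumptions, and the key step is switching the model into inference mode before generating.

```python
# Sketch: reload a fine-tuned Qwen1.5 checkpoint with unsloth and run inference.
# The model path is hypothetical; max_seq_length/load_in_4bit should match training.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="outputs/qwen1.5-finetuned",  # hypothetical path to the trained weights
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable inference mode instead of training mode

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```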
-
Dear authors! I have tried to reproduce your results on the dolly dataset with Qwen1.5 as the teacher and gpt-2 as the student. Unfortunately, my results differ from yours.
![dolly_exp](https://gith…
-
### Description
Is the agent-calling method different between qwen1.5 and qwen2?
### Link
1
-
The GPU is a 4090 and the system is Ubuntu 22.04.
Driver and CUDA:
Driver Version: 550.54.15, CUDA Version: 12.4, Python 3.10
Tried chatglm2, chatglm3, and qwen1.5; all of them only output
![image](https://github.com/ztxz16/fastllm/assets/8828385/bb7e8be8-7…