-
How do you load and run inference on a custom GPTQ-quantized Qwen2-VL model (not the default one) using Qwen2VLForConditionalGeneration on **WINDOWS**?
I used the following code.
```
from transformers impo…
```
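For reference, a minimal loading sketch (assuming a local GPTQ checkpoint directory, with `accelerate` plus `optimum`/`auto-gptq` installed so transformers can handle the GPTQ weights; the path below is a placeholder, not the original poster's):

```python
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor

# Placeholder path to the custom GPTQ-quantized Qwen2-VL checkpoint.
model_path = "path/to/your-qwen2-vl-gptq"

# device_map="auto" (requires accelerate) places the quantized weights on the
# available GPU; torch_dtype="auto" keeps the dtype the checkpoint config declares.
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype="auto",
)
processor = AutoProcessor.from_pretrained(model_path)
```

On Windows the usual pitfall is the GPTQ kernel package itself rather than this loading code.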
-
See thread here:
https://github.com/ml-explore/mlx-examples/issues/776
Large Q8 models like Qwen-VL-72B are uselessly slow unless loaded immediately after a fresh boot, even though there is plenty…
-
The test code is all VLMEvalKit. I only changed the API model to qwen-vl-max-0809 and evaluated the celebrity subset of MME; I didn't touch the prompts, and I didn't change the score-computation method either. Why is the difference from the leaderboard so large?
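For reference, the run was presumably launched along these lines (a sketch: `--data`, `--model`, and `--verbose` are real VLMEvalKit flags, but the exact model alias registered for the qwen-vl-max-0809 API endpoint is an assumption):

```
# Hypothetical invocation; the model alias is an assumption, not from the post.
python run.py --data MME --model QwenVLMax --verbose
```

If the alias or API version differs from what the leaderboard run used, that alone can account for part of the score gap.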
-
Great work! I'm interested in Unsloth; can I use it to fine-tune an MLLM like Qwen-VL?
-
### Your current environment
```text
cuda 12.1 simple pip install vllm
```
### 🐛 Describe the bug
`python benchmarks/benchmark_throughput.py --backend vllm --input-len 1024 --output-len …`
-
![Image](https://github.com/user-attachments/assets/459f5917-ac00-449c-8e15-b4bb3d840255)
The y-axis is MFU and the x-axis is the training step.
I'm testing Qwen 72B with the Hugging Face Trainer, and whenever I trai…
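For context, MFU (model FLOPs utilization) is usually estimated from throughput with the standard approximation of about $6N$ training FLOPs per token for a dense transformer (the symbols here are generic, not values from the post):

$$\mathrm{MFU} = \frac{6 \cdot N \cdot T}{P_{\mathrm{peak}}}$$

where $N$ is the model parameter count, $T$ is the observed tokens per second, and $P_{\mathrm{peak}}$ is the accelerator's peak FLOP/s.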
-
### The model to consider.
https://modelscope.cn/models/qwen/Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4/
### The closest model vllm already supports.
https://modelscope.cn/models/qwen/Qwen1.5-MoE-A2.7B-Chat/…
-
I'm not exactly clear on what the question-and-answer pairs in the fine-tuning data include. Are the data and formatting similar to the LLM inputs and outputs described in the paper? If you could provide the…
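For illustration, such question-and-answer pairs are often stored in a chat-style format like the following (a hypothetical sketch; the field names follow a common convention and are not the paper's actual schema):

```python
# Hypothetical example of one fine-tuning record in a common chat format;
# field names are illustrative and not taken from the paper.
example_record = {
    "messages": [
        {"role": "user", "content": "What does the temperature parameter do during LLM sampling?"},
        {"role": "assistant", "content": "It rescales the logits before the softmax, so lower values make sampling more deterministic."},
    ]
}
```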
-
When I run:
python llm_export.py --type Qwen-7B-Chat --path /mnt/LLM_Data/Qwen-7B-Chat --export_split --export_token --export_mnn --onnx_path /mnt/LLM_Data/Qwen-7B-Chat-onnx --mnn_path /mnt/LLM_Data/…
-
The whole Qwen model family seems to be pretty inaccurate. I have not done complete benchmarks to determine where the issue is yet; that still needs to be done to find the specific area causing the er…