-
### Feature request
For a Qwen model deployed with xinference, how can I view the request logs and the content of the questions asked in the backend?
### Motivation
For a Qwen model deployed with xinference, how can I view the request logs and the content of the questions asked in the backend?
### Your contribution
For a Qwen model deployed with xinference, how can I view the reque…
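One way to see the request contents, sketched below as a client-side workaround (an assumption, not xinference's documented logging feature): since xinference serves an OpenAI-compatible endpoint, an httpx event hook passed to the openai client can print every request body. The base URL, port, and model UID here are placeholders. Server-side, raising the launcher's log level to debug should also make request handling more verbose.
```python
# Client-side request logging for an xinference OpenAI-compatible endpoint.
# A sketch: base_url, api_key, and the model UID below are placeholders.
import httpx
from openai import OpenAI

def log_request(request: httpx.Request) -> None:
    # Print the method, URL, and JSON body (i.e. the prompt/messages) of each call.
    print(f"{request.method} {request.url}")
    print(request.content.decode("utf-8", errors="replace"))

client = OpenAI(
    base_url="http://127.0.0.1:9997/v1",  # assumed xinference address and port
    api_key="not-used",                   # typically ignored when auth is off
    http_client=httpx.Client(event_hooks={"request": [log_request]}),
)

reply = client.chat.completions.create(
    model="qwen-chat",  # assumed model UID as launched in xinference
    messages=[{"role": "user", "content": "hello"}],
)
print(reply.choices[0].message.content)
```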
-
Hi~
I ran a test on qwen-14b-chat and am quite confused by the results.
The results below show that the original fp16 version is faster than the AWQ int4 version.
Is this expected?
Thank you.
…
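A minimal way to reproduce such a comparison, assuming transformers (with autoawq installed for the int4 checkpoint); both model ids are placeholders for whatever checkpoints were actually tested. W4A16 kernels mainly cut memory traffic, so depending on batch size and kernel fusion, int4 coming out slower than fp16 is not unheard of.
```python
# Rough tokens/sec comparison between an fp16 and an AWQ int4 checkpoint.
# A sketch: model ids are placeholders, and a warmup run is omitted for brevity.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def tokens_per_second(model_id: str, prompt: str = "Hello", new_tokens: int = 128) -> float:
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype="auto", device_map="auto"
    )
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    torch.cuda.synchronize()
    start = time.time()
    # min_new_tokens keeps the run from stopping early at EOS,
    # so the tokens/sec estimate stays comparable across models.
    model.generate(
        **inputs,
        max_new_tokens=new_tokens,
        min_new_tokens=new_tokens,
        do_sample=False,
    )
    torch.cuda.synchronize()
    return new_tokens / (time.time() - start)

print("fp16:", tokens_per_second("Qwen/Qwen-14B-Chat"))         # placeholder id
print("awq :", tokens_per_second("some-org/Qwen-14B-Chat-AWQ")) # placeholder id
```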
-
### Describe your problem
[Question]: Hello everyone, I accidentally deleted the Tongyi Qwen API, and now none of my embedding or inference jobs can proceed. I'm using a local LLM …
-
Hi there,
I was struggling with how to run quantization with AutoAWQ, as mentioned on the home page. I was trying to quantize the 7B Qwen2-VL, but even using 2 A100s (80 GB VRAM each), I still get CUDA OOM…
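For comparison, the stock AutoAWQ recipe is sketched below (the paths are placeholders, and whether AutoAWQForCausalLM handles the Qwen2-VL architecture, as opposed to plain causal LMs, is an assumption worth verifying). Quantization calibrates layer by layer, and the forward passes over the calibration data, not the weights, are the usual OOM source, so shorter calibration samples can help.
```python
# Stock AutoAWQ 4-bit quantization flow. A sketch: paths are placeholders,
# and Qwen2-VL support through AutoAWQForCausalLM is an assumption to verify.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "Qwen/Qwen2-VL-7B-Instruct"  # placeholder source model
quant_path = "qwen2-vl-7b-awq"            # placeholder output directory
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(
    model_path, low_cpu_mem_usage=True, use_cache=False
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Calibration forward passes, not the weights themselves, usually drive memory.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```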
-
Current attempt:
```python
def test_unsloth_vllm(
    max_length: int = 8192,
    use_4bit: bool = False,
):
    print('----> test_unsloth_vllm')
    import os
    from tra…
```
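The snippet is cut off above, so purely as a sketch of what such a test might look like (assuming unsloth is installed; the model id and settings are placeholders, not the author's actual code):
```python
# Hypothetical completion of a test like the one above, using unsloth's
# FastLanguageModel. Model id, lengths, and prompt are all placeholders.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-7B-Instruct",  # assumed model id
    max_seq_length=8192,
    load_in_4bit=False,
)
FastLanguageModel.for_inference(model)  # switch to unsloth's fast inference path

inputs = tokenizer("Hello", return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```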
-
Hope to see support for a chat-style API (friendly to multi-turn conversations); currently only generate seems to be supported.
For example, something along the lines of the code example in [Qwen/Qwen…
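For reference, the multi-turn chat pattern from the Qwen2 model card looks like the sketch below (plain transformers; the model id is illustrative):
```python
# Multi-turn chat via apply_chat_template, as in the Qwen2 model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-7B-Instruct"  # illustrative model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me a joke."},
]
text = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
# Strip the prompt tokens so only the new assistant turn is decoded.
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```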
-
I have been experimenting with different models in fllama, specifically Gemma, Phi-3, and Qwen2. I noticed significant differences in performance and response quality across these models:
Gemma…
-
Under WSL, the command:
python -m examples.zeroshot --bench_name classification_public --model_name "Qwen/Qwen2.5-7B-Instruct" --device cuda --output_path /tmp/output.csv
and
python -m examples.self_streamicl …
-
qwen# python convert_checkpoint.py --model_dir /code/tensorrt-llm/Qwen1.5-32B-Chat/ --output_dir ./trt_ckpt/qwen1.5-32b/fp16 --dtype float16 --tp_size 4
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.de…
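For context, in TensorRT-LLM 0.11 the converted checkpoint is normally compiled into engines in a second step, roughly like the line below (the output directory is a placeholder, and the gemm plugin flag is a common but optional choice):
trtllm-build --checkpoint_dir ./trt_ckpt/qwen1.5-32b/fp16 --output_dir ./trt_engines/qwen1.5-32b/fp16 --gemm_plugin float16
Since the checkpoint was converted with tp_size 4, the resulting engines would then run across 4 ranks (e.g. via mpirun -n 4).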
-
Can Spring AI support the Qwen large language model?
And can spring-ai-ollama-spring-boot-starter support function calling?