-
I want to generate multiple outputs from a single prompt. Is there any way to get multiple generations from a fine-tuned Llama 3.1 model, similar to what `num_return_sequences` does in Hugging Face?
# Hug…
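For reference, a minimal sketch of the Hugging Face pattern being asked about, assuming the fine-tuned checkpoint loads through `transformers` (the model path here is a placeholder). Sampling must be enabled for the returned sequences to differ from one another:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "path/to/finetuned-llama-3.1"  # hypothetical checkpoint path
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,            # sampling makes the sequences differ
    num_return_sequences=4,    # number of generations per prompt
    max_new_tokens=64,
    pad_token_id=tokenizer.eos_token_id,  # Llama tokenizers define no pad token
)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```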
-
### Have you searched existing issues? 🔎
- [X] I have searched and found no existing issues
### Describe the bug
After running `reduce_outliers` and `update_topics`, the effects of all specif…
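For context, a minimal sketch of the `reduce_outliers` → `update_topics` flow the report refers to, assuming the standard BERTopic API; the 20-newsgroups sample is only placeholder data:

```python
from sklearn.datasets import fetch_20newsgroups
from bertopic import BERTopic

# Placeholder corpus; any list of strings works
docs = fetch_20newsgroups(subset="train",
                          remove=("headers", "footers", "quotes")).data[:1000]

topic_model = BERTopic()
topics, probs = topic_model.fit_transform(docs)

# Reassign outlier documents (topic -1), then propagate the new assignments
new_topics = topic_model.reduce_outliers(docs, topics)
topic_model.update_topics(docs, topics=new_topics)
```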
-
➜ seamless_communication git:(main) ✗ m4t_predict hello --task T2TT --tgt_lang eng --src_lang cmn
usage: m4t_predict [-h] [--task {ASR,S2ST,S2TT}] [--tgt_lang TGT_LANG]
[--src_la…
-
Environment: running the Docker image funasr:funasr-runtime-sdk-cpu-0.4.5 on Ubuntu
Command: bash run_server.sh \ --download-model-dir /workspace/models \ --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-pytorch \ --model-dir dam…
-
### 🚀 The feature, motivation and pitch
I am looking to assess the performance of vllm for speculative decode, but I have been unable to find an offline benchmark script similar to [benchmark_latency…
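In the meantime, a rough offline timing sketch, assuming the draft-model speculative engine arguments of that vLLM generation (`speculative_model`, `num_speculative_tokens`, `use_v2_block_manager`); both model names are placeholders and this is not a definitive benchmark harness:

```python
import time
from vllm import LLM, SamplingParams

# Target and draft models must share a tokenizer vocabulary; these are examples
llm = LLM(
    model="meta-llama/Llama-2-7b-chat-hf",
    speculative_model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    num_speculative_tokens=5,
    use_v2_block_manager=True,  # required for speculative decoding in this era
)
params = SamplingParams(temperature=0, max_tokens=128)

start = time.perf_counter()
outputs = llm.generate(["Explain speculative decoding briefly."], params)
elapsed = time.perf_counter() - start
tokens = len(outputs[0].outputs[0].token_ids)
print(f"{tokens} tokens in {elapsed:.2f}s -> {tokens / elapsed:.1f} tok/s")
```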
-
I know there are lots of `href`s and `[=...=]` references to change, as well as the visible text. And probably some number of other documents that we cannot change (now, if not ever), but which can be supported …
-
Is there a way to improve the results? Also, how much compute power do we need to fine-tune this model to get good results on specific documents? And am I correct that this model with ocr_type se…
-
### Your current environment
vLLM 0.5.0, A100, CUDA 12.1
### 🐛 Describe the bug
1.
CUDA_VISIBLE_DEVICES=1 python -m vllm.entrypoints.openai.api_server \
--model /home/Qwen1.5-1.8B-Chat \
…
-
### Your current environment
The startup command is as follows: it launches both a standard 7B model and an n-gram speculative model. Speed tests show that the speculative model performs more slowl…
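For reference, a sketch of how the n-gram setup is typically expressed offline, assuming the `[ngram]` prompt-lookup engine arguments of that vLLM generation; the model path mirrors the style of the report and is hypothetical:

```python
from vllm import LLM, SamplingParams

# "[ngram]" selects prompt-lookup decoding instead of a separate draft model
llm = LLM(
    model="/home/Qwen1.5-7B-Chat",   # hypothetical local path
    speculative_model="[ngram]",
    num_speculative_tokens=5,
    ngram_prompt_lookup_max=4,       # longest n-gram matched against the prompt
    use_v2_block_manager=True,
)
outputs = llm.generate(["Hello"], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```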
-
After `ctranslate2.models.Whisper.generate`, the result does not include `logits`.
Version == 4.4.0
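A minimal sketch to reproduce, assuming the documented `Whisper.generate` call with `return_scores=True`; the converted model directory and audio file are placeholders. The generation result exposes sequences and scores, but no `logits` field:

```python
import ctranslate2
import librosa
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = ctranslate2.models.Whisper("whisper-tiny-ct2")  # hypothetical converted dir

audio, _ = librosa.load("sample.wav", sr=16000)  # placeholder audio file
inputs = processor(audio, sampling_rate=16000, return_tensors="np")
features = ctranslate2.StorageView.from_array(inputs.input_features)

prompt = processor.tokenizer.convert_tokens_to_ids(
    ["<|startoftranscript|>", "<|en|>", "<|transcribe|>", "<|notimestamps|>"]
)
results = model.generate(features, [prompt], return_scores=True)
# The result carries sequences_ids and scores; no per-token logits
print(results[0].sequences_ids[0], results[0].scores)
```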