-
I'm trying to make the model generate emojis using this command:
```
./run.sh $(./autotag local_llm) python3 -m local_llm.chat --api=mlc --model=NousResearch/Llama-2-7b-chat-hf --prompt="Repeat th…
-
I used the code in TensorRT-LLM/examples/baichuan/build.py to compile the Baichuan model with the --use_inflight_batching option, then deployed the compiled model using the TensorRT-LLM inference servic…
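For context, a build invocation along the lines described above might look like the following sketch. The model path, output directory, and --dtype value are placeholder assumptions; only --use_inflight_batching is taken from the report itself, and exact flags vary across TensorRT-LLM releases.

```shell
# Hypothetical sketch: compile a Baichuan checkpoint into a TensorRT-LLM engine
# with in-flight batching enabled. Paths and --dtype are assumptions; only
# --use_inflight_batching comes from the report above.
python3 build.py \
    --model_dir ./baichuan-model \
    --dtype float16 \
    --use_inflight_batching \
    --output_dir ./baichuan_trt_engine
```

The resulting engine directory would then be what the Triton/TensorRT-LLM inference service is pointed at when deploying.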
-
I have noticed a very weird change when I wanted to make use of streaming. Before, I was not using streaming, and basically all conversation models tended to start their messages with an emoji. The reasons why the mode…
-
When genai-perf is triggered, the streaming option is always enabled when calling the Triton service, even without the --streaming flag:
```
genai-perf \
-m bls \
…
-
![image](https://github.com/THUDM/VisualGLM-6B/assets/133836090/98e6fc19-2041-44d9-b6be-60e2536218bd)
One image per second.
-
0%| | 0/600000 [00:00
-
- [ ] [codefuse-chatbot/README_en.md at main · codefuse-ai/codefuse-chatbot](https://github.com/codefuse-ai/codefuse-chatbot/blob/main/README_en.md?plain=1)
-
I am trying to use Riva ASR with the frontend as given in the example, but it fails to transcribe speech to text. Most of the time it fails to catch my voice correctly.
-
### What happened + What you expected to happen
I am trying to load a quantized large model with vLLM. It starts loading the model, but it sometimes stops loading partway through and return…
-
### System Info
python 3.11.8
### Running Xinference with Docker?
- [ ] docker
- [X] pip install
- [ ] installation from source