-
Hi there! I'm serving TensorRT-LLM models from Python and I'm wondering what the recommended approach is for serving multiple models at once. I've tried / considered:
- `GenerationS…
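One workable pattern, sketched below in plain Python, is to load each engine once and dispatch requests by model name through a small registry. Note this is a generic sketch: `register`/`generate` and the callable engines are illustrative stand-ins for however you actually load TensorRT-LLM engines (e.g. one session per engine), not a TensorRT-LLM API.

```python
from typing import Callable, Dict


class ModelRegistry:
    """Holds one loaded engine per model name and dispatches generate() calls."""

    def __init__(self) -> None:
        self._engines: Dict[str, Callable[[str], str]] = {}

    def register(self, name: str, engine: Callable[[str], str]) -> None:
        # In practice `engine` would be a loaded TensorRT-LLM session;
        # any callable taking a prompt works for this sketch.
        self._engines[name] = engine

    def generate(self, name: str, prompt: str) -> str:
        if name not in self._engines:
            raise KeyError(f"unknown model: {name}")
        return self._engines[name](prompt)


# Usage with dummy engines standing in for real loaded models:
registry = ModelRegistry()
registry.register("llama", lambda p: f"[llama] {p}")
registry.register("mistral", lambda p: f"[mistral] {p}")
print(registry.generate("llama", "hello"))  # [llama] hello
```

The point of the registry is that engines are loaded once at startup and requests only pay a dictionary lookup, which keeps per-request latency independent of how many models are resident.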
-
### System Info
"@huggingface/transformers": "^3.0.0-alpha.5"
### Environment/Platform
- [X] Website/web-app
- [X] Browser extension
- [ ] Server-side (e.g., Node.js, Deno, Bun)
- [ ] Des…
-
Issue: running via webui.py, with the "pretrained voice" inference mode selected, clicking "generate audio" raises an error; the server shows: RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'. The full error output is below:
2024-09-27 16:32:01,942 INFO get sft inference request
tn I am the all-new generative speech large model from the Tongyi Lab speech team, …
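For context, this RuntimeError is PyTorch reporting that a half-precision (fp16) kernel was invoked on CPU, where fp16 matmul ops are unavailable on many builds. A common workaround, sketched below under the assumption that the model was loaded in half precision, is to cast the module and its inputs back to float32 when running on CPU:

```python
import torch

# A half-precision linear layer on CPU: on PyTorch builds without fp16
# CPU kernels, calling it raises
#   RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
layer = torch.nn.Linear(4, 4).half()
x = torch.randn(1, 4, dtype=torch.float16)
try:
    layer(x)
except RuntimeError as e:
    print(e)  # the error reported above, on builds lacking fp16 addmm

# Workaround: cast module and inputs to float32 for CPU inference.
layer_fp32 = layer.float()
y = layer_fp32(x.float())
print(y.dtype)  # torch.float32
```

Casting the whole model once at load time (rather than per call) avoids repeated conversion cost; on GPU the fp16 path can stay as-is.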
-
Traceback (most recent call last):
  File "/home/hirpa/fzq/2024/code/functionary/server_vllm.py", line 39, in <module>
    from functionary.vllm_inference import process_chat_completion
  File "/home/hirpa…
-
Dear developers,
I decided to clone the code/demo but apparently the data files are not present on the LFS servers. I get the following error during clone:
```
Downloading Inference/db/imdb_raw…
-
### Motivation
I found that the input token logprob is supported by Offline Inference Pipeline, as mentioned in [doc](https://lmdeploy.readthedocs.io/en/latest/inference/vl_pipeline.html#calculate-lo…
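Independent of lmdeploy's API, the quantity in question is just the log-softmax of each position's logits, gathered at the next input token. A minimal numpy sketch of that computation (random logits standing in for real model output):

```python
import numpy as np


def input_token_logprobs(logits: np.ndarray, input_ids: np.ndarray) -> np.ndarray:
    """Log-probability each position assigns to the *next* input token.

    logits:    (seq_len, vocab) raw model outputs
    input_ids: (seq_len,) token ids of the prompt
    """
    # Numerically stable log-softmax over the vocab axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Position t predicts token t+1, so align log_probs[:-1] with ids[1:].
    return log_probs[np.arange(len(input_ids) - 1), input_ids[1:]]


rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 10))  # pretend vocab of 10, prompt of 5 tokens
ids = np.array([1, 4, 2, 9, 3])
lp = input_token_logprobs(logits, ids)
print(lp.shape)  # (4,)
```

The off-by-one alignment (logits at position t score token t+1) is the detail most implementations get wrong; the first prompt token has no logprob because nothing predicts it.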
-
### The bug
I was looking at my logs because of an issue I am having with facial recognition. These errors are unrelated, as they happened during the night, but I wanted to draw some attention to …
-
### Is your enhancement related to a problem? Please describe
It seems openweb-ui can be integrated through a container; it would be good to prototype this.
### Describe the solution you'd like
Replace current…
-
**Describe the bug**
After updating to 0.3.21, I'm getting:
2024-07-27 13:34:07,646 - MemGPT.memgpt.server.server - DEBUG - Starting agent step
/MemGPT/memgpt/data_types.py:92: UserWarning: Failed to…
-
Triton inference server:r24.07 and model_analyzer:1.42.0
config.pbtxt
```
backend: "python"
max_batch_size: 32
input [
{
name: "IN0"
data_type: TYPE_STRING
dims: [ 16 ]
}
]…