-
### Description
Improve validation and exception handling within the inference API.
Here are a few areas to get started:
- When a text embedding service is created, during the creation process w…
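For context, creating a text embedding endpoint through the inference API looks roughly like the sketch below, using Python's `requests` against a local cluster; the endpoint id, service, and service settings are illustrative assumptions, not part of the original report.
```
# Sketch: create a text embedding endpoint via the _inference API.
# Endpoint id, service, and settings below are illustrative assumptions.
import requests

resp = requests.put(
    "http://localhost:9200/_inference/text_embedding/my-embedding-endpoint",
    json={
        "service": "openai",
        "service_settings": {
            "api_key": "<api-key>",
            "model_id": "text-embedding-3-small",
        },
    },
)
print(resp.status_code, resp.json())
```
Validation during this creation step is where the report suggests starting.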
-
Hello everyone, I'm trying to use torchrun for dual-GPU parallel inference, and I have two questions. First, I found that torchrun is mainly used for model training, so can it be…
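torchrun is really just a multi-process launcher, so it can drive inference as well as training. Below is a minimal data-parallel sketch; the `Linear` model and the input batch are stand-ins for a real model and real requests.
```
# Minimal data-parallel inference sketch; the model and inputs are stand-ins.
# Launch with: torchrun --nproc_per_node=2 infer.py
import os
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")  # torchrun sets the rendezvous env vars
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    local_rank = int(os.environ["LOCAL_RANK"])  # one GPU per process
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(16, 4).cuda(local_rank)  # stand-in for a real model
    model.eval()

    inputs = torch.randn(8, 16)                        # pretend batch of requests
    shard = inputs[rank::world_size].cuda(local_rank)  # each rank takes a slice

    with torch.no_grad():
        out = model(shard)
    print(f"rank {rank}: produced {out.shape[0]} outputs")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```
Each process pins one GPU and handles its own slice of the inputs, so two cards serve requests in parallel without any training-specific machinery.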
-
### 🚀 The feature, motivation and pitch
I keep trying to learn and study the llama-stack API and its examples, but they are complex. There’s no documentation, or I’m unable to access it.
I wa…
-
**Why is it that when using a quantized model for inference, the TTFT optimization is not obvious, but the overall inference efficiency is improved a lot? At the same time, the inference efficiency…
-
The stack tool cannot support large models with a .pth extension downloaded from Meta; it throws an error at runtime. Does it have to use models downloaded from Hugging Face? Is this setup unreaso…
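If only Meta-format .pth checkpoints are available, one common workaround is converting them to the Hugging Face layout with the conversion script that ships with transformers; the paths below are placeholders and the exact flags vary by transformers version and model size.
```
# Sketch: convert Meta-format .pth checkpoints to the Hugging Face layout.
# Paths are placeholders; flags vary across transformers versions.
python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/meta/llama --model_size 7B --output_dir /path/to/hf-model
```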
-
I used the official Docker image and downloaded the weight file from Meta. The md5sum check proved that the file was fine, but it still failed to run, which left me confused. I confirm that CUDA can be …
-
I started the webui with the command from the official documentation:
`python3 webui.py --port 50000 --model_dir pretrained_models/CosyVoice-300M`
After launching, the interface opens normally, as shown below.
Above is my configuration. After clicking Run, an error is reported; the terminal output is as follows:
```
2024-10-28 11:15:36,546 INFO get zero_shot inferenc…
-
I downloaded the 1B model from Hugging Face and encountered an error while running it. The following is the configuration process, and I am puzzled as to why I need to bind it to the address [::ffff:0.0…
-
### Description
The inference API supports text embedding and rerank task types. If an inference endpoint is created for text embedding, and a request is made to perform inference and the request co…
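A sketch of the task-type mismatch this report describes: create a text embedding endpoint, then send a rerank-style request (note the `query` field) to it. The endpoint id and service settings are illustrative, and Python's `requests` stands in for whatever client is actually used.
```
# Sketch: exercise the task-type mismatch path against a local cluster.
# Endpoint id and service settings are illustrative assumptions.
import requests

ES = "http://localhost:9200"

requests.put(
    f"{ES}/_inference/text_embedding/my-embedding-endpoint",
    json={
        "service": "openai",
        "service_settings": {"api_key": "<api-key>", "model_id": "text-embedding-3-small"},
    },
)

# A rerank request aimed at the embedding endpoint:
r = requests.post(
    f"{ES}/_inference/rerank/my-embedding-endpoint",
    json={"query": "what is the capital of France?", "input": ["Paris", "Berlin"]},
)
print(r.status_code, r.json())
```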
-
### Describe the issue
According to [Local-LLMs](https://microsoft.github.io/autogen/blog/2023/07/14/Local-LLMs/), AutoGen can support multiple local LLMs.
My command for FastChat:
First,…
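For reference, the pattern from that blog post is to serve the model through FastChat's OpenAI-compatible server and point AutoGen at it. A minimal sketch follows; the host, port, and model name are assumptions, and older AutoGen releases use `api_base` instead of `base_url`.
```
# Sketch: point AutoGen at a local FastChat OpenAI-compatible endpoint.
# Host, port, and model name below are assumptions for illustration.
import autogen

config_list = [
    {
        "model": "chatglm2-6b",                  # whatever model FastChat is serving
        "base_url": "http://localhost:8000/v1",  # "api_base" in older AutoGen releases
        "api_key": "NULL",                       # placeholder; the local server ignores it
    }
]

assistant = autogen.AssistantAgent("assistant", llm_config={"config_list": config_list})
user = autogen.UserProxyAgent("user", human_input_mode="NEVER", code_execution_config=False)
user.initiate_chat(assistant, message="Say hello.")
```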