-
### The bug
These are the original image files.
![thumbnail (1)](https://github.com/user-attachments/assets/d0d332c4-d6d6-4f4c-980c-4506d10e4613)
![thumbnail](https://github.com/user-attachments/as…
-
I found that you added a new feature (publish to inference server) in DIGITS 6.1; however, I don't know how to use it.
-
Are there docs on best practices for using vLLM-hosted models?
I start a model server using
```
python -m vllm.entrypoints.openai.api_server --model model_path
```
and try running the evaluation as
```
lm_eval --model lo…
```
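For reference, a minimal sketch of querying a server started this way with the `openai` Python client; it assumes the default port 8000 and that the served model name matches the `--model` value (both are assumptions, not taken from the post):
```python
# Minimal sketch (assumptions: server on the default port 8000, model name equals the
# --model value passed to vllm.entrypoints.openai.api_server). vLLM ignores the API key.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="model_path",  # must match the --model value used when starting the server
    prompt="The capital of France is",
    max_tokens=16,
)
print(completion.choices[0].text)
```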
-
Congrats on a great project! I started playing with it and have two questions so far:
1) Does it support different sessions and several users?
2) Does it support simultaneous requests for inference?…
-
Type: Performance Issue
I wanted to compare two large files, but it didn't work (they might have been too large/complex?). I also tried switching to the non-advanced diff algorithm (`diffEditor.diffAlgorithm` setting…
-
I tested `tritonclient:2.43.0` on Ubuntu:22.04 with `grpcio:1.62.1` and was confronted with a memory leak. Example for reproduction:
```python
import asyncio
from tritonclient.grpc.aio import Inferen…
```
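The reproduction snippet above is cut off; purely as an illustrative sketch of the async gRPC client pattern (not the author's code), with a made-up model name and tensor names:
```python
# Illustrative sketch only (placeholder model "my_model", input "INPUT0", output "OUTPUT0",
# Triton on the default gRPC port 8001); repeated requests such as this loop are one way
# to observe memory growth in the client process.
import asyncio

import numpy as np
from tritonclient.grpc import InferInput, InferRequestedOutput
from tritonclient.grpc.aio import InferenceServerClient


async def main() -> None:
    client = InferenceServerClient(url="localhost:8001")
    try:
        data = np.random.rand(1, 4).astype(np.float32)
        for _ in range(10_000):
            inp = InferInput("INPUT0", list(data.shape), "FP32")
            inp.set_data_from_numpy(data)
            result = await client.infer(
                model_name="my_model",
                inputs=[inp],
                outputs=[InferRequestedOutput("OUTPUT0")],
            )
            result.as_numpy("OUTPUT0")
    finally:
        await client.close()


asyncio.run(main())
```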
-
#### Description
I am currently working on deploying the Seamless M4T model for text-to-text translation on a Triton server. I have successfully exported the `text.encoder` to ONNX and traced it …
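The description is truncated; purely as a generic sketch of exporting an encoder sub-module with `torch.onnx.export` (the module, tensor names, and shapes below are placeholders, not the author's export script):
```python
# Generic sketch only: a stand-in encoder module exported to ONNX with dynamic axes.
import torch
import torch.nn as nn


class DummyTextEncoder(nn.Module):
    """Stand-in for the real text encoder sub-module."""

    def __init__(self) -> None:
        super().__init__()
        self.embed = nn.Embedding(1000, 32)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        return self.embed(input_ids)


encoder = DummyTextEncoder().eval()
dummy_ids = torch.ones(1, 16, dtype=torch.long)

# Trace the module and write an ONNX graph with dynamic batch/sequence axes.
torch.onnx.export(
    encoder,
    (dummy_ids,),
    "text_encoder.onnx",
    input_names=["input_ids"],
    output_names=["encoder_out"],
    dynamic_axes={"input_ids": {0: "batch", 1: "seq"}},
    opset_version=17,
)
```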
-
I downloaded the Llama 3.2 1B model from Hugging Face with optimum-cli:
```
optimum-cli export openvino --model meta-llama/Llama-3.2-1B-Instruct llama3.2-1b/1
```
Below are the downloaded files:
!…
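For context, a minimal sketch (not from the original post) of loading such an export with optimum-intel and running a quick generation, assuming the export directory `llama3.2-1b/1` from the command above and that the tokenizer files were exported alongside the model:
```python
# Sketch under assumptions: export directory from the optimum-cli command above,
# tokenizer saved alongside the OpenVINO model.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_dir = "llama3.2-1b/1"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = OVModelForCausalLM.from_pretrained(model_dir)

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```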
-
Hello and thanks for a great project!
I wondered whether there's any interest in supporting type inference at the decorator level rather than by changing the return type of methods.
Right now I…
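The request is cut off above; purely as an illustration of what decorator-level type inference looks like in standard Python typing (all names below are made up, not from the project in question), a decorator can propagate the wrapped function's signature with `ParamSpec`/`TypeVar`:
```python
# Illustration with made-up names: the decorator's types are inferred from the wrapped
# function's signature instead of being redeclared on each method (Python 3.10+).
import functools
from typing import Callable, ParamSpec, TypeVar

P = ParamSpec("P")
R = TypeVar("R")


def traced(func: Callable[P, R]) -> Callable[P, R]:
    @functools.wraps(func)
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
        print(f"calling {func.__name__}")
        return func(*args, **kwargs)

    return wrapper


@traced
def add(a: int, b: int) -> int:
    return a + b


result: int = add(1, 2)  # type checkers see the same (int, int) -> int signature
```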
-
- [ ] [optillm/README.md at main · codelion/optillm](https://github.com/codelion/optillm/blob/main/README.md?plain=1)
# optillm
optillm is an OpenAI API compatible optimizing inference proxy whi…
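The README excerpt is truncated; as a hedged sketch of the usual pattern for an OpenAI API-compatible proxy (the port and model name below are placeholders, not taken from the README), an existing OpenAI client is pointed at the proxy by swapping its base URL:
```python
# Sketch under assumptions: proxy listening on localhost:8000, placeholder model name.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="your-api-key")

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; whatever model the proxy forwards to
    messages=[{"role": "user", "content": "What is 2 + 2?"}],
)
print(response.choices[0].message.content)
```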