-
I have used CTranslate2 (https://github.com/OpenNMT/CTranslate2) for inference within BentoML via a custom runner.
But when trying to save the model with cloudpickle, it results in the following er…
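For context, a custom CTranslate2 runner in BentoML 1.x usually looks something like the minimal sketch below (the model directory "ct2_model/" and the method name are assumptions, not from the original report):
```python
import bentoml
import ctranslate2

class CT2Runnable(bentoml.Runnable):
    SUPPORTED_RESOURCES = ("cpu", "nvidia.com/gpu")
    SUPPORTS_CPU_MULTI_THREADING = True

    def __init__(self):
        # Load a converted CTranslate2 model from disk (path is assumed).
        self.translator = ctranslate2.Translator("ct2_model/")

    @bentoml.Runnable.method(batchable=False)
    def translate(self, tokens: list[list[str]]) -> list[list[str]]:
        results = self.translator.translate_batch(tokens)
        return [r.hypotheses[0] for r in results]

runner = bentoml.Runner(CT2Runnable, name="ct2_runner")
```
The ctranslate2.Translator object wraps a native handle, which is a plausible reason cloudpickle cannot serialize it.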
-
```shell
openllm start chatglm
......
  File "/opt/conda/lib/python3.10/site-packages/accelerate/big_modeling.py", line 108, in register_empty_parameter
    module._parameters[name] = param_cls(module._…
```
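For context, the frame in the traceback is part of accelerate's empty-weights initialization, which is used to build the model skeleton on the meta device before loading checkpoints. A minimal sketch of that path, assuming transformers and accelerate are installed (the chatglm repo id is only an example):
```python
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
with init_empty_weights():
    # Parameters are created on the "meta" device without allocating memory;
    # accelerate's register_empty_parameter hook (seen in the traceback)
    # intercepts each parameter as the module tree is built.
    model = AutoModel.from_config(config, trust_remote_code=True)
```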
-
### Feature request
Right now we don't support parsing multiple space-separated model names, so
```shell
bentoml models delete model1 model2 ...
```
will fail.
Note that we can already do
```shell
bentoml models delet…
```
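A minimal sketch of how the space-separated form could be parsed, assuming the CLI is built on Click (the function name here is hypothetical):
```python
import click

@click.command(name="delete")
@click.argument("model_names", nargs=-1, required=True)
def delete_models(model_names: tuple[str, ...]) -> None:
    # nargs=-1 collects every space-separated token into a tuple, so
    # `delete model1 model2` yields ("model1", "model2").
    for name in model_names:
        click.echo(f"deleting {name}")
```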
-
### Is your feature request related to a problem? Please describe.
Managing dependencies through conda does not always provide the right level of isolation. We need to have each model containerized a…
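For reference, a minimal sketch of the containerization path that already exists, assuming BentoML's Python container API and a bento tagged "my_service:latest" (the tag is a placeholder):
```python
import bentoml

# Build an OCI image for an existing bento; this sketches the current
# per-bento flow, not the requested per-model isolation.
bentoml.container.build("my_service:latest")
```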
-
### Describe the bug
What happened: starting the LLM server fails with the error below.
Error caught while starting LLM Server:
environment can only contain strings
### To reproduce
_No response_
### Logs
```shell
Traceback (most recent call l…
```
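A hedged guess at the cause: a non-string value (e.g. an int) in the environment dict handed to the child process. A minimal sketch of the failure mode and the usual fix (the variable name is an assumption):
```python
import os
import subprocess

# A non-string value in env raises "environment can only contain strings"
# on Windows, or a similar TypeError on other platforms.
env = {**os.environ, "WEB_CONCURRENCY": 4}      # int value is the problem
safe_env = {k: str(v) for k, v in env.items()}  # coerce every value to str
subprocess.run(["echo", "ok"], env=safe_env)
```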
-
## Summary
This issue started occurring with the new OneFlow nightly update; the same code base worked fine with the previous update.
## Code to reproduce bug
`imp…
-
### Feature request
As is done with Triton Inference Server, it would be great to integrate vLLM (https://github.com/vllm-project/vllm) as a highly optimized engine for LLM generation based on cont…
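For context, a minimal sketch of the vLLM offline API such an integration would wrap (the model id is only an example):
```python
from vllm import LLM, SamplingParams

# vLLM schedules concurrent requests with continuous batching internally.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)
for out in llm.generate(["What is continuous batching?"], params):
    print(out.outputs[0].text)
```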
-
### Describe the bug
On my local machine, I am using Node (axios) to make POST requests to a locally running BentoML server, and each request fails with a connection refused error. **However, I can make requ…
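One way to narrow this down is to reproduce the request outside axios; a common culprit is "localhost" resolving to the IPv6 ::1 on newer Node versions while the server listens on IPv4 only. A minimal sketch, assuming the default port 3000 and a hypothetical "/classify" endpoint:
```python
import requests

# Force IPv4 instead of "localhost", which may resolve to ::1.
resp = requests.post(
    "http://127.0.0.1:3000/classify",  # endpoint name is an assumption
    json={"input": "test"},
    timeout=5,
)
print(resp.status_code, resp.text)
```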
-
There are a lot of different LLM deployment providers. How do I easily replace my OpenAI base URL with their URL as a proxy? - https://github.com/petals-infra/chat.petals.dev/issues/20, https://www.banana.d…
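One common pattern is to point the OpenAI client at the provider's endpoint; a minimal sketch with the openai>=1.0 Python client (base URL, model name, and key are placeholders):
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/v1",  # placeholder proxy endpoint
    api_key="not-needed-for-local",       # placeholder key
)
resp = client.chat.completions.create(
    model="chatglm",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```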
-
### Describe the bug
First of all: thank you very much, openllm looks awesome so far 💯
This issue is related to #47. We tried to start an openllm server with the command:
`openllm sta…