-
### Feature request
Starlette supports `BackgroundTask`:
https://www.starlette.io/background/
~~~python
# starlette example
from starlette.applications import Starlette
from starlette.response…
~~~
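For reference, a complete minimal sketch of the Starlette `BackgroundTask` pattern from the linked docs (the handler and task names here are illustrative):

~~~python
from starlette.applications import Starlette
from starlette.background import BackgroundTask
from starlette.responses import JSONResponse
from starlette.routing import Route


async def send_welcome_email(to_address: str) -> None:
    # placeholder for work that should run after the response is sent
    ...


async def signup(request):
    data = await request.json()
    # the task is executed only after the response has been returned to the client
    task = BackgroundTask(send_welcome_email, to_address=data["email"])
    return JSONResponse({"status": "Signup successful"}, background=task)


app = Starlette(routes=[Route("/signup", signup, methods=["POST"])])
~~~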
-
**Describe the bug**
When I add the `--production` flag to the `bentoml serve` command, model serving becomes extremely slow compared to without the flag. The `--production` flag seems to make many…
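For context, a minimal sketch of the two invocations being compared (the `service:svc` target here is a placeholder for my actual service):

```bash
# development mode: responsive
bentoml serve service:svc

# production mode: extremely slow in my case
bentoml serve service:svc --production
```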
-
### Describe the bug
I was trying to use `gptq` inference as shown in the examples on the homepage:
```
openllm start falcon --model-id TheBloke/falcon-40b-instruct-GPTQ --quantize gptq --device …
```
-
Command: `openllm start opt`
I want to start the server on a port other than the default 3000. Can I change this port?
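A sketch of what I'd like to be able to do (assuming `openllm start` forwards a `--port` option to the underlying server; I have not verified that this flag exists):

```bash
# hypothetical: serve OPT on port 8080 instead of the default 3000
openllm start opt --port 8080
```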
-
### Describe the bug
`mpt` is listed under the supported models, but it is not available in the `openllm build` command. This is the error message:
```
openllm build mpt
Usage: openllm build [OPTIONS] {flan-t5|dolly-v…
```
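A hedged way to cross-check which models the installed CLI actually accepts (assuming this OpenLLM version ships an `openllm models` subcommand):

```bash
# list the models the installed openllm CLI claims to support (assumed subcommand)
openllm models
```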
-
### Describe the bug
When attempting to use OpenLLM to run a fine-tuned model, it fails to download any files with a `.bin` extension (usually model weights).
This issue results in missing file…
-
### Feature request
As is done with Triton Inference Server, it would be great to integrate vLLM (https://github.com/vllm-project/vllm) as a highly optimized engine for LLM generation based on cont…
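For reference, a minimal sketch of vLLM's offline API that such an integration could wrap (the model name and sampling parameters are illustrative only, and this snippet is untested here):

```python
from vllm import LLM, SamplingParams

# vLLM engine; continuous batching and PagedAttention are handled internally
llm = LLM(model="facebook/opt-125m")
sampling = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["Hello, my name is"], sampling)
for out in outputs:
    print(out.outputs[0].text)
```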
-
### Describe the bug
`bentoml containerize` fails
The stacktrace:
```
Building OCI-compliant image for vendor_ranking:124f3a586b498bde251a152bfbe73415495273c8 with buildx
Encountered except…
```
-
### Describe the bug
Following the stopgap measure recommended in [issue 299](https://github.com/bentoml/OpenLLM/issues/299), I installed OpenLLM with the `--no-binary` flag and tried to launch and query…
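For context, pip's `--no-binary` option names the packages to build from source rather than install from a wheel; an install of that form looks like this (the exact invocation from issue 299 is not reproduced here):

```bash
# build openllm from source instead of using the prebuilt wheel (sketch)
pip install openllm --no-binary openllm
```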
-
### Describe the bug
When trying to run a LLaMA 13B model and query it, I encounter a type error.
### To reproduce
Installed OpenLLM by running
```bash
pip install "openllm[llama, vllm, fine-tun…