-
**1. Save model**
import bentoml, torch
from transformers import pipeline

pipe = pipeline("text-generation", model="/data/Data/LLM/starchat/", torch_dtype=torch.bfloat16, device_map="auto")
bentoml.transformers.save_model(name="starsvc", pipeline=pipe)
*…
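Once saved, the pipeline can be loaded back from the local model store and wrapped in a service. A minimal sketch, assuming BentoML 1.x and the `starsvc` tag used above:

```python
import bentoml

# Load the saved pipeline from the local model store and expose it as a runner.
runner = bentoml.transformers.get("starsvc:latest").to_runner()
svc = bentoml.Service("starsvc", runners=[runner])
```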
-
### Describe the bug
I followed this [document](https://docs.bentoml.org/en/latest/frameworks/diffusers.html#importing-a-pre-trained-model) to try out BentoML, but I got an error when requesting th…
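For context, the import step from the linked guide looks roughly like this (model tag and model id are the doc's own example, not the reporter's setup):

```python
import bentoml

# Import a pre-trained diffusion model into the local BentoML model store,
# as described in the linked diffusers guide.
bentoml.diffusers.import_model(
    "sd2",                             # model tag in the store
    "stabilityai/stable-diffusion-2",  # Hugging Face model id
)
```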
-
### Feature request
Currently, `agent.run` on main runs a materializer from `AgentType` to return its corresponding type.
I think it would be a great addition to just return this `AgentType` d…
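A hypothetical sketch of the requested behavior (the `return_agent_types` kwarg does not exist upstream; it is only meant to illustrate the request):

```python
from transformers import HfAgent

agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder")

# Today: the materializer converts the result, e.g. AgentImage -> PIL.Image.
image = agent.run("Draw me a picture of rivers and lakes")

# Proposed: skip the materializer and hand back the AgentType directly
# (`return_agent_types` is a hypothetical flag, not an existing API).
raw = agent.run("Draw me a picture of rivers and lakes", return_agent_types=True)
```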
-
I'm unsure if this is a bug or a feature, but spacer seems to use CPU time (rather than wall-clock time) when determining how long it has waited. I tend to think of this as a bug because its idea of "time passed" is diff…
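The distinction is easy to demonstrate with Python's clock APIs: CPU time stands still while a process sleeps, whereas a wall-clock timer keeps advancing (this only illustrates the two clocks, not spacer's internals):

```python
import time

wall_start = time.monotonic()    # wall clock: advances even while sleeping
cpu_start = time.process_time()  # CPU clock: only advances while the CPU works

time.sleep(2)

print(time.monotonic() - wall_start)     # ~2.0 seconds
print(time.process_time() - cpu_start)   # ~0.0 seconds
```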
-
-
### Describe the bug
I am trying to deploy a GPTQ Llama-2 model using OpenLLM. I did not use any custom configuration; I simply ran the line from the README documentation.
When tracing the error, it seems…
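The exact command is not shown above; a README-style invocation would look something like the following (the model id and the `--quantize` flag are assumptions for illustration):

```
openllm start llama --model-id TheBloke/Llama-2-13B-chat-GPTQ --quantize gptq
```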
-
# References
- OpenLLM: https://github.com/bentoml/OpenLLM
- FastChat PyTorch implementation: https://github.com/lm-sys/FastChat/blob/main/fastchat/serve/model_worker.py
- FastChat vLLM implementation: https://github.…
-
Extend the "[extract_manifests_images.sh](https://github.com/kubeflow/manifests/blob/v1.7-branch/hack/extract_manifests_images.sh)" script so that it can report both of the following (a rough sketch follows the list):
- images per Working Group (WG)
…
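As a rough illustration of the per-WG half of the report (the actual script is Bash; the paths and the WG-equals-top-level-directory assumption are mine), the grouping could be sketched as:

```python
import os
import re
from collections import defaultdict

IMAGE_RE = re.compile(r"^\s*(?:-\s*)?image:\s*(\S+)")

def images_per_wg(manifests_root):
    """Group `image:` references by top-level (assumed WG) directory."""
    report = defaultdict(set)
    for dirpath, _, files in os.walk(manifests_root):
        rel = os.path.relpath(dirpath, manifests_root)
        wg = rel.split(os.sep)[0] if rel != "." else "."
        for name in files:
            if name.endswith((".yaml", ".yml")):
                with open(os.path.join(dirpath, name)) as fh:
                    for line in fh:
                        m = IMAGE_RE.match(line)
                        if m:
                            report[wg].add(m.group(1))
    return report
```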
-
### Describe the bug
As the title says, `openllm download llama meta-llama/Llama-2-70b-chat` attempts to download the 7B-parameter variant instead.
### To reproduce
Run `openllm download llama meta-llama/Llama-2-70b-ch…
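Until the alias resolution is fixed, one possible workaround is to fetch the intended weights directly from the Hub (the repo id below assumes the transformers-format `-hf` variant):

```python
from huggingface_hub import snapshot_download

# Download the 70B chat weights explicitly, bypassing openllm's alias lookup.
snapshot_download("meta-llama/Llama-2-70b-chat-hf")
```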
-
When I run `bentoml serve svc.py:svc.I -p 2000`
I get this error:
"2023-07-04T17:56:21+0800 [INFO] [cli] Starting production HTTP BentoServer from "svc.py:svc" listening on http://0.0.0.0:2000 (Pre…