bentoml / OpenLLM

Run any open-source LLM, such as Llama or Mistral, as an OpenAI-compatible API endpoint in the cloud.
https://bentoml.com
Apache License 2.0

feat: Apple M1/M2 support through MPS #43

Closed · ChristianWeyer closed this 1 year ago

ChristianWeyer commented 1 year ago

Feature request

I want to run OpenLLM with the available models on Apple M1/M2 processors, with GPU acceleration through MPS.

Today:

```
openllm start falcon
No GPU available, therefore this command is disabled
```
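
For reference, a minimal sketch of what MPS-backed loading could look like with PyTorch (assuming PyTorch >= 1.12, the `transformers` package, and the `tiiuae/falcon-7b` checkpoint; this is not OpenLLM's actual loading path):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# MPS is exposed as its own device backend in PyTorch >= 1.12.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Which checkpoint `openllm start falcon` actually selects is an assumption here;
# tiiuae/falcon-7b is the standard Falcon-7B id on the Hugging Face Hub.
model_id = "tiiuae/falcon-7b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model.to(device)  # moves the fp16 weights onto the Apple GPU via MPS
```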

Motivation

No response

Other

No response

aarnphm commented 1 year ago

I'm currently disabling Falcon on MPS, since I would run out of memory just trying to load the model on a Mac.
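
For a rough sense of why, here is a back-of-the-envelope estimate (an assumption about fp16 weights only, not a measurement):

```python
# Falcon-7B in fp16: ~7e9 parameters at 2 bytes each is ~14 GB for the
# weights alone, before activations and the KV cache. That is already
# tight on a 16 GB unified-memory Mac.
params = 7e9
bytes_per_param = 2  # fp16
print(f"~{params * bytes_per_param / 1e9:.0f} GB of weights")  # ~14 GB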

aarnphm commented 1 year ago

Not sure if this is still valid. I have since tested PyTorch on MPS extensively, and it is often slower. Will probably investigate MLC vs. GGUF for this.
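
A quick way to reproduce that kind of comparison (a rough micro-benchmark sketch, assuming a PyTorch build that exposes the `torch.mps` module; it times raw matmul throughput, not end-to-end inference):

```python
import time
import torch

def bench(device: str, n: int = 2048, iters: int = 20) -> float:
    """Time repeated n x n matmuls on the given device; a rough comparison only."""
    x = torch.randn(n, n, device=device)
    for _ in range(3):  # warmup so lazy kernel compilation doesn't skew timing
        _ = x @ x
    if device == "mps":
        torch.mps.synchronize()  # MPS ops are async; flush before timing
    start = time.perf_counter()
    for _ in range(iters):
        _ = x @ x
    if device == "mps":
        torch.mps.synchronize()
    return time.perf_counter() - start

print(f"cpu: {bench('cpu'):.3f}s")
if torch.backends.mps.is_available():
    print(f"mps: {bench('mps'):.3f}s")
```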