567-labs / fastllm

A collection of LLM services you can self-host via Docker or Modal Labs to support your application's development
MIT License

Unnecessary GPU for serving FastAPI #8

Closed · asselinpaul closed this 1 year ago

asselinpaul commented 1 year ago

Have been playing around with this and believe you don't need a GPU here / dropping it will save money, since you're then only charged for GPU time while the model is actually running (not while the app is being served)

https://github.com/jxnl/fastllm/blob/5174e711f21c2342d9b39a363b106532e9e15f08/applications/vllm-struct/main.py#L120
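
For reference, a minimal sketch of the split being suggested, assuming Modal's current `App` / `@modal.asgi_app()` decorator API; the function names, GPU type, and endpoint path are illustrative, not taken from the repo:

```python
import modal

app = modal.App("fastllm-sketch")  # hypothetical app name

# GPU is attached only to the function that runs the model, so GPU
# time is billed only while this function executes.
@app.function(gpu="A10G")  # GPU type is an illustrative choice
def generate(prompt: str) -> str:
    # placeholder for the real vLLM inference call
    return f"completion for: {prompt}"

# The web server runs in a plain CPU container; it only dispatches
# requests to the GPU-backed function.
@app.function()
@modal.asgi_app()
def web():
    from fastapi import FastAPI

    api = FastAPI()

    @api.post("/generate")
    def handler(prompt: str) -> dict:
        # .remote() invokes the GPU function in its own container
        return {"completion": generate.remote(prompt)}

    return api
```

With this layout, the CPU container can stay up serving requests cheaply, while GPU containers spin up (and are billed) only for inference calls.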