567-labs / fastllm

A collection of LLM services you can self-host via Docker or Modal Labs to support your application's development
MIT License

Unnecessary GPU for serving FastAPI #8

Closed · asselinpaul closed this 1 year ago

asselinpaul commented 1 year ago

Have been playing around with this and believe you don't need a GPU here / dropping it will save money, since you're then only charged for GPU time while the model is actually running (not while the app is being served)

https://github.com/jxnl/fastllm/blob/5174e711f21c2342d9b39a363b106532e9e15f08/applications/vllm-struct/main.py#L120
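
For reference, a minimal sketch of the split being suggested, assuming Modal's current `App` / `@modal.asgi_app()` decorator API; the function names, GPU type, and endpoint path are illustrative, not taken from the repo:

```python
import modal

app = modal.App("fastllm-sketch")  # hypothetical app name

# GPU is attached only to the function that runs the model, so GPU
# time is billed only while this function executes.
@app.function(gpu="A10G")  # GPU type is an illustrative choice
def generate(prompt: str) -> str:
    # placeholder for the real vLLM inference call
    return f"completion for: {prompt}"

# The web server runs in a plain CPU container; it only dispatches
# requests to the GPU-backed function.
@app.function()
@modal.asgi_app()
def web():
    from fastapi import FastAPI

    api = FastAPI()

    @api.post("/generate")
    def handler(prompt: str) -> dict:
        # .remote() invokes the GPU function in its own container
        return {"completion": generate.remote(prompt)}

    return api
```

With this layout, the CPU container can stay up serving requests cheaply, while GPU containers spin up (and are billed) only for inference calls.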