FMInference / FlexLLMGen

Running large language models on a single GPU for throughput-oriented scenarios.
Apache License 2.0

Implement RESTful API of FlexGen #130

Open Fyphen1223 opened 10 months ago

Fyphen1223 commented 10 months ago

I would really like a RESTful API for this project. I want to implement it myself, but that is nearly impossible for me :(