FMInference / FlexLLMGen

Running large language models on a single GPU for throughput-oriented scenarios.
Apache License 2.0

How can I use this project with my own model, and what are the key lines of code? #13

Closed guotong1988 closed 1 year ago

guotong1988 commented 1 year ago

Thank you very much!

merrymercy commented 1 year ago

This is the key file: https://github.com/FMInference/FlexGen/blob/9d092d848f106cd9eaf305c12ef3590f7bcb0277/flexgen/flex_opt.py#L582.

You can implement something similar for your own model.
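To make the suggestion concrete, here is a minimal sketch of the general pattern behind an offloading runtime like FlexGen: the model is written as an explicit list of layers whose weights are loaded and released one layer at a time, so only a small working set must be resident at once. This is NOT FlexGen's actual API — the class and method names (`Layer`, `load_weight`, `MyModel`) are hypothetical, and real offloading would move tensors between disk, CPU, and GPU rather than allocate NumPy arrays.

```python
import numpy as np

class Layer:
    """One toy transformer block with explicitly managed weights."""

    def __init__(self, hidden_size, rng):
        self.hidden_size = hidden_size
        self.weight = None  # not resident until load_weight() is called
        self._rng = rng

    def load_weight(self):
        # A real offloading runtime would copy weights from disk/CPU to GPU
        # here; this sketch just materializes a random matrix.
        self.weight = self._rng.standard_normal(
            (self.hidden_size, self.hidden_size)).astype(np.float32)

    def unload_weight(self):
        self.weight = None  # free the weights before the next layer loads

    def forward(self, x):
        assert self.weight is not None, "weights must be loaded first"
        return np.maximum(x @ self.weight, 0.0)  # toy MLP block

class MyModel:
    """Driver that streams layers, keeping one layer's weights resident."""

    def __init__(self, num_layers, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        self.layers = [Layer(hidden_size, rng) for _ in range(num_layers)]

    def forward(self, x):
        for layer in self.layers:
            layer.load_weight()
            x = layer.forward(x)
            layer.unload_weight()
        return x

model = MyModel(num_layers=4, hidden_size=8)
out = model.forward(np.ones((2, 8), dtype=np.float32))
print(out.shape)  # (2, 8)
```

Porting a new model to a system like this mostly means expressing it in this per-layer form; the file linked above shows how FlexGen itself does that for OPT.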