FMInference / FlexLLMGen

Running large language models on a single GPU for throughput-oriented scenarios.
Apache License 2.0

How can I use this project with my own model, and what are the key lines of code? #13

Closed guotong1988 closed 1 year ago

guotong1988 commented 1 year ago

Thank you very much!

merrymercy commented 1 year ago

This is the key file: https://github.com/FMInference/FlexGen/blob/9d092d848f106cd9eaf305c12ef3590f7bcb0277/flexgen/flex_opt.py#L582.

You can implement something similar for your own model.
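To make the suggestion concrete, here is a minimal sketch of the general pattern behind an offloading runtime like FlexGen: the model is written as an explicit list of layers whose weights are loaded and released one layer at a time, so only a small working set must be resident at once. This is NOT FlexGen's actual API — the class and method names (`Layer`, `load_weight`, `MyModel`) are hypothetical, and real offloading would move tensors between disk, CPU, and GPU rather than allocate NumPy arrays.

```python
import numpy as np

class Layer:
    """One toy transformer block with explicitly managed weights."""

    def __init__(self, hidden_size, rng):
        self.hidden_size = hidden_size
        self.weight = None  # not resident until load_weight() is called
        self._rng = rng

    def load_weight(self):
        # A real offloading runtime would copy weights from disk/CPU to GPU
        # here; this sketch just materializes a random matrix.
        self.weight = self._rng.standard_normal(
            (self.hidden_size, self.hidden_size)).astype(np.float32)

    def unload_weight(self):
        self.weight = None  # free the weights before the next layer loads

    def forward(self, x):
        assert self.weight is not None, "weights must be loaded first"
        return np.maximum(x @ self.weight, 0.0)  # toy MLP block

class MyModel:
    """Driver that streams layers, keeping one layer's weights resident."""

    def __init__(self, num_layers, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        self.layers = [Layer(hidden_size, rng) for _ in range(num_layers)]

    def forward(self, x):
        for layer in self.layers:
            layer.load_weight()
            x = layer.forward(x)
            layer.unload_weight()
        return x

model = MyModel(num_layers=4, hidden_size=8)
out = model.forward(np.ones((2, 8), dtype=np.float32))
print(out.shape)  # (2, 8)
```

Porting a new model to a system like this mostly means expressing it in this per-layer form; the file linked above shows how FlexGen itself does that for OPT.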