FMInference / FlexLLMGen

Running large language models on a single GPU for throughput-oriented scenarios.
Apache License 2.0

May I ask why the model is hard-coded as "facebook/opt-30b"? #63

Closed AISuperMa closed 1 year ago

AISuperMa commented 1 year ago

https://github.com/FMInference/FlexGen/blob/25438f6bc3507e1fbd6f88e4812beb2a102d7315/flexgen/flex_opt.py#L1162

merrymercy commented 1 year ago

I think all OPT models share the same tokenizer.
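One way to sanity-check this claim is to load the tokenizer from two different OPT checkpoints and compare their vocabularies. The sketch below assumes the `transformers` library is installed and that the Hugging Face Hub is reachable; `vocabs_match` is just a hypothetical helper for the comparison, not part of FlexGen.

```python
# Sketch: OPT checkpoints are expected to ship the same GPT-2-style BPE
# tokenizer, so the token->id vocabulary should be identical across sizes.

def vocabs_match(vocab_a: dict, vocab_b: dict) -> bool:
    """Return True if two token->id vocabularies are identical."""
    return vocab_a == vocab_b

if __name__ == "__main__":
    # Requires `pip install transformers` and network access to the Hub.
    from transformers import AutoTokenizer

    tok_30b = AutoTokenizer.from_pretrained("facebook/opt-30b")
    tok_125m = AutoTokenizer.from_pretrained("facebook/opt-125m")
    print(vocabs_match(tok_30b.get_vocab(), tok_125m.get_vocab()))
```

If the vocabularies match, any OPT checkpoint works equally well as the tokenizer source, which is why hard-coding "facebook/opt-30b" there is harmless in practice.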