FMInference / FlexLLMGen

Running large language models on a single GPU for throughput-oriented scenarios.
Apache License 2.0

Support for LLaMA #104

Closed by ustcwhy 1 year ago

ustcwhy commented 1 year ago

Thanks for your wonderful work! Meta has released its newest LLM, LLaMA, and the checkpoints are available on Hugging Face [1]. zphang has provided code to run LLaMA with the transformers library [2]. For FlexGen, could I directly replace the OPT model with LLaMA to run inference on a local GPU? Do you have any plans to support LLaMA in the future?

[1] https://huggingface.co/decapoda-research
[2] https://github.com/huggingface/transformers/pull/21955
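For context: swapping the checkpoint alone would not exercise FlexGen's offloading runtime, whose implementation targets the OPT architecture. What [2] enables is stock transformers inference with LLaMA weights, which looks roughly like the minimal sketch below. The checkpoint name is taken from [1]; a transformers build that includes the LLaMA PR and the `accelerate` package (for `device_map="auto"`) are assumed.

```python
# Minimal sketch: plain transformers inference with a LLaMA checkpoint,
# independent of FlexGen's offloading logic. Requires a transformers
# build with the LLaMA PR [2] merged, plus the `accelerate` package.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "decapoda-research/llama-7b-hf"  # checkpoint from [1]

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # fp16 weights to fit a single GPU
    device_map="auto",          # let accelerate place layers on GPU/CPU
)

inputs = tokenizer("The key idea of FlexGen is", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Note that this path relies on accelerate's generic layer placement rather than FlexGen's throughput-oriented scheduling of weights, activations, and KV cache, so proper LLaMA support in FlexGen would still require porting its OPT-specific offloading code.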

BarfingLemurs commented 1 year ago

Duplicate of https://github.com/FMInference/FlexGen/issues/60