FMInference / FlexLLMGen

Running large language models on a single GPU for throughput-oriented scenarios.
Apache License 2.0

What's the difference between FlexGen and ColossalAI? #35

Closed Kuri-su closed 1 year ago

Kuri-su commented 1 year ago

It seems that both FlexGen and ColossalAI can save GPU memory. What's the difference?

https://github.com/hpcaitech/ColossalAI

Ying1123 commented 1 year ago

FlexGen focuses on the generative inference of large models and proposes several unique optimizations for high-throughput scenarios. ColossalAI has more features but does not have the optimizations FlexGen just introduced, so I guess its performance will be similar to Hugging Face Accelerate and DeepSpeed ZeRO-Inference.
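For context, here is a minimal sketch of the Accelerate-style offloading baseline mentioned above; this is not FlexGen itself, and the model name and offload folder are placeholders. FlexGen's throughput optimizations (scheduling weights, KV cache, and activations across GPU/CPU/disk with large effective batch sizes) sit on top of this kind of offloading.

```python
# A sketch of Hugging Face Accelerate-style offloading, the baseline
# compared against above. This is NOT FlexGen: FlexGen additionally
# schedules weights, KV cache, and activations across GPU/CPU/disk
# for throughput. Model name and offload folder are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-6.7b"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",         # let Accelerate place layers on GPU/CPU/disk
    offload_folder="offload",  # spill weights that do not fit onto disk
)

inputs = tokenizer("Paris is the capital of", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```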

Kuri-su commented 1 year ago

Oh, thank you! Is it possible to use FlexGen on diffusion models, or does it only work on NLP models?

zhisbug commented 1 year ago

@Kuri-su I think diffusion models do not have that many parameters, so can't you run inference for most diffusion models on a single GPU?

Kuri-su commented 1 year ago

Of course, I can run most diffusion models on a single GPU, but if the output image size is too large, it causes a "CUDA out of memory" error. Is it possible to use FlexGen to optimize this process?

Ying1123 commented 1 year ago

Yes. It is not in our current plan, but you can try to port some strategies from FlexGen to other models.
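In the meantime, a hedged sketch of what is already available: Hugging Face diffusers ships memory-saving knobs in a similar spirit to FlexGen's offloading (sequential CPU offload, attention slicing, VAE tiling), which often resolve the large-image OOM described above. The checkpoint name below is a placeholder.

```python
# A sketch using existing diffusers memory-saving options, which are in
# the same spirit as FlexGen's offloading (this is not FlexGen itself).
# The checkpoint name is a placeholder; adjust for your model.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder checkpoint
    torch_dtype=torch.float16,
)
pipe.enable_sequential_cpu_offload()  # stream weights GPU<->CPU per submodule
pipe.enable_attention_slicing()       # compute attention in smaller slices
pipe.enable_vae_tiling()              # decode large images tile by tile

# Large output sizes are where the OOM above tends to appear.
image = pipe("a watercolor fox", height=1024, width=1024).images[0]
image.save("fox.png")
```

Note that `enable_sequential_cpu_offload()` manages device placement itself, so the pipeline is not moved to CUDA manually here.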

Kuri-su commented 1 year ago

Thank you, I will try it later.