Closed: Kuri-su closed this issue 1 year ago
FlexGen focuses on generative inference for large models and proposes several unique optimizations for high-throughput scenarios. ColossalAI has more features but does not include the optimizations FlexGen introduces. I would guess its performance is similar to Hugging Face Accelerate and DeepSpeed ZeRO-Inference.
oh, thank you! Is it possible to use FlexGen on diffusion models, or does it only work on NLP models?
@Kuri-su I think diffusion models do not have that many parameters, so you can run inference for most diffusion models on a single GPU?
Of course, I can run most diffusion models on a single GPU, but if the output image size is too large, it causes a "CUDA out of memory" error. Is it possible to use FlexGen to optimize this process?
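As an aside on the large-image OOM above: one common way to bound peak memory without FlexGen is to process the image in tiles, so only one tile's intermediates are alive at a time. Below is a minimal, illustrative sketch in plain NumPy; `process_tiled` and the per-tile function are placeholders I made up, not part of any library, and a real decoder step would also need overlapping tiles to avoid seams at tile borders.

```python
import numpy as np

def process_tiled(image, tile=256, fn=lambda t: t * 0.5):
    """Apply fn to an (H, W, C) image one tile at a time.

    Only one tile's intermediate buffers exist at once, so peak
    memory for fn is bounded by the tile size, not the image size.
    (fn here is a stand-in for a real, memory-hungry decode step.)
    """
    h, w, _ = image.shape
    out = np.empty_like(image)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            out[y:y + tile, x:x + tile] = fn(image[y:y + tile, x:x + tile])
    return out

big = np.ones((1024, 1024, 3), dtype=np.float32)
result = process_tiled(big)
print(result.shape, result[0, 0, 0])  # (1024, 1024, 3) 0.5
```

Note that Hugging Face diffusers pipelines expose related built-in options (e.g. attention slicing and VAE tiling) that may already cover this case, depending on the version.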
Yes. It is not in our current plan, but you can try porting some of FlexGen's strategies to other models.
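For anyone who wants to try porting such a strategy: the core idea behind FlexGen-style offloading is to keep weights on CPU (or disk) and move them to the GPU one layer at a time during the forward pass. Here is a minimal PyTorch sketch of that idea under my own assumptions; the class and names are illustrative, not FlexGen's actual API, and FlexGen itself adds much more (I/O overlap, quantization, and a cost-model-driven offloading policy).

```python
# Illustrative layer-by-layer weight offloading (not FlexGen's API).
import torch
import torch.nn as nn

# Fall back to CPU so the sketch also runs on machines without CUDA.
compute_device = "cuda" if torch.cuda.is_available() else "cpu"

class OffloadedMLP(nn.Module):
    """Keeps all weights on CPU between uses; moves one layer at a
    time to the compute device, so peak GPU memory is roughly one
    layer's weights plus activations instead of the whole model."""

    def __init__(self, sizes):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.Linear(a, b) for a, b in zip(sizes, sizes[1:])
        )
        self.layers.to("cpu")  # weights live on CPU between forwards

    @torch.no_grad()
    def forward(self, x):
        x = x.to(compute_device)
        for layer in self.layers:
            layer.to(compute_device)   # load this layer's weights
            x = torch.relu(layer(x))
            layer.to("cpu")            # evict to free GPU memory
        return x

model = OffloadedMLP([64, 256, 256, 8])
out = model(torch.randn(4, 64))
print(out.shape)  # torch.Size([4, 8])
```

The trade-off is obvious from the loop: every forward pays a PCIe transfer per layer, which is why FlexGen targets high-throughput batch settings where transfers can be amortized and overlapped rather than low-latency single requests.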
Thank you, I will try it later.
It seems that both FlexGen and ColossalAI can save GPU memory. What is the difference?
https://github.com/hpcaitech/ColossalAI