FMInference / FlexLLMGen

Running large language models on a single GPU for throughput-oriented scenarios.
Apache License 2.0

Pytorch-Lightning strategy #42

Open k-sparrow opened 1 year ago

k-sparrow commented 1 year ago

Awesome work!

Any plans on offering this as a strategy plugin for PyTorch Lightning, like DDP/DeepSpeed/ColossalAI? (https://pytorch-lightning.readthedocs.io/en/stable/advanced/model_parallel.html)

lantiga commented 1 year ago

Hey, we'd be happy to help if anyone wants to undertake this, keeping in mind that this project is intended to optimize inference.

In that sense, it might also be a good strategy for Lightning Fabric, so you could just load the model, set it up with the strategy, and call forward. /cc @awaelchli

Ying1123 commented 1 year ago

Thanks for your interest! This looks cool, but currently this repo is specialized for a few specific transformer models; it is not a general solution that works out of the box for any model. Once we develop a more general interface and backend, we can think about the integration more seriously. I am happy to help with this as well.