csuhan / OneLLM

[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language

Distributed inference/demo checkpoints #26

Open imartinf opened 1 month ago

imartinf commented 1 month ago

Hi! Thank you for the great contribution. I am trying to run the demo. My setup consists of 4x 2080Ti 12GB GPUs, so I cannot run the model on a single card (it takes ~16GB, as far as I know). The checkpoint is not sharded, but the model class uses fairscale distributed modules, so I haven't found a way to load the state dict across more than one GPU. Am I missing something? If not, would you release distributed checkpoints and/or a distributed inference script? Thanks!!
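For context, one workaround (not an official solution) is to split the single-GPU checkpoint into model-parallel shards offline: fairscale's `ColumnParallelLinear` splits its weight along dim 0 and `RowParallelLinear` along dim 1, so the full-size tensors can be chunked accordingly and everything else replicated. A minimal sketch, where the key prefixes are hypothetical and would need to be matched against OneLLM's actual layer names:

```python
import torch


def shard_state_dict(state_dict, num_shards, col_parallel_prefixes, row_parallel_prefixes):
    """Split a single-GPU checkpoint into `num_shards` model-parallel shards.

    Column-parallel weights are chunked along dim 0 (output features),
    row-parallel weights along dim 1 (input features); all other tensors
    are replicated to every shard. The prefix lists are assumptions and
    must be filled in from the model's fairscale layer definitions.
    """
    shards = [dict() for _ in range(num_shards)]
    for key, tensor in state_dict.items():
        if any(key.startswith(p) for p in col_parallel_prefixes):
            chunks = tensor.chunk(num_shards, dim=0)
        elif any(key.startswith(p) for p in row_parallel_prefixes):
            chunks = tensor.chunk(num_shards, dim=1)
        else:
            chunks = [tensor] * num_shards
        for rank, chunk in enumerate(chunks):
            shards[rank][key] = chunk.clone()
    return shards


if __name__ == "__main__":
    # Dummy checkpoint standing in for the real OneLLM state dict.
    sd = {
        "attention.wq.weight": torch.arange(32.0).reshape(8, 4),  # column-parallel
        "attention.wo.weight": torch.arange(32.0).reshape(4, 8),  # row-parallel
        "norm.weight": torch.ones(4),                             # replicated
    }
    shards = shard_state_dict(sd, 2, ["attention.wq"], ["attention.wo"])
    for rank, shard in enumerate(shards):
        torch.save(shard, f"consolidated.{rank:02d}.pth")
```

Each resulting shard could then be loaded by the corresponding model-parallel rank; this assumes no tensor needs padding to divide evenly across ranks.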