PaddlePaddle / Knover

Large-scale open domain KNOwledge grounded conVERsation system based on PaddlePaddle
Apache License 2.0

Using PLATO-XL for inference on 3 or more GPUs #184

Closed Kaka23333 closed 1 year ago

Kaka23333 commented 1 year ago

Hi, thanks for your impressive work.

I'm currently trying to deploy a PLATO-XL service on an RTX 3090. The deployment succeeds; however, I can only input at most 3 rounds of conversation, since the RTX 3090 has only 24 GB of memory. I also tried the `--mem_efficient true` flag and reducing the embedding size to 512, but neither helped much.
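For context on why 24 GB fills up so fast: PLATO-XL has roughly 11B parameters, so the weights alone nearly saturate a single RTX 3090 in half precision. A back-of-the-envelope sketch (the parameter count is the published figure; the rest is a rough estimate that ignores activations and the attention cache, which is what actually grows with each round):

```python
# Rough per-GPU memory estimate for PLATO-XL weights alone.
# Activations and the KV cache (which grow with dialogue length)
# are NOT included, so real usage is higher.
def weight_memory_gb(n_params: float, bytes_per_param: int, n_gpus: int) -> float:
    """Memory per GPU (GiB) if weights are sharded evenly across n_gpus."""
    return n_params * bytes_per_param / n_gpus / 1024**3

PARAMS = 11e9  # PLATO-XL parameter count (published figure)
FP16 = 2       # bytes per parameter in half precision

for gpus in (1, 2, 3):
    print(f"{gpus} GPU(s): ~{weight_memory_gb(PARAMS, FP16, gpus):.1f} GiB for weights")
```

On one card this leaves only a few GB of headroom for activations and dialogue history, which matches the "no more than 3 rounds" symptom; sharding across 3 GPUs roughly triples that headroom.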

Is there any way to run PLATO-XL on 3 or more GPUs instead of 2? I notice that when I set the CUDA visible devices to 3 GPUs in the config file (e.g. interact.conf), the script splits the checkpoint into 3 shards. However, I got the error shown in the screenshot below while running interact.sh. Is there any way to solve it?
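For reference, the shard count in setups like this is typically driven by how many devices are exposed via `CUDA_VISIBLE_DEVICES`. A minimal sketch of that relationship (the parsing below is an assumption for illustration, not code from Knover's interact.conf):

```python
import os

def visible_gpu_count(env: dict) -> int:
    """Number of devices exposed via CUDA_VISIBLE_DEVICES.

    Assumed convention: a comma-separated list of device ids;
    an empty value means no GPUs are visible.
    """
    value = env.get("CUDA_VISIBLE_DEVICES", "")
    ids = [d for d in value.split(",") if d.strip()]
    return len(ids)

# Exposing 3 devices would lead the launcher to split the
# checkpoint into 3 shards (assumed behavior).
env = dict(os.environ, CUDA_VISIBLE_DEVICES="0,1,2")
print(f"checkpoint would be split into {visible_gpu_count(env)} shards")
```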

Looking forward to your reply.

[Screenshot: error traceback, 2023-04-21 15:06:10]
sserdoubleh commented 1 year ago

This may be useful: https://github.com/PaddlePaddle/Knover/issues/159#issuecomment-1265071448

Kaka23333 commented 1 year ago

Thanks for your prompt reply!

This solution works for me. The title of that issue wasn't very clear, so I didn't notice it before. 🥲