PaddlePaddle / Knover

Large-scale open domain KNOwledge grounded conVERsation system based on PaddlePaddle
Apache License 2.0

Using PLATO-XL for inference on 3 or more GPUs #184

Closed Kaka23333 closed 1 year ago

Kaka23333 commented 1 year ago

Hi, thanks for your impressive work.

I'm currently trying to deploy a PLATO-XL service on an RTX 3090. The deployment succeeds; however, I can only input at most 3 rounds of conversation, since the RTX 3090 has only 24 GB of memory. I also tried the `--mem_efficient true` flag and reducing the embedding size to 512, but neither helped much.
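For context on why 24 GB fills up so fast: PLATO-XL has roughly 11B parameters, so the weights alone nearly saturate a single RTX 3090 in half precision. A back-of-the-envelope sketch (the parameter count is the published figure; the rest is a rough estimate that ignores activations and the attention cache, which is what actually grows with each round):

```python
# Rough per-GPU memory estimate for PLATO-XL weights alone.
# Activations and the KV cache (which grow with dialogue length)
# are NOT included, so real usage is higher.
def weight_memory_gb(n_params: float, bytes_per_param: int, n_gpus: int) -> float:
    """Memory per GPU (GiB) if weights are sharded evenly across n_gpus."""
    return n_params * bytes_per_param / n_gpus / 1024**3

PARAMS = 11e9  # PLATO-XL parameter count (published figure)
FP16 = 2       # bytes per parameter in half precision

for gpus in (1, 2, 3):
    print(f"{gpus} GPU(s): ~{weight_memory_gb(PARAMS, FP16, gpus):.1f} GiB for weights")
```

On one card this leaves only a few GB of headroom for activations and dialogue history, which matches the "no more than 3 rounds" symptom; sharding across 3 GPUs roughly triples that headroom.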

Is there any way to run PLATO-XL on 3 or more GPUs instead of 2? I notice that when I set the CUDA visible devices to 3 GPUs in the config file (e.g. interact.conf), the script splits the checkpoint into 3 shards. However, I got the error shown in the screenshot below while running interact.sh. Is there any way to solve it?
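For reference, the shard count in setups like this is typically driven by how many devices are exposed via `CUDA_VISIBLE_DEVICES`. A minimal sketch of that relationship (the parsing below is an assumption for illustration, not code from Knover's interact.conf):

```python
import os

def visible_gpu_count(env: dict) -> int:
    """Number of devices exposed via CUDA_VISIBLE_DEVICES.

    Assumed convention: a comma-separated list of device ids;
    an empty value means no GPUs are visible.
    """
    value = env.get("CUDA_VISIBLE_DEVICES", "")
    ids = [d for d in value.split(",") if d.strip()]
    return len(ids)

# Exposing 3 devices would lead the launcher to split the
# checkpoint into 3 shards (assumed behavior).
env = dict(os.environ, CUDA_VISIBLE_DEVICES="0,1,2")
print(f"checkpoint would be split into {visible_gpu_count(env)} shards")
```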

Looking forward to your reply.

[Screenshot: error traceback, 2023-04-21 15:06:10]
sserdoubleh commented 1 year ago

This may be useful: https://github.com/PaddlePaddle/Knover/issues/159#issuecomment-1265071448

Kaka23333 commented 1 year ago

Thanks for your prompt reply!

This solution works for me. The title of that issue wasn't very clear, so I didn't notice it before. 🥲