Closed: GetUpAt8 closed this issue 5 days ago
Hi @GetUpAt8, thanks for your question! Unfortunately, our project does not currently implement multi-GPU parallelism for the model at the inference stage. We can offer some ideas for implementing model-parallel inference: you can use pipeline parallelism to place the model's layers on different GPUs (see GPipe for a reference implementation), or use ZeRO-Infinity to keep the model in CPU memory and transfer each layer to the GPU as it is needed for inference.
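As a rough illustration of the offloading idea, here is a minimal sketch (not code from this repository, and assuming each layer takes and returns a single hidden-state tensor): the model stays in CPU memory and each layer is moved to the GPU only while it runs.

```python
import torch

@torch.no_grad()
def offloaded_forward(layers, hidden_states, device="cuda:0"):
    """`layers` is a list of transformer blocks kept on the CPU."""
    hidden_states = hidden_states.to(device)
    for layer in layers:
        layer.to(device)                      # stream this layer onto the GPU
        hidden_states = layer(hidden_states)  # run it
        layer.to("cpu")                       # free GPU memory for the next layer
    return hidden_states.to("cpu")
```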
Hope this helps!
Hi @GetUpAt8, we have now implemented parallel inference on multiple GPUs using tensor_parallel. You can run parallel inference with the latest code in the repository.
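For reference, a minimal sketch of what multi-GPU inference with the `tensor_parallel` package can look like (the model name, device list, and prompt below are placeholders, and the exact usage in this repository may differ):

```python
import torch
import tensor_parallel as tp
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your-model-name"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Shard the model's weights across two GPUs for tensor-parallel inference.
model = tp.tensor_parallel(model, ["cuda:0", "cuda:1"])

inputs = tokenizer("Hello, world", return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```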
I hope it helps!
Thank you so much for your replies and for the parallel inference update! I'll try it.
Hi,
Great work, and thanks for your contribution to LLMs. I've tried using your model both before and after the update [June 06, 2023], and I wonder how to use multiple GPUs for inference.
In the new "translate.sh" there is `export CUDA_VISIBLE_DEVICES=`. I set the value to "3,4", but it still runs only on the single GPU 3.
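For context on the question above: `CUDA_VISIBLE_DEVICES` only controls which GPUs the process can see; it does not by itself spread the model across them, so multi-GPU inference still needs explicit parallelism (e.g. the tensor_parallel support mentioned above). A minimal check, using the device IDs from the question:

```python
import os
import torch

# CUDA_VISIBLE_DEVICES must be set before CUDA is first initialized
# (torch initializes CUDA lazily, on the first torch.cuda call).
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "3,4")

# Both GPU 3 and GPU 4 are now visible to this process ...
print(torch.cuda.device_count())  # expected: 2
# ... but a model loaded with .to("cuda") still lands on a single device,
# so multi-GPU inference needs explicit model parallelism on top of this.
```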