agemagician / ProtTrans

ProtTrans provides state-of-the-art pretrained language models for proteins. ProtTrans was trained on thousands of GPUs from Summit and hundreds of Google TPUs using Transformer models.
Academic Free License v3.0

Fine tuning with Lora on multi GPU #136

Closed · BSharmi closed this issue 10 months ago

BSharmi commented 10 months ago

Hello there! So happy to see that ProtT5 has fine-tuning with LoRA. I just had a question: did you run it on a single GPU or on multiple GPUs? My data is big and I would love to train on multiple GPUs efficiently, but I did not see many examples online, so I figured I would ask here.

Thank you!

RSchmirler commented 10 months ago

Hey @BSharmi, for now I am running everything on a single GPU. But you are correct, on large datasets this takes quite some time. You can use model.parallelize() to split the model and distribute it across multiple GPUs. However, this is only the naive (vertical) model parallelism that T5 models offer. It is only useful when your GPUs are too small to hold the model; in terms of speed it is most likely a little slower than using a single GPU.
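For illustration, a rough sketch of what that could look like for the ProtT5 encoder (the checkpoint name and the layer split are just examples, adjust them to your own GPUs):

```python
from transformers import T5EncoderModel

# Load the ProtT5 encoder (checkpoint name is just an example)
model = T5EncoderModel.from_pretrained("Rostlab/prot_t5_xl_uniref50")

# Naive (vertical) model parallelism: assign encoder layers to GPUs.
# ProtT5-XL has 24 encoder layers, so e.g. split them across two GPUs:
device_map = {
    0: list(range(0, 12)),   # layers 0-11 on GPU 0
    1: list(range(12, 24)),  # layers 12-23 on GPU 1
}
model.parallelize(device_map)

# ... run your forward passes / fine-tuning as usual, with inputs on cuda:0 ...

model.deparallelize()  # moves the model back to CPU afterwards
```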

If you want to do real data-parallel training, the best option would likely be DeepSpeed. It is not possible to launch multi-GPU DeepSpeed from a notebook, so you will have to launch the training as a script using the deepspeed launcher.

I have not done this myself, so I cannot provide detailed instructions. Hope this helps.
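That said, an untested sketch of what a deepspeed-launched script might look like with the Hugging Face Trainer integration (the script name `train_lora_ds.py`, the config file `ds_config.json`, and the `model` / `train_dataset` objects are placeholders for whatever your LoRA setup already builds):

```python
# train_lora_ds.py -- launch with the deepspeed launcher, e.g.:
#   deepspeed --num_gpus=4 train_lora_ds.py
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="./prott5_lora_ds",
    per_device_train_batch_size=4,
    num_train_epochs=1,
    fp16=True,
    deepspeed="ds_config.json",   # path to your DeepSpeed ZeRO config (placeholder)
)

trainer = Trainer(
    model=model,                  # your LoRA-wrapped ProtT5 model (not shown here)
    args=training_args,
    train_dataset=train_dataset,  # your tokenized protein dataset (not shown here)
)
trainer.train()
```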

BSharmi commented 10 months ago

Thank you! I have been able to run ProtT5 with model and data parallelism on SageMaker, but with QLoRA there seem to be some issues. I might check out how DeepSpeed works.

Thanks again for the info!