awslabs / fast-differential-privacy

Fast, memory-efficient, scalable optimization of deep learning with differential privacy
Apache License 2.0

What is the best way to train large models? #9

Closed Pier297 closed 1 year ago

Pier297 commented 1 year ago

Hi all!

I was wondering: how can I use this library with large models that don't fit on a single GPU?

For non-private fine-tuning I managed to get FSDP working, and I have also put together a rough script that applies DP with FSDP; I recently posted about it on the Opacus forum here.

Do you know of a better way to do this? When I try to call attach on the optimizer, it throws an error saying that the FSDP layer is not supported. If it's helpful, I can provide a minimal example reproducing the error.
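For reference, here is a rough sketch of the call pattern that fails. The PrivacyEngine arguments follow my reading of the README and may not be exact, and build_model() is just a placeholder:

```python
# Rough sketch only: build_model() is a placeholder, and the PrivacyEngine
# arguments are taken from my reading of the fastDP README, so they may not
# be exact.
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from fastDP import PrivacyEngine

model = FSDP(build_model().cuda())          # shard the large model across GPUs
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

privacy_engine = PrivacyEngine(
    model,
    batch_size=32,
    sample_size=50_000,
    epochs=3,
    target_epsilon=8,
)
privacy_engine.attach(optimizer)            # <- raises: FSDP layer not supported
```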

Thank you a lot for your time, Pier

woodyx218 commented 1 year ago

For large models that do not fit on a single GPU, the easiest path may be for our library to work with DeepSpeed or FSDP. This requires changes on both the DeepSpeed side and in this library. I can share some thoughts here before we actually make those changes.

One key observation is that DeepSpeed and fastDP both modify the optimizer.step() function, though in different ways, so we want only one library to make those changes. Also, have you tried mixed precision?
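To illustrate the conflict, here is a simplified sketch of the wrapping pattern (this is not the actual fastDP or DeepSpeed code): each library replaces optimizer.step() with its own logic, so if both do it, the second wrap stacks on top of the first in ways neither library expects.

```python
import torch

def dp_wrap_step(optimizer, noise_multiplier, max_grad_norm):
    """Illustrative only: fastDP actually clips per-sample gradients during the
    backward pass; this placeholder just shows the step()-wrapping pattern."""
    original_step = optimizer.step
    params = [p for g in optimizer.param_groups for p in g["params"]]

    def step(*args, **kwargs):
        # Placeholder DP logic: clip the accumulated gradients and add
        # Gaussian noise before delegating to the original update.
        torch.nn.utils.clip_grad_norm_(params, max_grad_norm)
        for p in params:
            if p.grad is not None:
                p.grad.add_(torch.randn_like(p.grad),
                            alpha=noise_multiplier * max_grad_norm)
        return original_step(*args, **kwargs)

    optimizer.step = step
    return optimizer

# If DeepSpeed (or FSDP) has already replaced optimizer.step(), e.g. for ZeRO
# partitioning or loss scaling, wrapping it again like this stacks two
# modifications on the same function, which is exactly the conflict we want
# to avoid by letting only one library make the change.
```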

woodyx218 commented 1 year ago

The best way to train large DP models is with ZeRO (e.g. as implemented in FSDP and DeepSpeed). We have developed this in an upcoming paper, "Zero redundancy distributed learning with differential privacy", and will open-source the code soon. Stay tuned!