Closed Pier297 closed 1 year ago
For large models that do not fit on a single GPU, the easiest path may be for our library to interoperate with DeepSpeed and FSDP. This requires changes on both the DeepSpeed side and in this library. I can share some thoughts here before we actually make the change.
One key observation is that DeepSpeed and fastDP both modify the optimizer.step() function, though in different ways, so only one of the two libraries should make those changes. Also, have you tried mixed precision?
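A minimal sketch of why two libraries both replacing optimizer.step() can conflict. All names here are illustrative stand-ins, not the real DeepSpeed or fastDP APIs; each wrapper only survives because it chains to the step() it found, and a wrapper that replaced step() outright would silently discard the other library's logic:

```python
class Optimizer:
    """Stand-in for torch.optim.Optimizer; illustrative only."""
    def __init__(self):
        self.log = []

    def step(self):
        # the base parameter update
        self.log.append("base")


def attach_dp(opt):
    # hypothetical DP engine: replaces step() to clip per-sample
    # gradients and add noise (omitted) before the base update
    original = opt.step

    def step():
        opt.log.append("dp")
        original()

    opt.step = step


def attach_sharding(opt):
    # hypothetical ZeRO-style engine: replaces step() to gather and
    # re-partition sharded optimizer state (omitted) around the update
    original = opt.step

    def step():
        opt.log.append("shard")
        original()

    opt.step = step


opt = Optimizer()
attach_dp(opt)
attach_sharding(opt)
opt.step()
print(opt.log)  # -> ['shard', 'dp', 'base']: both wrappers ran only
# because each one called the step() it replaced
```

If either attach function assigned a fresh step() without calling `original`, the other library's modification would vanish with no error, which is why only one library should own this patch point.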
The best way to train large DP models is with ZeRO (e.g., as used in FSDP and DeepSpeed). We have developed this in an upcoming paper, "Zero redundancy distributed learning with differential privacy", and will open-source the code soon. Stay tuned!
Hi all!
I was wondering how I could use this library with large models that don't fit on a single GPU.
For non-private fine-tuning I managed to get it working with FSDP, and I have also put together a rough script that applies DP with FSDP; I recently wrote about it on the Opacus forum here.
Do you know of a better way to do this? When I try to call `attach` on the optimizer, it throws an error saying that the FSDP layer is not supported. If it's helpful, I can provide a minimal example that reproduces the error. Thank you a lot for your time, Pier