[FSDP][benchmarks] Add regression benchmarks for the FSDP API.

anj-s commented 3 years ago

🐛 Bug

We need to add regression benchmarks for the FSDP API and possible input combinations. These regression benchmarks should be added to fairscale/benchmarks.

anj-s commented 3 years ago

@myleott @sshleifer Would either of you have cycles to add regression benchmarks for FSDP? We maintain regression benchmarks for all FairScale APIs, and they run on every commit.

cc @min-xu-ai

sshleifer commented 3 years ago

How long does it take? We are very low on cycles but this sounds incredibly cool.

anj-s commented 3 years ago

It should not take very long. We have an example for OSS and SDP here. Let me know if this sounds feasible. Thanks!

min-xu-ai commented 3 years ago

Hi @tmarkstrum, this is probably a good bug for you to have while you work on benchmarking?

FYI, @anj-s, Tingting and I met this morning and we think the benchmarking of FSDP can be structured in at least the following dimensions:

model size: small (neither DDP nor FSDP simple wrap OOMs), medium (DDP OOMs, FSDP simple wrap does not), large (DDP and FSDP simple wrap both OOM, but FSDP with nested wrapping does not)
batch size: the same batch size for DDP and FSDP; the max batch size that each of DDP and FSDP can fit after the memory savings
features: FSDP simple wrap (replace DDP() with FSDP()); FSDP nested wrap (layer by layer, needs tuning); FSDP mixed precision

Please let Tingting and me know if we have missed anything. Thanks a lot!
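To make the dimensions above concrete, here is a minimal sketch of how the benchmark matrix could be enumerated. Everything in it is an assumption for illustration only: the names `BenchmarkCase`, `EXPECTED_OOM`, and `build_cases`, the string labels, and the choice to treat DDP as a baseline alongside the three FSDP features. None of this is existing fairscale/benchmarks code. Only the OOM expectations stated in this thread are encoded; every other combination is left as unknown (`None`).

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Optional

# Hypothetical labels for the three dimensions discussed in this thread.
MODEL_SIZES = ("small", "medium", "large")
BATCH_MODES = ("same_as_ddp", "max_fit")
# "ddp" is the baseline; the other three are the FSDP features listed above.
WRAPPERS = ("ddp", "fsdp_simple", "fsdp_nested", "fsdp_mixed_precision")

# OOM expectations taken from the model-size definitions in the thread.
# Pairs that the thread does not mention are absent, meaning "unknown".
EXPECTED_OOM = {
    ("small", "ddp"): False,
    ("small", "fsdp_simple"): False,
    ("medium", "ddp"): True,
    ("medium", "fsdp_simple"): False,
    ("large", "ddp"): True,
    ("large", "fsdp_simple"): True,
    ("large", "fsdp_nested"): False,
}


@dataclass(frozen=True)
class BenchmarkCase:
    model_size: str
    batch_mode: str
    wrapper: str
    expect_oom: Optional[bool]  # None when the thread states no expectation


def build_cases() -> List[BenchmarkCase]:
    """Return the full cross product of the three dimensions."""
    return [
        BenchmarkCase(size, batch, wrapper, EXPECTED_OOM.get((size, wrapper)))
        for size, batch, wrapper in product(MODEL_SIZES, BATCH_MODES, WRAPPERS)
    ]


if __name__ == "__main__":
    for case in build_cases():
        print(case)
```

A runner could then either skip the cases where `expect_oom` is `True` or assert that they fail with an out-of-memory error, which would turn the size definitions themselves into a regression check.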

anj-s commented 3 years ago

Thanks @min-xu-ai for the detailed breakdown! This looks like a great start.

Will we be testing cpu_offload as one of the features? It will probably enable the maximum memory savings, and it would be good to have the corresponding throughput measurements. WDYT?
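For the throughput side, a minimal sketch of a timing helper is shown below. The name `measure_throughput` and its parameters are hypothetical, not part of fairscale. It assumes the caller supplies a `step` callable that runs one full training iteration; for GPU work, that callable would also need to wait for the device to finish (for example with `torch.cuda.synchronize()`) so that the wall-clock time reflects completed work.

```python
import time
from typing import Callable


def measure_throughput(
    step: Callable[[], None],
    batch_size: int,
    warmup: int = 2,
    iters: int = 10,
) -> float:
    """Return samples per second over `iters` timed calls to `step`.

    `step` is assumed to run one complete training iteration and to block
    until that iteration has finished.
    """
    if iters <= 0:
        raise ValueError("iters must be positive")

    # Warm-up iterations are excluded from timing (allocator, caches, etc.).
    for _ in range(warmup):
        step()

    start = time.perf_counter()
    for _ in range(iters):
        step()
    elapsed = time.perf_counter() - start

    return batch_size * iters / elapsed


if __name__ == "__main__":
    # Stand-in workload; a real benchmark would run forward/backward/optimizer.
    print(measure_throughput(lambda: time.sleep(0.01), batch_size=32))
```

Running the same helper with and without cpu_offload at a fixed batch size would give the throughput cost to set against the memory savings.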

facebookresearch / fairscale

[FSDP][benchmarks] Add regression benchmarks for the FSDP API. #677
