facebookresearch / fairscale

PyTorch extensions for high performance and large scale training.

[FSDP][benchmarks] Add regression benchmarks for the FSDP API. #677

Open anj-s opened 3 years ago

anj-s commented 3 years ago

🐛 Bug

We need to add regression benchmarks for the FSDP API that cover its possible input combinations. These benchmarks should be added to fairscale/benchmarks.
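A regression harness like this mostly comes down to two steps: sweep the combinations of FSDP options, then compare each run's throughput and peak memory against a stored baseline. The sketch below is a minimal, framework-free illustration under stated assumptions. The sweep keys (`mixed_precision`, `reshard_after_forward`, `cpu_offload`), the `Result` record, the helper names, and the 5% tolerance are hypothetical choices, not the actual layout of fairscale/benchmarks, and the code never launches FSDP itself.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple

# Hypothetical sweep dimensions. The keys are modeled on FSDP-style options,
# but this sketch only enumerates them; a real harness would pass each
# config to the FSDP wrapper and run a short training loop.
SWEEP: Dict[str, List[bool]] = {
    "mixed_precision": [False, True],
    "reshard_after_forward": [False, True],
    "cpu_offload": [False, True],
}

ConfigKey = Tuple[Tuple[str, bool], ...]


@dataclass
class Result:
    throughput: float   # samples/sec, higher is better
    peak_mem_mb: float  # peak memory in MB, lower is better


def iter_configs(sweep: Dict[str, List[bool]]) -> Iterator[Dict[str, bool]]:
    """Yield one dict per combination of the sweep values (Cartesian product)."""
    keys = sorted(sweep)
    for values in itertools.product(*(sweep[k] for k in keys)):
        yield dict(zip(keys, values))


def config_key(cfg: Dict[str, bool]) -> ConfigKey:
    """Hashable, order-independent key so results can be matched to baselines."""
    return tuple(sorted(cfg.items()))


def find_regressions(
    current: Dict[ConfigKey, Result],
    baseline: Dict[ConfigKey, Result],
    tolerance: float = 0.05,
) -> List[str]:
    """Report configs that are missing, got slower, or use more peak memory
    than the baseline by more than `tolerance` (a fraction, e.g. 0.05 = 5%)."""
    failures: List[str] = []
    for key, base in baseline.items():
        cur = current.get(key)
        if cur is None:
            failures.append(f"{key}: missing result")
            continue
        if cur.throughput < base.throughput * (1 - tolerance):
            failures.append(
                f"{key}: throughput {cur.throughput:.1f} < {base.throughput:.1f}"
            )
        if cur.peak_mem_mb > base.peak_mem_mb * (1 + tolerance):
            failures.append(
                f"{key}: peak memory {cur.peak_mem_mb:.0f} MB > {base.peak_mem_mb:.0f} MB"
            )
    return failures
```

With three boolean dimensions this yields 8 configurations; a per-commit job would run each one, collect a `Result`, and fail if `find_regressions` returns a non-empty list. Tracking memory and throughput separately matters for options such as CPU offload, which trade one for the other.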

anj-s commented 3 years ago

@myleott @sshleifer Would either of you have cycles to add regression benchmarks for FSDP? We maintain regression benchmarks for all FairScale APIs, and they run at every commit.

cc @min-xu-ai

sshleifer commented 3 years ago

How long does it take? We are very low on cycles but this sounds incredibly cool.

anj-s commented 3 years ago

It should not take very long. We have an example for OSS and SDP here. Let me know if this sounds feasible. Thanks!

min-xu-ai commented 3 years ago

Hi @tmarkstrum, this is probably a good bug for you to pick up while you work on benchmarking?

FYI @anj-s, Tingting and I met this morning, and we think FSDP benchmarking can be structured along at least the following dimensions:

Please let Tingting and me know if we have missed anything. Thanks a lot!

anj-s commented 3 years ago

Thanks @min-xu-ai for the detailed breakdown! This looks like a great start.

Will we be testing cpu_offload as one of the features? It will probably give the largest memory savings, so it would be good to have the corresponding throughput measurements as well. WDYT?