This PR introduces the broadcast_module() helper function and updates the wav2vec2 ASR evaluation recipe to use it. This significantly reduces the pressure on disk I/O for large models and instead broadcasts the module state over the network fabric. DDP and FSDP already offer a similar feature, but this is standalone and can be used with evaluation and inference jobs as well.
As of today, we rely on the private torch.distributed._broadcast_coalesced function even though its c10d counterpart is a public API. Once today's P0 items are delivered, I will expose c10d::broadcast_coalesced in fairseq2n and remove the private API use.
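For illustration, a minimal sketch of what broadcast_module() does, using only the public dist.broadcast API rather than the coalesced variant (the signature and source_rank parameter here are assumptions, not the actual fairseq2 implementation):

```python
import itertools

import torch
import torch.distributed as dist
from torch.nn import Module


def broadcast_module(module: Module, source_rank: int = 0) -> None:
    """Broadcast the state of ``module`` from ``source_rank`` to all ranks.

    Hypothetical sketch: only ``source_rank`` needs to load the checkpoint
    from disk; every other rank receives the tensors over the network
    fabric instead of reading the file itself.
    """
    with torch.no_grad():
        # Broadcast each parameter and buffer tensor in-place. The real
        # implementation coalesces tensors into larger buckets to reduce
        # the number of collective calls.
        for tensor in itertools.chain(module.parameters(), module.buffers()):
            dist.broadcast(tensor, src=source_rank)
```

In a typical evaluation job, only rank 0 would call `load_state_dict()` on the checkpoint before `broadcast_module(model)` is invoked on all ranks.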