Description
This PR splits off the removal of the Userbuffers MPI dependence from PR #760.
With these changes, Userbuffers is now bootstrapped via callbacks to `torch.distributed` collectives. Without the MPI dependence, Userbuffers is always compiled as part of the PyTorch extension and no longer requires the `NVTE_WITH_USERBUFFERS=1` flag.
The old MPI-based bootstrapping can be re-activated at compile time via `UB_MPI_BOOTSTRAP=1`.
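For context, here is a minimal sketch of the new bootstrapping flow from the user's side, assuming a `torchrun` launch. Apart from `initialize_ub` now taking a tensor-parallel process group (confirmed below under Changes), the argument names and buffer shape are assumptions for illustration, not this PR's exact API:

```python
# Hedged sketch: bootstrapping Userbuffers through torch.distributed
# instead of MPI. Launch with torchrun (one process per GPU); no mpirun
# or MPI environment is required.
import torch
import torch.distributed as dist
import transformer_engine.pytorch as te

dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

tp_group = dist.new_group()  # tensor-parallel group (all ranks here)

seq_len, batch, hidden = 2048, 2, 4096
te.initialize_ub(
    [seq_len * batch, hidden],  # communication buffer shape (assumed argument)
    tp_group,                   # process group now required instead of tp size
)
```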
Type of change
- [ ] Documentation change (change only to the documentation, either a fix or new content)
- [ ] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
Changes
- The TE build no longer supports the `NVTE_WITH_USERBUFFERS=1` option; Userbuffers is now always compiled into the PyTorch extensions module.
- MPI collectives in the Userbuffers bootstrapping are replaced with callbacks to `torch.distributed` collectives.
- A new `UB_MPI_BOOTSTRAP=1` option in the TE build activates the old MPI-based Userbuffers bootstrapping.
- `transformer_engine.pytorch.module.base.initialize_ub(...)` is now more conveniently accessible as `transformer_engine.pytorch.initialize_ub(...)`.
- Userbuffer communicators can now be cleaned up via `transformer_engine.pytorch.destroy_ub(...)`.
- `transformer_engine.pytorch.initialize_ub(...)` now requires the tensor-parallel process group instead of just the tensor-parallel size.
- Added a comm+GEMM overlap example with `te.LayerNormMLP` (see the sketch after this list).
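The sketch below illustrates the shape of the new comm+GEMM overlap path end to end: bootstrap the userbuffer communicators over `torch.distributed`, run a `te.LayerNormMLP` with overlap enabled, and tear the communicators down with `destroy_ub`. The `ub_overlap_*` flag names, the `sequence_parallel`/`set_parallel_mode` combination, and the `initialize_ub` buffer shape are assumptions (these names have varied across TE versions); this is a sketch, not the example script shipped by this PR.

```python
# Hedged sketch of comm+GEMM overlap with te.LayerNormMLP.
# Launch with torchrun, one GPU per process.
import torch
import torch.distributed as dist
import transformer_engine.pytorch as te

dist.init_process_group(backend="nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
tp_group = dist.new_group()  # tensor-parallel group (all ranks here)
tp_size = dist.get_world_size(tp_group)

seq_len, batch, hidden, ffn_hidden = 1024, 2, 4096, 16384

# Bootstrap userbuffer communicators over torch.distributed (no MPI).
te.initialize_ub([seq_len * batch, hidden], tp_group)

mlp = te.LayerNormMLP(
    hidden,
    ffn_hidden,
    tp_group=tp_group,
    set_parallel_mode=True,
    sequence_parallel=True,  # overlap assumes sequence parallelism (assumed)
    ub_overlap_rs=True,      # overlap reduce-scatter with GEMM (flag name assumed)
    ub_overlap_ag=True,      # overlap all-gather with GEMM (flag name assumed)
).cuda()

# Sequence-parallel input: the sequence dimension is sharded across ranks.
x = torch.randn(seq_len // tp_size, batch, hidden,
                device="cuda", requires_grad=True)
y = mlp(x)
y.sum().backward()

te.destroy_ub()  # new in this PR: clean up userbuffer communicators
dist.destroy_process_group()
```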