cisco-open / pymultiworld

A framework for PyTorch to enable fault management for collective communication libraries (CCL) such as NCCL
Apache License 2.0

feat: added all_gather docs #44

Closed by raresgaia123 2 months ago

raresgaia123 commented 2 months ago

Added an explanation of how the All Gather example can be run in a single world and across multiple worlds.
