cisco-open / pymultiworld

A framework for PyTorch to enable fault management for collective communication libraries (CCL) such as NCCL
Apache License 2.0
15 stars, 4 forks

feat: support for more ccl operations #22

Closed: myungjin closed this 3 months ago

myungjin commented 3 months ago

Description

This PR implements the following collective operations: broadcast, all-reduce, reduce, all-gather, gather, and scatter.

The send and recv code is also simplified.
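For readers less familiar with the six collectives listed above, here is a minimal pure-Python sketch of their semantics. It is not the pymultiworld implementation and does not touch PyTorch or NCCL: each rank's buffer is modeled as one element of a list, and every function name and signature here is hypothetical, chosen for illustration only. As a simplification, ranks that do not receive a result keep their input (reduce) or get `None` (gather).

```python
from functools import reduce as fold
from operator import add

# Assumption: a "world" is a plain list where index i holds rank i's value.
# No communication happens; each function returns what every rank would
# hold after the collective completes.

def broadcast(bufs, src):
    """Every rank ends up with a copy of the source rank's value."""
    return [bufs[src] for _ in bufs]

def reduce_(bufs, dst, op=add):
    """Only the destination rank holds the combined value afterwards."""
    total = fold(op, bufs)
    return [total if rank == dst else value for rank, value in enumerate(bufs)]

def all_reduce(bufs, op=add):
    """Every rank holds the combined value afterwards."""
    total = fold(op, bufs)
    return [total for _ in bufs]

def gather(bufs, dst):
    """The destination rank collects one value from every rank."""
    return [list(bufs) if rank == dst else None for rank in range(len(bufs))]

def all_gather(bufs):
    """Every rank collects one value from every rank."""
    return [list(bufs) for _ in bufs]

def scatter(inputs, src):
    """The source rank holds one chunk per rank; rank i receives chunk i."""
    chunks = inputs[src]
    return [chunks[rank] for rank in range(len(inputs))]
```

With four ranks holding `[1, 2, 3, 4]`, `all_reduce` leaves every rank with `10`, while `reduce_` with `dst=0` places `10` on rank 0 only. The point of the sketch is the contrast between the "all-" variants, where every rank receives the result, and the rooted variants, where a single rank does.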

Type of Change

Checklist