cisco-open/pymultiworld
A framework for PyTorch to enable fault management for collective communication libraries (CCL) such as NCCL
Apache License 2.0
16 stars
4 forks
issues
chore(deps): bump step-security/harden-runner from 2.9.0 to 2.10.1
#92
dependabot[bot]
opened
1 month ago
0
misc: version 0.2.1
#91
myungjin
closed
2 months ago
0
docs: revise description on installation
#90
myungjin
closed
2 months ago
0
fix: error handle in initial configuration
#89
myungjin
closed
2 months ago
0
Is FSDP/DDP supported ?
#88
samsja
opened
2 months ago
2
refactor: run post_setup script on init
#87
raresgaia123
closed
2 months ago
2
docs: updated readme file
#86
raresgaia123
closed
2 months ago
0
docs: updated API docs
#85
raresgaia123
closed
2 months ago
0
misc: version 0.2.0
#84
myungjin
closed
2 months ago
0
refactor: restructuring packaging
#83
myungjin
closed
2 months ago
0
fix: added mock import
#82
raresgaia123
closed
2 months ago
0
fix: gloo backend
#81
myungjin
closed
2 months ago
0
fix: build fix for readthedocs
#80
raresgaia123
closed
2 months ago
0
fix: fix requirements for Sphinx build
#79
raresgaia123
closed
2 months ago
0
fix: fix readthedocs build
#78
raresgaia123
closed
2 months ago
0
docs: introduction doc
#77
raresgaia123
closed
2 months ago
0
chore: added required files for RTD
#76
raresgaia123
closed
2 months ago
1
docs: getting started section
#75
raresgaia123
closed
2 months ago
0
docs: installation doc
#74
raresgaia123
closed
2 months ago
0
fix: fixed tensors
#73
raresgaia123
closed
2 months ago
0
docs: updated rst files with updated README
#72
raresgaia123
closed
2 months ago
0
chore(deps): bump step-security/harden-runner from 2.9.0 to 2.9.1
#71
dependabot[bot]
closed
1 month ago
1
docs: added docs using sphinx
#70
raresgaia123
closed
3 months ago
0
misc: minor update on docstring
#69
myungjin
closed
3 months ago
0
docs: improved documentation for examples
#68
raresgaia123
closed
3 months ago
0
refactor: refactor resnet example
#67
raresgaia123
closed
3 months ago
0
feat: boolean function to check if world is broken
#66
myungjin
closed
3 months ago
0
refactor: improved docstring on methods
#65
raresgaia123
closed
3 months ago
0
refactor: examples error handling
#64
raresgaia123
closed
3 months ago
1
docs: resnet documentation
#63
raresgaia123
closed
3 months ago
0
docs: added send_recv docs
#62
raresgaia123
closed
3 months ago
0
feat: added scatter example
#61
raresgaia123
closed
3 months ago
0
misc: version to 0.1.2
#60
myungjin
closed
3 months ago
0
refactor: nccl's async error handling in pytorch
#59
myungjin
closed
3 months ago
0
misc: version 0.1.1
#58
myungjin
closed
3 months ago
0
doc: readme revision
#57
myungjin
closed
3 months ago
0
doc: added doc for reduce example
#56
raresgaia123
closed
3 months ago
0
doc: added gather doc
#55
raresgaia123
closed
3 months ago
0
feat: added gather example
#54
raresgaia123
closed
3 months ago
0
misc: version bump-up to 0.1.0
#53
myungjin
closed
3 months ago
0
fix: asyncio-friendly nccl operations
#52
myungjin
closed
3 months ago
0
feat: pytorch v2.4.0 patch
#51
myungjin
closed
3 months ago
0
feat: added reduce example
#50
raresgaia123
closed
3 months ago
0
doc: added doc for broadcast
#49
raresgaia123
closed
3 months ago
0
feat: update broadcast examples
#48
raresgaia123
closed
3 months ago
0
feat: broadcast multiple worlds
#47
raresgaia123
closed
3 months ago
0
feat: updating examples readme file
#46
raresgaia123
closed
3 months ago
0
feat: added broadcast example
#45
raresgaia123
closed
3 months ago
0
feat: added all_gather docs
#44
raresgaia123
closed
3 months ago
0
feat: added all_gather example for multiple worlds
#43
raresgaia123
closed
3 months ago
0