aws/aws-ofi-nccl
A plugin that lets EC2 developers use libfabric as a network provider while running NCCL applications.
Apache License 2.0
129 stars
51 forks
issues
fix -Wc++-compat errors
#464
aws-nslick
opened
4 days ago
10
[v1.9.x-aws] update versions in codechecker workflow
#463
AmedeoSapio
closed
1 day ago
3
.github: update versions in codechecker workflow (Attempt #2)
#462
rauteric
closed
4 days ago
1
Add ROCm as alternative to CUDA for plugin use
#461
ryanhankins
opened
5 days ago
1
Log prov_errno number in case of completion errors
#460
AmedeoSapio
closed
4 days ago
1
.github: update versions in codechecker workflow
#459
rauteric
closed
6 days ago
1
Add platform hook to sort rails and EFA implementation
#458
rauteric
closed
5 days ago
0
Use the zero-copy path in the EFA provider for the RDMA protocol
#457
AmedeoSapio
closed
1 week ago
2
pthread: add missing errno include
#456
aws-nslick
closed
1 week ago
0
prefer spinlocks where possible
#455
aws-nslick
opened
1 week ago
1
chore: gitignore update
#454
aws-nslick
closed
1 week ago
0
configure: remove builtin check
#453
aws-nslick
closed
1 week ago
2
tree: set export symbol visibility
#452
aws-nslick
closed
1 week ago
0
Remove unused reference to OFI_NCCL_ARCHIVE in configure.ac
#451
ryanhankins
closed
1 week ago
2
Remove unused reference to OFI_NCCL_ARCHIVE in configure.ac
#450
ryanhankins
closed
2 weeks ago
0
Remove unused reference to OFI_NCCL_ARCHIVE in configure.ac
#449
ryanhankins
closed
2 weeks ago
0
1.9.x: Bump version to 1.9.3a1-aws
#448
AmedeoSapio
closed
2 weeks ago
1
Introduce a memory registration cache for the net plugin
#447
rajachan
opened
2 weeks ago
3
rdma: add dev_id to req completion LTTNG trace point
#446
taeilum00
closed
2 weeks ago
2
Update version number and changelog for v1.9.2-aws release.
#445
AmedeoSapio
closed
2 weeks ago
3
[v1.9.x-aws] regIsGlobal=0 and related fixes
#444
rauteric
closed
2 weeks ago
2
[v1.9.x-aws].ci/aws: Add aws-ofi-nccl functional tests to ci
#443
a-szegel
closed
3 weeks ago
2
Three patches in support of upcoming release
#442
rauteric
closed
3 weeks ago
5
One more cherry-pick for 1.9.2-aws
#441
AmedeoSapio
closed
3 weeks ago
0
More cherry-picks for 1.9.2-aws
#440
AmedeoSapio
closed
3 weeks ago
0
.ci/aws: Add aws-ofi-nccl functional tests to ci
#439
a-szegel
closed
3 weeks ago
3
Separate endpoints for recv communicators from same source endpoint
#438
rauteric
opened
3 weeks ago
12
Shrink control message to 32 bytes
#437
bwbarrett
closed
1 week ago
3
rdma: support recv size < send size
#436
AmedeoSapio
closed
3 weeks ago
0
Class interface cleanups for plugin and device structures
#435
bwbarrett
closed
3 weeks ago
0
[v1.9.x-aws].ci/aws: Move p3dn's to ap-northeast-1
#434
a-szegel
closed
3 weeks ago
1
.ci/aws: Move p3dn's to ap-northeast-1
#433
a-szegel
closed
3 weeks ago
0
Cleanup nccl_ofi.h / utility macros.
#432
bwbarrett
closed
3 weeks ago
2
Mutex behavior cleanup / improvements
#431
bwbarrett
closed
4 weeks ago
0
<Do Not Merge> Test to see if branch can be merged to master that is behind
#430
a-szegel
closed
3 weeks ago
0
fix: check if DevComm is valid before setting it
#429
AmedeoSapio
closed
4 weeks ago
0
TEST TEST
#428
a-szegel
closed
1 month ago
1
.ci/aws: Add 5 min sleep before launching p3dn
#427
a-szegel
closed
1 month ago
1
[v1.9.x-aws] .ci/aws: Attempt to add stable p3dn testing into CI
#426
a-szegel
closed
1 month ago
1
Cherry-picks for 1.9.2-aws
#425
AmedeoSapio
closed
1 month ago
0
tuner: prefer NVLSTREE on 16 nodes at 4GB
#424
AmedeoSapio
closed
1 month ago
0
.ci/aws: Add stable p3dn testing into CI
#423
a-szegel
closed
1 month ago
6
tuner: better ring rank/msize binning
#422
aws-nslick
closed
4 days ago
6
[v1.9.x-aws].ci/aws: Remove unstable p3dn tests from Jenkins
#421
a-szegel
closed
1 month ago
1
more gh actions improvements
#420
aws-nslick
closed
1 month ago
0
test: Have GitHub workflows build functional tests
#419
rajachan
closed
1 day ago
5
[v1.9.x-aws] Backport Jenkins CI Stability Fixes
#418
a-szegel
closed
1 month ago
0
.ci/aws: Jenkins CI Stability Fixes
#417
a-szegel
closed
1 month ago
6
[v1.9.x-aws].ci/aws: Move p5 ODCR to af-south-1
#416
a-szegel
closed
1 month ago
1
.ci/aws: Move p5 ODCR to af-south-1
#415
a-szegel
closed
1 month ago
1