aws / aws-ofi-nccl

This is a plugin which lets EC2 developers use libfabric as network provider while running NCCL applications.
Apache License 2.0
147 stars 56 forks source link

rdma: add separate bounce buffer freelist for data (eager) messages #614

Open rauteric opened 2 months ago

rauteric commented 2 months ago

Separate out bounce buffer freelists into a smaller-sized freelist for control messages and a larger size for data (eager) messages

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

rauteric commented 1 month ago

The CI failure(s) are real:

scatter_perf: nccl_ofi_rdma.c:2100: alloc_bounce_req: Assertion `NCCL_OFI_IS_PTR_ALIGNED(&bounce_fl_item->bounce_msg, BOUNCE_BUFFER_ALIGNMENT)' failed.

Need to fix up some asserts.

rauteric commented 1 month ago

Force-push to 3494556 fixes the CI failure by having alignment on both (eager) bounce buffers and ctrl recv buffers.

This requirement will be relaxed when we refactor the data structure in the related #600.

rauteric commented 1 month ago

Rebased on master

AmedeoSapio commented 1 month ago

bot:aws:retest

AvivBenchorin commented 1 month ago

bot:aws:retest

rauteric commented 1 month ago

Rebase on master

rauteric commented 1 month ago

Rebased on master

rauteric commented 1 month ago

Bounce buffers for eager messages and control messages are different things. Have different structures with different names so that there's no reason to figure out what's what.

They are different things, but there is a lot of code that is shared between them (e.g., all of the buffer pool management code, posting receives). We should rename these structures to something more generic (like receive buffers instead of bounce buffers [which I wanted to do in a followup patch]), but I think duplicating all the buffer management code is overkill.

I'm also not sure I understand the desire to move the freelists from the endpoint into the per-rail structures, given that most of the data is really operations that will be per-endpoint.

I'm confused... isn't having the freelists per-rail exactly what you were advocating in https://github.com/aws/aws-ofi-nccl/pull/614#discussion_r1767810883?

I think it makes sense to have the freelists per-rail because posted recv buffer counts are maintained per-rail.

bwbarrett commented 1 month ago

I think the problem with your patch is that it's way too much cognitive load to figrue out what a bounce buffer is. And the rest is just too confusing to review. Fix that.