celerity / celerity-runtime

High-level C++ for Accelerator Clusters
https://celerity.github.io
MIT License
139 stars 18 forks source link

Re-define transfer_id as struct{task_id, buffer_id, reduction_id} #230

Closed fknorr closed 7 months ago

fknorr commented 9 months ago

This is another back-port from IDAG development.

The transfer_id is used to match incoming MPI transfers to await-push commands. Previously this was an integer encoding of "task + chunk id", which together with a buffer_id, and (optionally) a reduction_id uniquely identifies an await-push on the target node.

It turns out that instead of the "magic" chunk id, the task id is sufficient. This PR bundles it with the buffer id and reduction id such that struct transfer_id now uniquely identifies the receiving await-push, improving readability of logs / dot graphs and avoiding repetitions of (transfer_id, buffer_id, reduction_id) in the code.

github-actions[bot] commented 9 months ago

Check-perf-impact results: (4c65f1399a47e0eb1340f63004745b17)

:question: No new benchmark data submitted. :question:
Please re-run the microbenchmarks and include the results if your commit could potentially affect performance.

fknorr commented 9 months ago

Bikeshedding, from an in-person discussion with @psalz: transfer_id might not be a good name because it does not uniquely identify a transfer: There can be multiple transfers per transfer_id on each node (for different subregions) and the same transfer_id + region can exist multiple times in the cluster if the same data is sent to more than one node.

How about buffer_version or buffer_version_id (see source comments)?

coveralls commented 7 months ago

Pull Request Test Coverage Report for Build 7775009065


Totals Coverage Status
Change from base Build 7727056751: 0.02%
Covered Lines: 4997
Relevant Lines: 5195

💛 - Coveralls
psalz commented 7 months ago

How about buffer_version or buffer_version_id (see source comments)?

I feel like a "buffer version" would have to contain the last writer task id, not the consumer task id. Otherwise two reading tasks that require the same data written by an earlier task would receive two different "versions", even though it's the same data, which seems weird to me.

fknorr commented 7 months ago

From in-person discussion: We're leaving the name as-is for the moment to avoid rebase pain in the IDAG branch, and in the hope that documentation will make up for the imperfect name.