DARMA-tasking / vt

DARMA/vt => Virtual Transport

Semantics of collection chain set wrt nextStepCollective #410

Open lifflander opened 5 years ago

lifflander commented 5 years ago

Per a discussion with @PhilMiller, we've concluded that a mechanism that actually holds back messages for a given collective step (from one chain to another) could provide clearer semantics.

First, note that a message marked with a collective epoch could arrive before that epoch is even created on another node. This is because theTerm()->makeCollectiveEpoch(..) is not synchronized by design. This definitely complicates any mechanism that we build.

My current proposition:

  1. Add a bit to the epoch that indicates whether the epoch is ready to execute immediately or must be made ready. Currently, all epochs are ready immediately.

  2. If that bit marks the epoch as unready, the virtual context collection manager will not deliver the message until the user indicates that it is ready.

  3. Add a call proxy(idx).ready(epoch) that allows delivery.

  4. Add the proxy to the collection chain set so it can call this for each index at the right time and set the epoch bit appropriately.
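
To make the four points concrete, here is a minimal standalone sketch of the hold-back behavior. It is not vt code: the `Epoch` encoding, `dependent_bit`, `makeDependent`, `isDependent`, and `ElementMailbox` are all hypothetical names chosen for illustration, and only `ready(epoch)` mirrors the call proposed above. Because the bit travels inside the epoch value itself, a message can be held back even if it arrives before the receiving node has created the epoch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical epoch encoding: the top bit marks an epoch that must be
// explicitly released. This is NOT vt's actual epoch layout.
using Epoch = std::uint64_t;
constexpr Epoch dependent_bit = Epoch{1} << 63;

constexpr Epoch makeDependent(Epoch e) { return e | dependent_bit; }
constexpr bool isDependent(Epoch e) { return (e & dependent_bit) != 0; }

// Stand-in for the delivery logic of a single collection element.
class ElementMailbox {
public:
  using Handler = std::function<void()>;

  // Run the handler now, unless the epoch is dependent and has not yet
  // been released on this element; in that case hold it back.
  void deliver(Epoch e, Handler h) {
    if (isDependent(e) && released_.count(e) == 0) {
      held_[e].push_back(std::move(h));
    } else {
      h();
    }
  }

  // Plays the role of the proposed proxy(idx).ready(epoch): release the
  // epoch on this element and run held messages in arrival order.
  void ready(Epoch e) {
    assert(isDependent(e));
    released_.insert(e);
    auto it = held_.find(e);
    if (it == held_.end()) return;
    auto pending = std::move(it->second);
    held_.erase(it);
    for (auto& h : pending) h();
  }

  // Number of messages currently held back for the given epoch.
  std::size_t numHeld(Epoch e) const {
    auto it = held_.find(e);
    return it == held_.end() ? 0 : it->second.size();
  }

private:
  std::unordered_set<Epoch> released_;
  std::unordered_map<Epoch, std::vector<Handler>> held_;
};
```

In this sketch the collection chain set would be the caller of `ready` for each index, which corresponds to point 4. Readiness is stored per element, so releasing an epoch on one element does not release it on any other.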

PhilMiller commented 5 years ago

This sounds at least superficially reasonable. I'll think about it more tomorrow.

lifflander commented 5 years ago

Any more comments? I’m thinking about implementing this soon.

PhilMiller commented 5 years ago

I realize there's a tension/inconsistency between different points of your proposal. As in your point 3, I believe that readiness to handle messages in an epoch needs to be tracked on a per-object basis. I might possibly extend that to a per-handler basis. That conflicts with the notion in points 1-2 that this is a per-epoch property, seemingly applying across all objects.

More broadly, this in some ways resembles notions in more dogmatic Actor programming models of objects having a 'mailbox' from which they select inbound messages. We're definitely distinguishing our approach in that control of mailbox-processing-order is living at least partially outside the recipient, but we should keep that in mind.

lifflander commented 5 years ago

The “dependent” bit can never be unset for a given epoch. It just puts the epoch in that category so the runtime can treat it accordingly.

lifflander commented 5 years ago

The lack of readiness is handled by each component in VT. For collections, each element can track its readiness for a given epoch individually.

PhilMiller commented 5 years ago

Had a thought earlier today, which may or may not be a productive direction:

The chain manager can arrange things so that none of the messages in a chain actually get delivered until it's got the whole chain defined (by not calling finishedEpoch on the dummy initial epoch). That means that it can send a control message describing the whole sequence of upcoming epochs to each object, before it would start executing any of them. Perhaps this makes the problem easier to solve, but with the proviso that the generation of the sequence could be constrained by how we might implement this.
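
A rough sketch of what this could look like on the receiving side, under the assumption that each object gets the full ordered list of epochs up front and that each epoch appears in the list at most once. `SequencedElement`, `setSequence`, and `stepFinished` are hypothetical names, not vt API, and how an object learns that its current step has finished is left outside the sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

using Epoch = std::uint64_t;

// Hypothetical per-object state: processes epochs strictly in the order
// given by the control message and buffers messages for other epochs.
class SequencedElement {
public:
  using Handler = std::function<void()>;

  // Control message from the chain manager: the ordered epoch sequence.
  // Messages that arrived early for the first epoch are run here.
  void setSequence(std::vector<Epoch> seq) {
    seq_ = std::move(seq);
    pos_ = 0;
    flushCurrent();
  }

  // Run the handler only if it belongs to the current step; otherwise
  // buffer it until its epoch becomes current.
  void deliver(Epoch e, Handler h) {
    if (pos_ < seq_.size() && seq_[pos_] == e) {
      h();
    } else {
      buffered_[e].push_back(std::move(h));
    }
  }

  // Advance to the next epoch in the sequence and run what was buffered
  // for it, in arrival order.
  void stepFinished() {
    if (pos_ < seq_.size()) {
      ++pos_;
      flushCurrent();
    }
  }

private:
  void flushCurrent() {
    if (pos_ >= seq_.size()) return;
    auto it = buffered_.find(seq_[pos_]);
    if (it == buffered_.end()) return;
    auto pending = std::move(it->second);
    buffered_.erase(it);
    for (auto& h : pending) h();
  }

  std::vector<Epoch> seq_;
  std::size_t pos_ = 0;
  std::unordered_map<Epoch, std::vector<Handler>> buffered_;
};
```

The trade-off noted above shows up directly: the object needs the whole sequence before it can run anything, so the chain has to be fully defined before execution starts.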

PhilMiller commented 4 years ago

I think at this point, we've seen that the current design works well enough, and I don't think we're going to make any sort of big change in semantics before we call what we have "1.0". So, I'm going to untag this for now, and we can revisit it in some planning session later.

lifflander commented 3 years ago

Archiving this for now; it's not clear that this is needed at the moment.