Mage IPC library

Test cases/scenarios, each with a sublist of tests that most minimally exercise the scenario:

[x] End-to-end multiple process invitation acceptance and receipt
- MageTest.ParentIsInviterAndReceiver
- MageTest.ParentIsAcceptorAndReceiver
- MageTest.ParentIsAcceptorAndReceiverButChildBlocksOnAcceptance
[x] Inviter immediately binds its MessagePipe to a mage::Receiver. Messages come in once the acceptor accepts and binds its MessagePipe to a mage::Remote and sends messages
- MageTest.ParentIsInviterAndReceiver
[x] Inviter immediately binds its MessagePipe to a mage::Remote and queues messages. They deliver once the acceptor accepts and binds its MessagePipe to a mage::Receiver
- MageTest.ParentIsAcceptorAndReceiver
[x] Inviter waits until acceptor accepts the invitation before binding its MessagePipe to a mage::Remote and sending messages. They deliver once the acceptor accepts and binds its MessagePipe to a mage::Receiver
- This is a subtle variant of the test above, just to verify that there is no difference between the inviter using the MessagePipe as a mage::Remote (a) synchronously after sending the invitation vs (b) asynchronously after the invitation acceptance has made it back to the inviter
- MageTest.ParentIsAcceptorAndReceiverButChildBlocksOnAcceptance
[x] Binding a mage::Receiver that has multiple queued messages does not deliver all messages synchronously
- MageTest.InProcessQueuedMessagesAfterReceiverBound
- MageTest.InProcessQueuedMessagesBeforeReceiverBound
[x] General sending an endpoint to another process to establish more pipes between two processes
- MageTest.SendHandle*
[x] A->B->A. Sending an endpoint-baring message to a remote endpoint in the proxying state (proxy back to A). The message gets forwarded correctly, and the endpoint that gets recovered from the forwarded message is usable. This shows that the forward-message-from-proxy mechanism correctly creates endpoints out of EndpointDescriptors in the node that has the proxying endpoint, but also puts those endpoints in the proxying state. Fixed in https://github.com/domfarolino/browser/pull/32/commits/88b4c9100434a55b03c12780e7a90f8ae61f6619
- MageTest.ChildPassRemoteAndReceiverToParentToSendEndpointBaringMessageOver
[x] A simple example showing that ordering is not guaranteed between two message pipes
- MageTest.OrderingNotPreservedBetweenPipes_SendBeforeReceiverBound
- MageTest.OrderingNotPreservedBetweenPipes_SendAfterReceiverBound
[x] Another test to show ordering differences between two pipes: just binding two mage::Receivers in the opposite order that messages were sent on them
- MageTest.OrderingNotPreservedBetweenPipes_Simple
[x] A->B->A->B(->A). Sending an endpoint-baring message to a remote endpoint in the proxying state back and forth between two processes. It gets sent back to the node that sent the first endpoint-baring message in the first place, and does this a few times so that the endpoint is getting a new name each time it get received by the same two processes, creating a repetitive proxy loop. This shows that when we forward a message from a proxying endpoint, we not only set the dependant endpoints in the proxying state, but we also update the EndpointDescriptors on the message so that the node receiving the forward message creates an endpoint with a new name, not the same name. This is currently broken.
- MageTest.PassEndpointBearingHandleBackAndForthBetweenProcesses
[x] A->B->C. Process A creates two endpoints, and sends one to process B. B sends it to process C. A uses its endpoint as a mage::Remote, and C uses its as a mage::Receiver. Messages flow from A->B->C successfully. This shows that message forwarding from proxying endpoints works fine
- Not adding a test for this since I don't want to codify the case where a child process (B) spins up another process (C), since that goes against the "master node" design that we should be going towards. See https://github.com/domfarolino/browser/blob/domfarolino/basic-mage/mage/docs/design_limitations.md#no-ability-to-send-native-sockets-across-processes. We could add a test where B & C are both children of A; then A passes a C-pointing remote to B, and then sends messages through B that end up in C, but that's not very different from other tests that we have.
[x] mage::Remote/mage::Receiver on the same message pipe but for different interfaces: Not supported behavior, should just fail
- MageTest.RemoteAndReceiverDifferentInterfaces
[x] Send an endpoint that is bound (should fail). Only fails for receivers since we can't tell if Endpoint is bound to a remote
- MageTest.SendBoundReceiverUnitTest
[x] mage::Remote can be bound and unbound, and messages are all delivered and well-ordered. Unbinding is exercised in the following tests:
- MageTest.ChildPassSendInvitationPipeBackToParent
- MageTest.ChildPassAcceptInvitationPipeBackToParent
[x] Passing an EndpointDescriptor over an endpoint whose peer is local and bound
- MageTest.SendHandleToBoundEndpoint
[x] Passing an EndpointDescriptor over an endpoint whose peer is local but proxying remote
- This is already covered by a ton of tests. I'll refrain from enumerating
[x] Send a message with at least one MessagePipe/EndpointDescriptor to a remote Endpoint that is not yet bound. The message should be queued and a concrete backing Endpoint should be created from the descriptor. Asynchronously later when a mage::Receiver is bound to the receiving endpoint, the message (with the MessagePipe) is replayed to the receiver delegate and the sent-Endpoint should not be re-created from the descriptor, that would be a leak
- Not sure exactly where this is tested, but the implementation handles this correctly. The only time endpoints are recovered "fresh" is in Node::OnReceivedUserMessage() which is called on the IO thread right when a message is received

Task list

[x] Rename mage::MessagePipe to mage::MessagePipe. Fixed in https://github.com/domfarolino/browser/pull/32/commits/11a895a2faee226fb77bdf99687665775c73134d
[x] Rename magen Bazel rule to magen_idl. Fixed in https://github.com/domfarolino/browser/pull/32/commits/6449f4b2c931bb6d95bbd00af9216769ba58f779
[x] Add outputs to magen_idl Bazel rule and make the macro include the cc_library(). Fixed in https://github.com/domfarolino/browser/pull/32/commits/6449f4b2c931bb6d95bbd00af9216769ba58f779
[x] Add logging verbosity to mage::Core::Init(). Fixed in https://github.com/domfarolino/browser/pull/32/commits/7a6445e7bb1c4e7a9afdb6b04e3e6a775a969afe
[x] Allow comments in .magen files. Fixed in https://github.com/domfarolino/browser/pull/32/commits/bf696aa272645f5245cebfafef7c5d55c5a970ad and https://github.com/domfarolino/browser/pull/32/commits/545d7a5a4ebaa0ad3afa150da8c40300844f83aa
[x] Figure out the lifetimes of Receiver, InterfaceStub, Endpoint, and specifically fix the case where Endpoint posts a task (to receive a message) to a ReceiverDelegate (aka InterfaceStub) but the underlying implementation is destroyed by the time the stub wishes to dispatch to it. Fixed in https://github.com/domfarolino/browser/pull/32/commits/e07c997fd64ec039842ff84da0d37498496c8557
[x] Enable mage::Remote unbinding. Fixed in https://github.com/domfarolino/browser/pull/32/commits/7c031b810d02bfb6116699feebc3a6d99473000e and https://github.com/domfarolino/browser/pull/32/commits/e9d398ea9e4877034be9de3431f21836d16d3ea9
[x] Disallow sending endpoints that are bound. Fixed in https://github.com/domfarolino/browser/pull/32/commits/c209096268219a41d16259d61d4bc9d071d6c187, only for receiver-bound endpoints
[x] Synchronize access of thread-unsafe Endpoint members: state, incoming_messages_, delegate_, and delegate_task_runner_ .
- [x] It's possible for AcceptMessage() to queue a message to the incoming message queue at the same time RegisterDelegate() is called. This results in messages being queued but never dispatched. Fixed in https://github.com/domfarolino/browser/pull/32/commits/f39d8e92009cff824f453cad0b758c6b593ba114
- [x] It's possible for SetProxying() to be called on the IO thread on an endpoint while a message is being sent to that endpoint on the UI thread. This is the cause of the race condition we're seeing in MageTest.ChildPassSendInvitationPipeBackToParent. Fixed in https://github.com/domfarolino/browser/commit/0218a356025292a24909a1b618dfd2d8c83503e0, https://github.com/domfarolino/browser/commit/c58aaea7680fb025ee3a6f353902f9c2018fe571, and https://github.com/domfarolino/browser/commit/09b699d12b12ad52f44b94c6a0dd7238d87d690e
[x] When forwarding an EndpointDescriptor on a message, we should generate a new cross_node_endpoint_name (and have that modify the message: this implies making Message::GetEndpointDescriptors() return something mutable) which will fix the A->B->A->B(->A) case described in the above list. Fixed in https://github.com/domfarolino/browser/pull/32/commits/ce0f3dad2f4902db128d44e390b701f8922f653b, https://github.com/domfarolino/browser/pull/32/commits/fe282ee1b06eb273b21ae7694ccf998e57fb9c2b
[x] When forwarding an EndpointDescriptor, maybe we should set the endpoint descriptor's peer address to the endpoint forwarding the EndpointDescriptor. This is related to the C->B->A test scenario described in the above list, that currently breaks
- Actually we should not do this. Instead, we should just support sending native sockets in messages. See https://github.com/domfarolino/browser/blob/domfarolino/basic-mage/mage/docs/design_limitations.md#no-ability-to-send-native-sockets-across-processes
[x] Stop calling Endpoint::SetProxying() from inside Core::PopulateEndpointDescriptorAndMaybeSetEndpointInProxyinState(). This is a hack; we should be calling SetProxying(), but we should be doing so from within a lock, and we should also be flushing all queued messages form the newly-proxying endpoint. Essentially, we need to merge this path with Node::SendMessagesAndRecursiveDependents(). Fixed in https://github.com/domfarolino/browser/pull/32/commits/3209f7d3759a9a86764855995288d36ab264a10c
[x] Document history of https://github.com/darinf/ports and https://github.com/darinf/ipc_idl in README.md
[x] Document message invitations in README.md, which establish initial message pipes
[x] Document limitations in README.md

Followup:

Support mage::Receiver unbinding
- This is hard because it would allow message pipes and Endpoints to truly be bidirectional, which doesn't play well with proxies as-is. Right now proxies are set-up in a one-way direction, pointing to the next Endpoint to receive a message. But to support receiver unbinding (and rebinding to a remote, for example), we'd have to wire up proxies in both directions, so that an Endpoint knows which direction in the proxy chain to send a message
Consider making mage::MessagePipe move-only
Write test: C->B->A. Same as above except A uses its endpoint as a mage::Receiver and C uses its as a mage::Remote. A and C do not have a direct IPC connection, but if we use the endpoint in Process C as a "remote", its peer node will be A and it won't be able to find a channel to send messages to and it will blow up. Probably when B forwards the EndpointDescriptor to C, it should set the descriptor's peer address to the endpoint forwarding the descriptor (in B), not that endpoint's peer (A), which may be foreign to the ultimate recipient. See https://github.com/domfarolino/browser/blob/domfarolino/basic-mage/mage/docs/design_limitations.md#no-ability-to-send-native-sockets-across-processes. This will not work until we support sending native sockets in messages. Until we do that, C will not know how to address messages to A. We could work around it but the design notes tell us why that's a bad/ugly idea
Figure out Channel start up and shut down. We should probably only register a channel as a socket reader with the IO TaskLoop on the IO thread (maybe we should even enforce this inside TaskLoopForIO?) This used to cause a race condition, because Channel calls WatchSocket() from the UI thread, and the TaskLoop impl registered the socket with the kernel before adding the watcher async_socket_readers_, which means the IO thread can start getting notified (from the kernel) of available reads on a file descriptor that it doesn't yet have a registered watcher for. https://github.com/domfarolino/browser/pull/61 has since fixed this, but we should probably make the Channel stuff in Mage a little more sane eventually anyways.
Disallow sending an interface over itself. This would be fixed by construction if we disallowed sending bound pipes, but we can't enforce that for remotes right now since Remote isn't directly connected to an underlying Endpoint, so when we send a bound remote pipe, the Endpoint representing the pipe doesn't know anything about a Remote it might be bound to. So for now, this is technically allowed. This will be fixed by construction when we make handles move-only. Once that's done, we still have to contend with remote->SendMessage(remote->Unbind()). Right now this works, but calling Unbind() should actually invalidate the internal Remote state and cause this to crash.

Test Health

[x] CoreInitStateUnitTest
[x] UseUnboundRemoteCrashes
[x] UseUnboundRemoteCrashes2
[x] SendBoundRemoteTechnicallyAllowedUnitTest
[x] SendBoundReceiverUnitTest
[x] RemoteAndReceiverDifferentInterfaces
[x] InitializeAndEntangleEndpointsUnitTest
[x] SendInvitationUnitTest
[x] AcceptInvitationUnitTest
[x] ParentIsInviterAndReceiver
[x] ParentIsAcceptorAndReceiver
[x] ParentIsAcceptorAndReceiverButChildBlocksOnAcceptance
[x] SendHandleAndQueuedMessageOverInitialPipe_01
[x] SendHandleAndQueuedMessageOverInitialPipe_02
[x] SendHandleAndQueuedMessageOverInitialPipe_05
[x] SendHandleAndQueuedMessageOverArbitraryPipe
[x] SendInvitationAndReceiveQueuedEndpointsFromAcceptor
[x] ReceiveHandleFromRemoteNode
[x] ChildPassAcceptInvitationPipeBackToParent
[x] ChildPassSendInvitationPipeBackToParent
[x] ChildPassRemoteAndReceiverToParent
[x] ChildPassRemoteAndReceiverToParentToSendEndpointBaringMessageOver
[x] PassHandleBackAndForthBetweenProcesses
[x] PassEndpointBearingHandleBackAndForthBetweenProcesses
[x] InProcessQueuedMessagesAfterReceiverBound
[x] InProcessQueuedMessagesBeforeReceiverBound
[x] OrderingNotPreservedBetweenPipes_SendBeforeReceiverBound
[x] OrderingNotPreservedBetweenPipes_SendAfterReceiverBound
[x] OrderingNotPreservedBetweenPipes_Simple
[x] InProcessCrossThread
[x] SendMessageToDeletedReceiver
[x] SendHandleToBoundEndpoint

Obsolete: Super random notes and thoughts: We have this idea of a mage::Endpoint being in a "proxying" state where its messages are queued internally, and then a caller can "Take()" them and forward them to the node that it is proxying to. I had the idea of putting a mage::Endpoint into a "proxying" state where it would take a remote node name, and modify its peer address to point to that remote name and then go through the normal sending route. However this might be tricky when we pass a mage::Endpoint in a mage message that is bound for the same process. We need to think about this case.

Interesting cases we need to think about:

Creating two local pipes and handing them off to separate processes
Creating two local pipes and handing one of them off to a separate process
Creating two local pipes and handing one of them off to a same-process peer endpoint

(1) above is a generalized version of (2), so let's focus on (2). We have two options to do this:

Do this similar to invitation sending:
- Keep the local soon-to-be-remote Endpoint in local_endpoints_, and let it queue messages. Finally we receive an acceptance message from the remote node, where we do a full proxying over TODO: Finish this thought

Design for sending endpoints to other nodes

This section might be a bit obsolete, but much of it probably holds.

What follows is the design for sending endpoints to another node (process). Imagine you have endpoints A and B in node N1, and we send B to another node N2. We have two options:

Delete N1.B, and update N1.A's peer address from N1.B to N2.B (this option is actually slightly more complicated than how I just presented it)
Keep N1.B around, but set it into a proxying state so that it will forward messages to "the real B", N2.B

I think we need to go with (2) for now, here's why. Eventually mage (like mojo) should be able to pass two endpoints each to different processes, and have the originating process delete both original endpoints (option (1) above). That is, the two endpoints that live in separate processes can talk directly to each other without mediation from the process they originated in. This is pretty complex for two reasons:

It requires sending actual file descriptors to other processes via something like CMSG headers
It requires some pretty complicated messaging that mojo (not mage!) currently supports, where for a period of time after the originator sends the two endpoints to the two separate processes, it maintains "proxies" representing these endpoints that survive (and continue to forward any lingering messages) until the real endpoints are set-up in the other process and the proxies have no more messages to forward. That means the originator has to tell any other processes that might still sending messages to it, to stop and instead send its messages to the "real" destination process that we sent the endpoint to. Once the originating process receives an ACK from that the node (if any exists) that was still sending messages to the originator, the originator can be sure it will receive no more messages for the proxy, and it can delete the proxy.

We'll support this one day for performance/efficiency, but the consequences of this shouldn't be observable, and it is pretty complicated to implement so we're avoiding it for the initial version. In that case, when the originator process sends two endpoints two different processes so that they can talk directly to each other, each message is actually going through the originator process's "permanent" proxies.

This requires solution (2) above, where for each endpoint node N1 sends to node N2, node N1 keeps a permanent proxying endpoint around pointing to N2, where it believes the "real" endpoint was last seen. In reality, N2 might immediately send the endpoint to N3. But that's fine—when it receives messages from N1, they'll arrive at N2's proxy, which will point to N3, and the forwarding will happen automatically. Even once we support direct message communication, this sort of daisy-chain proxying will be necessary for a brief period of time until all of the ACKs in the chain settle.

At first I considered a design where each endpoint always had a local peer, and that peer would just be in a proxying state if it represented a concrete endpoint in another process. This was because I got confused and thought that endpoints-traveling-in-pairs was a requirement for design (2). This would imply that endpoints implicitly travel together:

Nodes that send an invitation would be given one handle but two endpoints registered with Core. One endpoint would concretely represent the handle, and the other (with no handle) would be its "local peer" that is automatically in a proxying state.
We'd have the same situation for nodes that accept invitations
We'd have a similar situation for nodes that create a pair of entangled message pipes and send one end to another node. The sending node would have two endpoints, one in a proxying state pointing to its concrete look-alike in the remote node. And the receiving node would receive a single handle, but create two endpoints under the hood: one that represents the handle, and another that is its "local peer" that is in a proxying state to the other node.

Under this design, since endpoints travel in pairs, the only way to send a message to remote node would be if a message arrives at a local peer that is in a proxying state. Therefore, SendMessage() looks pretty simple:

Node::SendMessage(Endpoint local_endpoint, Message message) {
  Endpoint local_peer_endpoint = // get peer endpoint;
  if (local_peer_endpoint.state == kUnboundAndProxying) {
    // Peer is actually remote. Forward the message accordingly.
    Channel channel_for_remote_node = GetChannelForRemoteNode(local_peer_endpoint.node_to_proxy_to);
    channel_for_remote_node->SendMessage(message);
  } else {
    // `local_peer_endpoint` is the real peer endpoint
    local_peer_endpoint->AcceptMessage(message);
  }
}

But upon further thought, there is just no reason to require that all endpoints have local peers and therefore travel together even if you send one. All it does is slightly simplify the SendMessage() codepaths. Instead:

Nodes that send an invitation would be given one handle and one endpoint registered with Core. This endpoint concretely represent the handle, and its peer's address would be its peer in the remote node. No node would be in a proxying state
We'd have the same situation for nodes that accept invitations
We'd have a different situation for nodes that create a pair of entangled message pipes and send one end to another node. The sending node would have two endpoints, one in a proxying state forwarding messages to its concrete look-alike in the remote node. The remote node would receive a single handle, and create a single endpoint representing the handle. Its peer address would be its peer in the sending node.

This is perhaps more logical, even though that means SendMessage() gets slightly more complicated. This is because there would be two ways of sending a message to a remote node:

Node::SendMessage(Endpoint local_endpoint, Message message) {
  bool peer_is_local = (local_endpoint.peer_address.node_name == node_name_);

  if (peer_is_local) {
    Endpoint local_peer_endpoint = // get peer endpoint;
    if (local_peer_endpoint.state == kUnboundAndProxying) {
      // Peer is actually remote. Forward the message accordingly.
      Channel channel_for_remote_node = GetChannelForRemoteNode(local_peer_endpoint.node_to_proxy_to);
      channel_for_remote_node->SendMessage(message);
    } else {
      // `local_peer_endpoint` is the real peer endpoint
      local_peer_endpoint->AcceptMessage(message);
    }
  } else { // peer is remote
    Channel channel_for_remote_node = GetChannelForRemoteNode(local_endpoint.peer_address.node_name);
    channel_for_remote_node->SendMessage(message);
  }
}

In other words, this design has two main rules:

All sent endpoints are put in a permanently proxying state
Endpoints do not travel in pairs (because they don't need to)

One day we'll implement the ability for two child nodes to talk directly to each other without mediation from the originator, but the flow will be more complicated. Remember, immediately after we send an endpoint to another node the flow will look like:

Sender node puts the sent endpoint into a proxying state, in case it receives more messages from a node that might be sending messages to us
Sender tells node owning the peer endpoint of the sent node (if any) to please send stop sending subsequent messages to us, and instead start sending them to the node that we sent the endpoint to (the "final destination" from the sender's perspective)
Sender continues to forward any messages it receives to the node we sent the real endpoint to
Sender receives ACK from the node owning the peer endpoint (if such node exists), and only then deletes its proxy.
If the peer endpoint of the endpoint-we-sent is local (is not owned by another node), then we can just update said endpoint's "peer address" to point to the node that we sent the endpoint to, and that's as good as an "ACK"

domfarolino / browser

Basic mage implementation #32

Mage IPC library

[x] SendHandleToBoundEndpoint

Design for sending endpoints to other nodes