After installing a XenStore watch with a callback, not every event triggers the callback.
Extending the examples/Core/ClientRendezvous example to 15 clients should be sufficient to demonstrate this. On most runs, I get between 11 and 14 successful rendezvous out of 15. I added a debug print to the server-side (left side) routine in Communication.Rendezvous.clientServerConnection that dumps the key to the console. This amended example is here
The most common scenario: the client proceeds correctly, but the server misses the initial xsMakeDirectory event. The server then receives and ignores the ClientGrants and ClientPorts messages. For example, dom559 in the log below is lucky enough to be ignored twice (the initial mkdir and ClientGrants). This seems to happen only under contention. To reproduce, repeat sudo make run until the Server and some Client domains fail to halt. I enabled Xen console logging to retrieve the audit trail.
XenStore watch fired for /rendezvous/ClientServerTest/dom557/ServerConfirmed
XenStore watch fired for /rendezvous/ClientServerTest/dom558XenStore watch fired for /rendezvous/ClientServerTest/dom558/ClientGrants
XenStore watch fired for /rendezvous/ClientServerTest/dom558/ClientPorts
Waiting for /rendezvous/ClientServerTest/dom558/ClientGrants
Waiting for /rendezvous/ClientServerTest/dom558/ClientPorts
XenStore watch fired for /rendezvous/ClientServerTest/dom558/ServerConfirmed
XenStore watch fired for /rendezvous/ClientServerTest/dom559/ClientPorts
XenStore watch fired for /rendezvous/ClientServerTest/dom560
Waiting for /rendezvous/ClientServerTest/dom560/ClientGrants
XenStore watch fired for /rendezvous/ClientServerTest/dom560/ClientGrantsWaiting for /rendezvous/ClientServerTest/dom560/ClientPortsXenStore watch fired for /rendezvous/ClientServerTest
/dom560/ClientPorts
XenStore watch fired for /rendezvous/ClientServerTest/dom560/ServerConfirmed
XenStore watch fired for /rendezvous/ClientServerTest/dom561XenStore watch fired for /rendezvous/ClientServerTest/dom561/ClientGrants
Another scenario involves either the client or the server calling waitForKey (for grants, ports, or ServerConfirmed messages) and the waiting thread never waking up from a threadDelay, but that will be a separate ticket.
I've been working on a new version of the Rendezvous protocols that assumes XenStore is unreliable, but that work is blocked on the scheduling problem.
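For anyone wanting to work around this in the meantime, one possible approach is to stop trusting the watch alone: list the rendezvous directory from time to time and compare it against the keys the callback has actually reported. The following is only a rough sketch of that comparison in plain Haskell. It makes no HaLVM or XenStore calls; missedKeys is a hypothetical helper, and the two lists are illustrative stand-ins for a directory listing and for the keys seen by the watch callback.

```haskell
module Main where

import Data.List (sort, (\\))

-- Hypothetical model: a key is just the name of a child node under the
-- rendezvous directory (assumption, not the HaLVM representation).
type Key = String

-- | Keys that exist in the store but for which no watch callback was seen.
-- The first argument stands in for a directory listing, the second for the
-- keys reported by the watch callback so far.
missedKeys :: [Key] -> [Key] -> [Key]
missedKeys listed seen = sort listed \\ seen

main :: IO ()
main = do
  -- Illustrative data only: four clients registered, one mkdir event missed.
  let listed = ["dom557", "dom558", "dom559", "dom560"]
      seen   = ["dom557", "dom558", "dom560"]
  mapM_ (putStrLn . ("missed watch event for " ++)) (missedKeys listed seen)
```

With the illustrative data above, this prints "missed watch event for dom559". A server could run such a check after each callback, or on a timer, and handle any missed key as if its watch had fired.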