Requirement
Make a tool that either proves the DXOS network is functioning, or provides diagnostic data that allows any failure to be isolated to a specific layer or network component, similar to the ping and traceroute tools for IP networks. Note that the capability may be provided as embedded functionality inside an application, rather than as a standalone tool, if that provides the best user experience.
Use cases
User runs a DXOS SDK browser-hosted application (e.g. teamwork or arena) from a Kube. They interact with a collaborator who loaded their application from a different Kube. Application state updates from the collaborator seem to stall (and vice versa). Diagnose the cause.
User runs the wire CLI tool to spawn a bot on a Kube. No confirmation of successful bot spawning is received. Diagnose the cause.
Network failure diagnosis
Diagnosis requires a white-box analysis of the end-to-end path for each type of network interaction supported by the system:
Party invitation redemption
To redeem an invitation, the local node needs to establish a hypercore peer connection with a greeter node.
Greeter nodes are associated with an invitation swarm key. Greeters can only be found if the local node has a functioning signal service.
Once one or more potential greeters have been found, a hypercore transport connection must be successfully established.
Party state replication
To replicate party state, the local node needs to have one or more open and functioning hypercore peer connections.
Peers are associated with the party swarm key. Peers can only be found if the local node has a functioning signal service.
Once one or more potential peers have been found, a hypercore transport connection must be successfully established.
Bot spawn
To spawn a bot, the local node needs to establish a hypercore peer connection with the bot factory node.
The bot factory has a unique swarm key. The bot factory can only be found if the local node has a functioning signal service.
Once the bot factory node has been found, a hypercore transport connection must be successfully established.
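All three interactions above follow the same path: signal service, then peer discovery by swarm key, then transport connection. A minimal sketch of a staged check over that path is given below. Every name in it (Stage, PathProbes, checkPath) is hypothetical and is not part of the DXOS SDK. The probes are injected as plain synchronous functions so the logic can be exercised in isolation; a real tool would presumably use asynchronous probes against the live signal service and peers.

```typescript
// Hypothetical sketch: none of these names are DXOS SDK APIs.
type Stage = "signal" | "discovery" | "transport";

interface StageResult {
  stage: Stage;
  ok: boolean;
  detail: string;
}

// Injected probes, one per stage of the end-to-end path.
interface PathProbes {
  signalConnected: () => boolean;
  findPeers: (swarmKey: string) => string[];
  connect: (peerId: string) => boolean;
}

// Runs the stages in order and stops at the first failing stage, so the
// last entry in the result identifies where the path broke.
function checkPath(swarmKey: string, probes: PathProbes): StageResult[] {
  const results: StageResult[] = [];

  const signalOk = probes.signalConnected();
  results.push({
    stage: "signal",
    ok: signalOk,
    detail: signalOk ? "signal service reachable" : "no signal service",
  });
  if (!signalOk) return results;

  const peers = probes.findPeers(swarmKey);
  results.push({
    stage: "discovery",
    ok: peers.length > 0,
    detail: `${peers.length} peer(s) found for swarm key`,
  });
  if (peers.length === 0) return results;

  const connected = peers.filter((peerId) => probes.connect(peerId));
  results.push({
    stage: "transport",
    ok: connected.length > 0,
    detail: `${connected.length} of ${peers.length} connection(s) established`,
  });
  return results;
}
```

The same function could be called with an invitation swarm key, a party swarm key, or the bot factory swarm key, since only the key differs between the three interactions.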
Failure modes
The network interactions above share a number of common failure modes:
No TCP/IP network.
No open connection to signal service.
Signal service host name DNS failure.
Signal service connection rejected.
Signal service connection failed with an HTTP error (e.g. 500).
Signal service connected but not responding.
Signal service connected and responding but no available peers for swarm key.
Signal service provides one or more peers for swarm key but no peer connections succeed.
Peer connection fails due to WebRTC stack not functioning.
Peer connection fails due to no response from peer.
Peer connection fails due to ICE failure (e.g. TURN required but no TURN service available).
Peer connection established but subsequently closed.
Peer connection established but no data propagated.
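To isolate a failure to a layer, the failure modes above can be checked in order and the first one that matches reported. The sketch below illustrates this under stated assumptions: the Observations fields, the Layer names, and the diagnose function are hypothetical, the observations are assumed to have been collected elsewhere, and the check order (IP, then signal service, then discovery, then peer connection) is an assumption rather than something this document specifies. Only a subset of the listed failure modes is covered.

```typescript
// Hypothetical sketch: field and layer names are illustrative assumptions.
type Layer = "ip" | "signal" | "discovery" | "peer" | "none";

interface Observations {
  ipReachable: boolean;
  dnsResolved: boolean;
  signalOpen: boolean;
  signalResponding: boolean;
  peersOffered: number;
  peersConnected: number;
  dataReceived: boolean;
}

interface Diagnosis {
  layer: Layer;
  reason: string;
}

// Walks the observations in the assumed order and returns the first
// failure found, so the diagnosis points at a single layer.
function diagnose(o: Observations): Diagnosis {
  if (!o.ipReachable) return { layer: "ip", reason: "No TCP/IP network" };
  if (!o.dnsResolved) return { layer: "signal", reason: "Signal service host name DNS failure" };
  if (!o.signalOpen) return { layer: "signal", reason: "No open connection to signal service" };
  if (!o.signalResponding) return { layer: "signal", reason: "Signal service connected but not responding" };
  if (o.peersOffered === 0) return { layer: "discovery", reason: "No available peers for swarm key" };
  if (o.peersConnected === 0) return { layer: "peer", reason: "No peer connections succeed" };
  if (!o.dataReceived) return { layer: "peer", reason: "Peer connection established but no data propagated" };
  return { layer: "none", reason: "No failure detected" };
}
```

Returning only the first match keeps the output unambiguous; a failure in an earlier check generally makes the later observations meaningless, since for example no peers can be offered while the signal service is unreachable.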
Diagnostic tool functionality
Display signal service connection status (e.g. open and responding, not open, failed, time of last success).
Display peer connection status (open, connecting, failed, etc).
Heartbeat functionality on signal service connections.
Heartbeat functionality on p2p WebRTC connections.
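One possible shape for the heartbeat and status items above is a small monitor that records the time of the last heartbeat reply and derives a status from it. The sketch below is an illustration only: HeartbeatMonitor, its method names, the status labels, and the timeout value are all hypothetical, and the current time is passed in explicitly rather than read from a clock so that the behavior is easy to verify.

```typescript
// Hypothetical sketch: not a DXOS SDK class.
type HeartbeatStatus = "responding" | "not responding" | "never responded";

interface HeartbeatReport {
  status: HeartbeatStatus;
  lastSuccessMs: number | null;
}

class HeartbeatMonitor {
  private lastSuccessMs: number | null = null;
  private readonly timeoutMs: number;

  // timeoutMs: how long after the last reply the connection still counts
  // as responding (an assumed tuning parameter).
  constructor(timeoutMs: number) {
    this.timeoutMs = timeoutMs;
  }

  // Call whenever a heartbeat reply arrives on the monitored connection.
  recordReply(nowMs: number): void {
    this.lastSuccessMs = nowMs;
  }

  // Derives the displayable status and the time of the last success.
  report(nowMs: number): HeartbeatReport {
    if (this.lastSuccessMs === null) {
      return { status: "never responded", lastSuccessMs: null };
    }
    const responding = nowMs - this.lastSuccessMs <= this.timeoutMs;
    return {
      status: responding ? "responding" : "not responding",
      lastSuccessMs: this.lastSuccessMs,
    };
  }
}
```

One monitor instance per signal service connection and per p2p WebRTC connection would be enough to back both status displays, since the logic does not depend on the kind of connection being monitored.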