Open amydevs opened 1 month ago
@amydevs This will be easier to start with if the token logic is bending your mind. The only point of contact with the token logic here is the authentication logic utility function and the token payload. You should stub both of these out in testing.
What's the status of this? @amydevs @tegefaulkes
This still needs to be worked on. I'll be taking it over while @amydevs starts on the PKE work with the DB domain.
With #775 merged, the underlying data structure for separating networks has been implemented. This issue will focus on managing the connections in a way that keeps the networks separate. It will also only allow certain RPC calls during the connection's unauthenticated state.
When this issue is completed then we should have everything implemented to separate public networks. Further expansion to the authentication token logic will need to be done to allow for private networks later down the line.
After handing this off to Brian, we discussed several options to implement this.
The most intuitive way would be to make sure that the static createNodeConnection function waits for the authentication process to finish before returning a NodeConnection.
However, this presents problems:
- [...] createNodeConnection. The resolution of the createNodeConnection promise depends on a call handled by the RPCServer. However, at that point, the NodeConnection has not been created yet, so there is no conceivable way to notify the createNodeConnection method call that authentication has finished.
- [...] NodeConnection be available during the handling of the RPC call that authenticates the peer. This is not possible as the RPCServer instance is established per NodeConnectionManager rather than per NodeConnection.
- [...] createNodeConnection that will resolve once the connection has been authenticated.
I was thinking that node connections are a lower level, so just do your gated calls at the RPC layer. Remember it's an application layer concern that they aren't on the same network. You have to check their sigchain claim. I don't think node connections, being a lower level concern, should even be aware of this problem. Factor out the abstraction to solve this.
Importantly don't mix up abstraction layers. Otherwise the entanglement will cause modularity problems in the future.
I was hoping at the time to isolate the changes to the NodeConnection so we wouldn't need to modify the NodeConnectionManager at all. After discussing this with @amydevs and mulling it over, it seems that the best place to manage this logic will be in the NodeConnectionManager itself.
This means that the NCM will coordinate the NC authentication process and only make the connection available to be used after that has been completed. We may be able to get away without explicitly disabling most RPC calls, since we can prevent access to the NC in question until it has been verified to be part of the network.
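A minimal sketch of that idea, assuming a registry that keeps QUIC-established connections in a pending map and only hands them out once network authentication has passed. Every name here (GatedConnectionRegistry, ConnectionLike, markAuthenticated, rejectPending) is hypothetical and not taken from the Polykey codebase:

```typescript
// Hypothetical sketch; none of these names come from the Polykey codebase.
type NodeIdString = string;

interface ConnectionLike {
  destroy(): void;
}

class GatedConnectionRegistry<C extends ConnectionLike> {
  // Established at the transport level, but not yet verified for the network.
  protected pending = new Map<NodeIdString, C>();
  // Passed network authentication and may be handed out to callers.
  protected authenticated = new Map<NodeIdString, C>();

  public addPending(nodeId: NodeIdString, connection: C): void {
    this.pending.set(nodeId, connection);
  }

  // Promote a pending connection; returns false if nothing was pending.
  public markAuthenticated(nodeId: NodeIdString): boolean {
    const connection = this.pending.get(nodeId);
    if (connection === undefined) return false;
    this.pending.delete(nodeId);
    this.authenticated.set(nodeId, connection);
    return true;
  }

  // Drop and destroy a pending connection that failed authentication.
  public rejectPending(nodeId: NodeIdString): boolean {
    const connection = this.pending.get(nodeId);
    if (connection === undefined) return false;
    this.pending.delete(nodeId);
    connection.destroy();
    return true;
  }

  // Callers can only ever obtain authenticated connections.
  public get(nodeId: NodeIdString): C | undefined {
    return this.authenticated.get(nodeId);
  }
}
```

Since get only reads from the authenticated map, code outside the registry cannot make forward RPC calls on a connection that is still being verified. It does nothing about calls arriving from the peer.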
Can you disallow ALL RPC calls? Usually RPC authentication would be an RPC middleware. But if you put it into NCM, then you'd need to call the RPC in the NCM. That would also mean NCM ends up having knowledge about sigchain claims. I feel like that's too much knowledge built into NCM. Seems like an NM sort of thing to know. Remember NCM is just NCs, but NM can do higher level semantic operations like knowing about the sigchain and making interpretations on it. NM seems like a better place for all this logic.
We can, but there are two aspects to it.
I don't know what you mean by 1. But I thought node connections could be established simply on the basis of QUIC, with the sigchain then checked at a higher abstraction layer?
That's what we're doing, yes. There are two levels of authenticating the connection: the normal connection level done by QUIC, which will be unchanged, and the higher level where we negotiate authentication for the network. Point 1 is only saying that we can maybe get away without disabling the RPC calls, since we can't make them without access to the NodeConnection and we can't get the NodeConnection from the NCM without it being authenticated.
But since we can't control when the reverse RPC requests are made, we'd probably have to enforce it for the handlers via the reverse middleware anyway. May as well handle it for both directions at that point.
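A rough sketch of what such a gate could look like if the same check were shared by the forward and reverse middleware. This is not the actual RPC middleware API; createAuthGate, GateResult and the method name nodesAuthenticateConnection are all hypothetical, and the lookup of a connection's authenticated state is passed in as a callback because how the middleware refers back to the connection is still an open question:

```typescript
// Hypothetical sketch; this is not the real RPC middleware interface.
type GateResult =
  | { allowed: true }
  | { allowed: false; reason: string };

// Builds a check that can be called from both the forward and reverse
// middleware before a request is dispatched to its handler.
function createAuthGate(
  allowlist: ReadonlySet<string>,
  isAuthenticated: (connectionId: string) => boolean,
): (connectionId: string, method: string) => GateResult {
  return (connectionId, method) => {
    // Allowlisted methods (e.g. the authentication call itself) always pass.
    if (allowlist.has(method)) return { allowed: true };
    // Everything else requires the connection to be network-authenticated.
    if (isAuthenticated(connectionId)) return { allowed: true };
    return {
      allowed: false,
      reason: `method ${method} rejected: connection ${connectionId} is not authenticated`,
    };
  };
}

// Usage: the set of authenticated connection ids stands in for whatever
// state the real connection objects would expose.
const authenticatedIds = new Set<string>();
const gate = createAuthGate(
  new Set(['nodesAuthenticateConnection']), // hypothetical method name
  (connectionId) => authenticatedIds.has(connectionId),
);
```

Keeping the allowlist and the state lookup as parameters means the handlers themselves need no changes, which matches the goal of enforcing this in one place.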
When this is merged, testnet and mainnet will be separated.
Specification
In order to segregate nodes of different networks using the ClaimNetworkAccess tokens defined in https://github.com/MatrixAI/Polykey/issues/779, there needs to be some logic for nodes to prevent accepting RPC calls from nodes that are out-of-network. There will need to be some RPC calls that are whitelisted to enable some level of out-of-network negotiation, such as authenticating into the network or requesting access.
Currently a NodeConnection has two stages: the creating stage, when you are awaiting the createNodeConnection factory, and the connected stage, when the NodeConnection fully connects and the creation resolves. The problem with this is that we need to be able to authenticate without allowing most RPC calls and nominal traffic.
To this end we need a connected but unauthenticated state where the NodeConnection is fully operational but rejects RPC calls. NodeConnection.createNodeConnection should negotiate its access to the network before resolving with the completed connection. This means the connecting state and authenticating state are hidden inside the creation of the NodeConnection. This will mean minimal changes to how the NodeConnectionManager handles these connections. However, the NodeConnection will need separate connected, authenticated and created events to reflect these stages.
While in the authenticating state, all non-whitelisted RPC calls need to be rejected outright. That said, the NodeConnection shouldn't be available to make these calls at this stage, but we still need to be secure about it. To avoid having to add logic to all of the RPC handlers we should apply this rejection logic in the middleware. The middleware will need to refer back to the NodeConnection somehow to assess the authenticated state.
To authenticate, the NodeConnection needs to make an authentication RPC call and provide a valid network token. It's up to the handler of this call to decide whether to reject the authentication with an error and kill the NodeConnection if it fails. Note that this needs to be symmetric in the forward and reverse direction. BOTH sides of the connection need to fully authenticate. This opens us to annoying race conditions so we need to be extra careful here. The handshake has been described within #779.
It is extremely important that the following conditions are met.
- A NodeConnection is only fully created if it fully authenticates.
- A NodeConnection is only added to the NodeConnectionManager's connection map if it fully authenticates.
- [...] NodeGraph if it fully authenticates.
- [...] Authenticating state.
Additional context
Related: #779 - Defines the network access tokens and how they are verified.
Related: #770 - Parent issue
Tasks
- [...] NodeConnection.
- The NodeConnection must switch to the authenticated state and fully create only after both the authentication handler and call succeed and resolve.
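One way to picture the last task is a small tracker that only reports success once both directions have completed, regardless of which finishes first. This is a sketch under assumptions, not the planned implementation: MutualAuthTracker, its method names and the three state names are hypothetical, and the real handshake is the one described in #779.

```typescript
// Hypothetical sketch; names and states are illustrative only.
type AuthState = 'authenticating' | 'authenticated' | 'failed';

class MutualAuthTracker {
  protected forwardDone = false; // our authentication call to the peer resolved
  protected reverseDone = false; // our handler accepted the peer's call
  protected state: AuthState = 'authenticating';
  protected onAuthenticated: () => void;
  protected onFailed: (reason: string) => void;

  constructor(
    onAuthenticated: () => void,
    onFailed: (reason: string) => void,
  ) {
    this.onAuthenticated = onAuthenticated;
    this.onFailed = onFailed;
  }

  public getState(): AuthState {
    return this.state;
  }

  public forwardSucceeded(): void {
    this.forwardDone = true;
    this.settle();
  }

  public reverseSucceeded(): void {
    this.reverseDone = true;
    this.settle();
  }

  // A failure in either direction is terminal; later successes are ignored.
  public fail(reason: string): void {
    if (this.state !== 'authenticating') return;
    this.state = 'failed';
    this.onFailed(reason);
  }

  // Transition exactly once, and only when both directions have succeeded.
  protected settle(): void {
    if (this.state !== 'authenticating') return;
    if (this.forwardDone && this.reverseDone) {
      this.state = 'authenticated';
      this.onAuthenticated();
    }
  }
}
```

Because the transition happens in one place and is guarded by the current state, the order in which the call and the handler complete does not matter, and a late success cannot revive a connection that already failed.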