Closed emmacasolin closed 2 years ago
From https://github.com/MatrixAI/js-polykey/pull/326#issuecomment-1041133623
@joshuakarp when the
await this.nodeManager.refreshBuckets();
is executed, it says it must be done upon key renewal.However how does it get access to the new key? Is it calling the
KeyManager
internally? For something like this, it'd make sense to pass the the new key if it has to use it. Also is there any downsides to this? Seems like something to incorporate into #317.
This should be incorporated to the testing here.
Changes from #326 will now bring in recovery code information when the keypair changes too.
But this revealed that pk keys
commands should also report the new recovery code if the keypair changes. This is something that should be checked too.
Any existing connections proxy connections or node connections should continue to work without problems, however asking for the node ID may give a different answer than the expected original node ID during connection establishment.
Furthermore, if a node is a aware that there is a newer node ID, it should proceed to update its own NodeGraph with this information, as well as any pre-existing relationships that the old node ID had. So for example if old node ID was part of a gestalt that had certain permissions, then the new node ID is now part of that gestalt as well.
The user should be notified somehow about the fact that they are using an old node ID. And that they should switch to the new node ID. However if they continue to try to contact the old node ID on the CLI or GUI, then it should still work as long as the old node ID is still part of the valid certificate chain. This should mean that we don't delete the old node ID.
Any garbage collection of the old node ID should only occur lazily. Not sure exactly when, but perhaps as part of a natural forgetting process of the PK network.
There are 3 kinds of key reset.
resetRootCert
, This recreates the cert chain from scratch but keeps root keyPairresetRootKeyPair
, This recreates the cert chain from scratch with new root keyPairrenewRootKeyPair
, this creates a new root keyPair and adds it to the cert chain. Old one should still be valid.We expect there to be issues with chaning the root keyPair of the receiving side since that's the side that is authenticated. The conditions for a failure to authenticate would be the target having a new keyPair AND the old one not in the cer chain.By theses conditions only resetRootKeyPair
will cause a node connecting with old details to fail.
I'd expect RenewRootKeyPair to still authenticate properly since it will keep the only cert on the cert chain. But currently it is failing so I need to look into it. I suggests that the authenticate process is not checking the old certs in the chain properly. Or the old cert is not properly maintained in the chain.
When trying to connect with bad authentication we get the following error out of the proxy in the logger ERROR:NodeConnection test:Failed CONNECT to 127.0.0.1:34670 - ErrorCertChainUnclaimed: Node ID is not claimed by any certificate
. But this error doesn't propigate down to the NodeConnection
so as far as the NodeConnection
can tell it's just failing to connect and times out. Shouldn't we be throwing some kind of authentication error instead?
Also just a note. I can get the tests done today but that's just to check that things stil work as expected when reseting the root keypair. The aspect of this issue where a connecting node will notice that the nodeId/cerChain has been updated and reacts accordingly still needs to be speced out and done. I don't think that it is a strict requirenment for #378 so we can extract that part of the issue out and do it at a later date.
@emmacasolin this NAT testing should eventually incorporate this as well, so that when a keypair changes, any connection should continue to work even through NAT busted connections, or eventually through NAT relayed connections.
There is a problem with verifying the certchain
when establishing a connection. There seems to only be 1 cert in the cert chain. Even when there are multiple certs in the certChain
in the KeyManager
. I'm digging into it.
It seems like the receiving side only provides one certificate for the connection. We should be providing the whole chain.
So the keychange is meant to affect the sigchain and gestaltgraph in some what. lets start by asking what these are...
sigchain
. This is used to store claims made to and from other nodes and identities. it is stored in a signed chain to maintain authenticity of the claims. gestaltGraph
. is used to track all of the nodes within a Gestalt.ACL
is used to track the permissions between nodes.
So how would a key change affect the sigchain
? Let's see.... If the root KeyPair changes but is not a part of the cert chain then we essentially disavow the old NodeId/KeyPair. If the old nodeId is still in the certChain then the claims should still be valid.
The GestaltGraph
and the ACL
Needs to have old NodeIds converted over to the new NodeId with it is discovered that the ID has changed. We will need a way to discover that the NodeId has changed. We also need a way to update a NodeId
of a node when we find that it has changed.
How does the NodeId
changing affect discovery of the node? I'm sure that for a while the old NodeId
can still be found but will be updated with the new NodeId
when it gets connected to. Do we need some mecanism for ensuring it can be found by it's old ID searching for it on the network? It seems possible that nodes with permissions or part of it's gestalt can become 'estranged'.
The sigchain
is not actively affected by the root keyPair change event. They however depend on the changed state in the KeyManager
. This just means we need to make sure it checks f the nodeId is part of the certificate chain during the relevant checks.
General spec task outline
SigChain
we just need to take in account the full certChain
when verifying SigChain
data. I think this is done by sigChainUtils.verifyChainData
. GestaltGraph
and ACL
need to know a Node by it's old ID and convert the old ID to the new one over time. Likewise if it needs to verify the ID it needs to use the nodes whole certChain to do so.Yea key change is a big problem, but that's why I wanted to focus only on maintaining the node connection for now and tackle the key change problem for the other domains separately. Good to keep these in mind for a new issue.
Ok then, all the connection tests are working except for one case. I'll fix that and then this should be done. The details in the comments above can be part of a new issue tacking this general key change problem.
A portion of this work was done in #378, only addressing maintaining node connections while keypair is changed. @tegefaulkes please create a new issue for general keypair change testing.
Essentially what was addressed in this issue was handling the root cert changing and by extension the local Agent's NodeId. This means that when the root cert is changed we
grpcServerClient
and Proxy
NodeGraph
buckets with NodeManager.resetBuckets()
.What hasn't been addressed is handling remote nodes who's NodeId
's have changed. We need to be able to
NodeId
has changed.NodeId
as we would with it's old NodeId
.NodeId
be updated to use the new NodeId
. This process needs to happen lazily. Domains such as ACL, Gestalt Graph and SigChain will need to handle this.I will create a new issue specifying what still needs to be done.
Please reference the files where these edits were made too. Also when you create the new issue referencing this one, close this one.
I worked on adding tests for this issue. the bulk of the work was done by Emma. As far as I know these changes were made.
src/keys/KeyManager.ts
- a changeCallback
was added to the constructor. it is called when the root certificate and NodeId
changes.src/events/EventBus.ts
- was created. This provides an asynchronious way to handle events. It allows the root cert change to propagate to other domains.src/PolykeyAgent.ts
- passes the changeCallback to the KeyManager
and creates the handler for the cert change event. This event updates the status file, NodeGraph
buckets, and the grpcServerCLient
and proxy TLS configs.tests/PolykeyAgent.test.ts
- intergration tests were added to see if the cert change event propigates properly.tests/nodes/NodeConnection.test.ts
- tests were added to check the expected behviour when changing the TLS config in various situations.I'm writing up the new issue now.
New issue has been created at #386.
Specification
When the root keypair is renewed, we expect the old nodeId and client certificate to still be valid (in constrast to when it is reset).
From https://github.com/MatrixAI/js-polykey/issues/312#issuecomment-1016137891:
Changing the node id will impact communications between:
Under Agent to Agent, this is going through the forward and reverse proxies, and also the GRPCClientAgent. The GRPC client does not use TLS but the TLS is being done between forward and reverse proxies. This means that when we perform a keyManager.renewRootKeyPair, the old NodeId is still valid until the certificate expiration date. This means nodes contacting the changing agent with the old NodeId should still work, but receive an automatic update to the NodeGraph when the node id changes. I believe the logic to check the cert chain is in network/utils.ts verifyServerCertificateChain, but the logic to update the node id in the nodegraph is not verified yet.
Under CLI to Agent and GUI to Agent, they are using direct TLS on the GRPCClient. Existing GRPCClient connections should continue to work and not be interrupted by the TLS change. I believe I had tests for this under tests/grpc/GRPCServer.test.ts as changing the private key and certificate on the fly. However new connections will need to be made with a new target Node ID. This can occur by reading the status each time a new connection is made. However I think a similar TLS verification logic could take place where the old node ID can still be valid.
So we need tests to ensure this is the case. One for tests/grpc/GRPCClient.test.ts to ensure that the GRPCClient node id can still work. Another for tests/network as well.
Then we would extend these tests into tests/nodes for NodeConnection, and also in tests/bin to test CLI situation.
Additional context
Tasks