Open ameba23 opened 2 weeks ago
I am a bit stuck with this.
DKG is working fine, but the reshare protocol has problems.
To be specific, it's the aux gen protocol when run from within the 'reshare' protocol:
The DKG protocol is composed of the key init, reshare and aux gen protocols, so we know all of these will work with this setup.
'Reshare' is composed of reshare and aux gen. It seems from looking at logs that the reshare runs, but only two of the three parties finalize successfully, then during aux gen a connection gets dropped.
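The composition described above can be sketched as follows. This is only an illustration: the `Stage` enum and the function names are hypothetical and are not taken from `entropy-protocol`, and the ordering is assumed from the description and the logs rather than from the code.

```rust
/// Hypothetical stage names for illustration; not the real entropy-protocol API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    KeyInit,
    Reshare,
    AuxGen,
}

/// Sub-protocols run by the full DKG, in the order given in the description.
fn dkg_stages() -> Vec<Stage> {
    vec![Stage::KeyInit, Stage::Reshare, Stage::AuxGen]
}

/// Sub-protocols run by the standalone 'reshare'.
fn reshare_stages() -> Vec<Stage> {
    vec![Stage::Reshare, Stage::AuxGen]
}

/// True if every stage of `inner` appears in `outer`, in the same order.
fn is_subsequence(inner: &[Stage], outer: &[Stage]) -> bool {
    let mut remaining = outer.iter();
    inner.iter().all(|s| remaining.any(|o| o == s))
}

fn main() {
    let dkg = dkg_stages();
    let reshare = reshare_stages();
    // Every stage of 'reshare' is also exercised by a successful DKG run.
    assert!(is_subsequence(&reshare, &dkg));
    println!("DKG: {:?}", dkg);
    println!("reshare: {:?}", reshare);
}
```

Since both stages of 'reshare' also run inside the DKG, a failure that only shows up in 'reshare' points at how the stages are set up or sequenced there, not at the stages themselves.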
I tried switching things around so that the aux gen protocol runs before the reshare. Aux gen does seem to be the problem: a connection gets dropped, and it looks like only one peer runs the protocol loop.
There are also issues with mocking the DKG using pre-generated keyshares, which we do in the signing tests. Now, when the jump start extrinsic is submitted, it starts a real DKG, and things get messy when the mock DKG confirmations are also submitted. But I think this should not be too hard to fix.
I had a hunch that the problems I am seeing were introduced by my recent PR: https://github.com/entropyxyz/entropy-core/pull/1136
I have now tried rolling back to before that commit, re-applying these changes and running the tests, but I am seeing exactly the same behavior as on this branch. So that's not the problem.
Here is the relevant part of the test logging output:
2024-11-11T09:45:06.271344Z INFO entropy_tss::signing_client::protocol_transport: Got ws connection, with message: SubscribeMessage { session_id: Reshare { verifying_key: [2, 24, 213, 93, 160, 174, 33, 208, 3, 29, 227, 227, 175, 53, 23, 229, 3, 220, 55, 68, 234, 118, 51, 129, 176, 175, 185, 68, 247, 93, 250, 222, 233], block_number: 21 }, public_key: 2cbc68e8bf0fbc1c28c282d1263fc9d29267dc12a1044fb730e8b65abc37524c (5D5Mw6Wb...), signature: 8259ea5bbf1308232f884f7eaf85d494c664e318ef21f1ee26b5575c0fa542789c2ca902e78fb74447016c549ff35b4cc67353360041b295da14ab0e00348781 }
at crates/threshold-signature-server/src/signing_client/protocol_transport.rs:159
2024-11-11T09:45:06.271898Z INFO entropy_tss::signing_client::protocol_transport: Got ws connection, with message: SubscribeMessage { session_id: Reshare { verifying_key: [2, 24, 213, 93, 160, 174, 33, 208, 3, 29, 227, 227, 175, 53, 23, 229, 3, 220, 55, 68, 234, 118, 51, 129, 176, 175, 185, 68, 247, 93, 250, 222, 233], block_number: 21 }, public_key: 946140d3d5ddb980c74ffa1bb64353b5523d2d77cdf3dc617fd63de9d3b66338 (5FRFqLyd...), signature: 9aa6c7204d2135cadb94878b4117617a9336c4becd2ceb04903d570a7ad6fa38d223dec1ca25f4279a49191296a8cfeb386117be6b33e912e55fc1e22aaba288 }
at crates/threshold-signature-server/src/signing_client/protocol_transport.rs:159
2024-11-11T09:45:06.272088Z INFO entropy_tss::signing_client::protocol_transport: Got ws connection, with message: SubscribeMessage { session_id: Reshare { verifying_key: [2, 24, 213, 93, 160, 174, 33, 208, 3, 29, 227, 227, 175, 53, 23, 229, 3, 220, 55, 68, 234, 118, 51, 129, 176, 175, 185, 68, 247, 93, 250, 222, 233], block_number: 21 }, public_key: 946140d3d5ddb980c74ffa1bb64353b5523d2d77cdf3dc617fd63de9d3b66338 (5FRFqLyd...), signature: b4a22cbf3746335a7a757824c63cbeb01a31a3fa428d4a860c47ae790257a118bdabe88fe31e4ad8a854d8d49b6770c7fd192a7a88813753dc116a16bd726b82 }
at crates/threshold-signature-server/src/signing_client/protocol_transport.rs:159
2024-11-11T09:45:06.272212Z INFO entropy_protocol::execute_protocol: Executing reshare
at crates/protocol/src/execute_protocol.rs:343
in entropy_tss::validator::api::new_reshare
in entropy_tss::http-request with uuid: d4731f8e-5b02-4f75-8218-6a612164990a, uri: /validator/reshare, method: POST
2024-11-11T09:45:06.272554Z INFO entropy_protocol::execute_protocol: Executing reshare
at crates/protocol/src/execute_protocol.rs:343
in entropy_tss::validator::api::new_reshare
in entropy_tss::http-request with uuid: 829e963e-68a0-4c46-b628-a42aa0c1bba4, uri: /validator/reshare, method: POST
2024-11-11T09:45:06.272964Z INFO entropy_protocol::execute_protocol: Executing reshare
at crates/protocol/src/execute_protocol.rs:343
in entropy_tss::validator::api::new_reshare
in entropy_tss::http-request with uuid: badb679b-ba0f-4287-bc39-f65952c250f2, uri: /validator/reshare, method: POST
2024-11-11T09:45:06.274708Z INFO entropy_protocol::execute_protocol: Finished reshare
at crates/protocol/src/execute_protocol.rs:361
in entropy_tss::validator::api::new_reshare
in entropy_tss::http-request with uuid: 829e963e-68a0-4c46-b628-a42aa0c1bba4, uri: /validator/reshare, method: POST
2024-11-11T09:45:06.274719Z INFO entropy_protocol::execute_protocol: Starting aux gen
at crates/protocol/src/execute_protocol.rs:365
in entropy_tss::validator::api::new_reshare
in entropy_tss::http-request with uuid: 829e963e-68a0-4c46-b628-a42aa0c1bba4, uri: /validator/reshare, method: POST
2024-11-11T09:45:10.401568Z INFO entropy_protocol::execute_protocol: Finished reshare
at crates/protocol/src/execute_protocol.rs:361
in entropy_tss::validator::api::new_reshare
in entropy_tss::http-request with uuid: d4731f8e-5b02-4f75-8218-6a612164990a, uri: /validator/reshare, method: POST
2024-11-11T09:45:10.401590Z INFO entropy_protocol::execute_protocol: Starting aux gen
at crates/protocol/src/execute_protocol.rs:365
in entropy_tss::validator::api::new_reshare
in entropy_tss::http-request with uuid: d4731f8e-5b02-4f75-8218-6a612164990a, uri: /validator/reshare, method: POST
2024-11-11T09:45:17.100457Z WARN entropy_tss::signing_client::api: Websocket connection closed unexpectedly MessageAfterProtocolFinish
at crates/threshold-signature-server/src/signing_client/api.rs:140
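To make the log above easier to read, here is a small sketch that tallies the last protocol event logged for each request uuid. The events are transcribed by hand from the log lines above. The `Event` enum and the helper functions are hypothetical, and treating each uuid as one party's `/validator/reshare` request is an assumption.

```rust
use std::collections::BTreeMap;

/// Protocol log messages seen above; a hypothetical enum for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    ExecutingReshare,
    FinishedReshare,
    StartingAuxGen,
}

/// (request uuid, event) pairs transcribed from the log, in log order.
fn observed_log() -> Vec<(&'static str, Event)> {
    vec![
        ("d4731f8e-5b02-4f75-8218-6a612164990a", Event::ExecutingReshare),
        ("829e963e-68a0-4c46-b628-a42aa0c1bba4", Event::ExecutingReshare),
        ("badb679b-ba0f-4287-bc39-f65952c250f2", Event::ExecutingReshare),
        ("829e963e-68a0-4c46-b628-a42aa0c1bba4", Event::FinishedReshare),
        ("829e963e-68a0-4c46-b628-a42aa0c1bba4", Event::StartingAuxGen),
        ("d4731f8e-5b02-4f75-8218-6a612164990a", Event::FinishedReshare),
        ("d4731f8e-5b02-4f75-8218-6a612164990a", Event::StartingAuxGen),
    ]
}

/// Last event logged for each request uuid.
fn last_event_per_request<'a>(log: &[(&'a str, Event)]) -> BTreeMap<&'a str, Event> {
    let mut last = BTreeMap::new();
    for &(uuid, event) in log {
        // Later entries overwrite earlier ones, so the final value is the last event.
        last.insert(uuid, event);
    }
    last
}

/// Requests that started the reshare but never logged anything after that.
fn stuck_in_reshare<'a>(log: &[(&'a str, Event)]) -> Vec<&'a str> {
    last_event_per_request(log)
        .into_iter()
        .filter(|(_, event)| *event == Event::ExecutingReshare)
        .map(|(uuid, _)| uuid)
        .collect()
}

fn main() {
    let log = observed_log();
    for (uuid, event) in last_event_per_request(&log) {
        println!("{uuid}: {event:?}");
    }
    println!("never finished reshare: {:?}", stuck_in_reshare(&log));
}
```

In this tally, two requests reach "Starting aux gen" and one never logs "Finished reshare", which matches the observation that only two of the three parties finalize the reshare before the websocket connection closes.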
Had a look at this problem with @HCastano this morning. He suggested running the DKG with an additional reshare immediately after the DKG finishes, to see whether the problem is related to the test conditions for the reshare test rather than to the reshare protocol itself.
I tried doing this, but got into a muddle with new holders and old holders, and kept getting synedrion's "no entry found for key" error.
But if I run an extra aux gen protocol at the end of the DKG protocol, it does work.
This PR sets the TSS node endpoints associated with all chain nodes in our test setup for entropy-tss. It means that HTTP requests coming from the propagation pallet will be made to all four TSS nodes. Previously, only Alice got a request from the chain, and the other three TSS servers had similar requests made by an HTTP client in the test code itself.
The previous approach had the advantage that we could test the handling of bad inputs given to these endpoints. But a lot of care had to be taken to make the mock requests appear at the right time, and we have had problems with the tests occasionally failing, e.g. https://github.com/entropyxyz/entropy-core/issues/1119
I am hoping that this change makes the tests more reliable and more closely emulates what will happen in production. I am not sure it's going to work, and it might make it harder to test bad input and edge cases.