entropyxyz / entropy-core

Protocol and cryptography development.
https://docs.entropy.xyz/
GNU Affero General Public License v3.0
11 stars 2 forks source link

Allow offchain worker requests to all TSS nodes in `entropy-tss` test environment #1147

Open ameba23 opened 2 weeks ago

ameba23 commented 2 weeks ago

This PR sets the TSS node endpoints associated with all chain nodes in our test setup for entropy-tss.

It means that http requests coming from the propagation pallet will be made to all four TSS nodes. Previously, only Alice got a request from the chain and the other 3 TSS server's had similar requests made by an http client in the test code itself.

This had the advantage that we could test the handling of bad inputs given to these endpoints. But a lot of care had to be taken to make the mock requests appear at the right time, and we have had problems with the tests occasionally failing, eg: https://github.com/entropyxyz/entropy-core/issues/1119

I am hoping that we can make the tests more reliable and more closely emulate what will happen in production by doing this. I am not sure its going to work, and it might make it harder to test bad input and edge cases.

ameba23 commented 2 weeks ago

I am a bit stuck with this.

DKG is working fine but the reshare protocol has problems.

To be specific its the aux gen protocol when run from within the 'reshare' protocol:

The DKG protocol is composed of the key init, reshare and aux gen protocol - so we know all of these will work with this setup.

'Reshare' is composed of reshare and aux gen. It seems from looking at logs that the reshare runs, but only two of the three parties finalize successfully, then during aux gen a connection gets dropped.

I tried switching things around so the aux gen protocol gets run before the reshare, and it seems that aux gen is the problem as a connection gets dropped and it seems only one peer runs the protocol loop.

There are also issues with mocking the DKG using pre-generated keyshares which we do in the signing tests, as now when the jump start extrinsic is submitted, it starts a real DKG, and things get messy when the mock DKG confirmations are also submitted. But i think this should be not to hard to fix.

ameba23 commented 1 week ago

I had a hunch that the problems i am seeing are related to problems introduced by my recent PR: https://github.com/entropyxyz/entropy-core/pull/1136

I have now tried rolling back to before that commit and re-applying these changes and running the tests, but i am seeing exactly the same behavior as on this branch. So thats not the problem.

Here is the relevant part of the test logging output:

  2024-11-11T09:45:06.271344Z  INFO entropy_tss::signing_client::protocol_transport: Got ws connection, with message: SubscribeMessage { session_id: Reshare { verifying_key: [2, 24, 213, 93, 160, 174, 33, 208, 3, 29, 227, 227, 175, 53, 23, 229, 3, 220, 55, 68, 234, 118, 51, 129, 176, 175, 185, 68, 247, 93, 250, 222, 233], block_number: 21 }, public_key: 2cbc68e8bf0fbc1c28c282d1263fc9d29267dc12a1044fb730e8b65abc37524c (5D5Mw6Wb...), signature: 8259ea5bbf1308232f884f7eaf85d494c664e318ef21f1ee26b5575c0fa542789c2ca902e78fb74447016c549ff35b4cc67353360041b295da14ab0e00348781 }
    at crates/threshold-signature-server/src/signing_client/protocol_transport.rs:159

  2024-11-11T09:45:06.271898Z  INFO entropy_tss::signing_client::protocol_transport: Got ws connection, with message: SubscribeMessage { session_id: Reshare { verifying_key: [2, 24, 213, 93, 160, 174, 33, 208, 3, 29, 227, 227, 175, 53, 23, 229, 3, 220, 55, 68, 234, 118, 51, 129, 176, 175, 185, 68, 247, 93, 250, 222, 233], block_number: 21 }, public_key: 946140d3d5ddb980c74ffa1bb64353b5523d2d77cdf3dc617fd63de9d3b66338 (5FRFqLyd...), signature: 9aa6c7204d2135cadb94878b4117617a9336c4becd2ceb04903d570a7ad6fa38d223dec1ca25f4279a49191296a8cfeb386117be6b33e912e55fc1e22aaba288 }
    at crates/threshold-signature-server/src/signing_client/protocol_transport.rs:159

  2024-11-11T09:45:06.272088Z  INFO entropy_tss::signing_client::protocol_transport: Got ws connection, with message: SubscribeMessage { session_id: Reshare { verifying_key: [2, 24, 213, 93, 160, 174, 33, 208, 3, 29, 227, 227, 175, 53, 23, 229, 3, 220, 55, 68, 234, 118, 51, 129, 176, 175, 185, 68, 247, 93, 250, 222, 233], block_number: 21 }, public_key: 946140d3d5ddb980c74ffa1bb64353b5523d2d77cdf3dc617fd63de9d3b66338 (5FRFqLyd...), signature: b4a22cbf3746335a7a757824c63cbeb01a31a3fa428d4a860c47ae790257a118bdabe88fe31e4ad8a854d8d49b6770c7fd192a7a88813753dc116a16bd726b82 }
    at crates/threshold-signature-server/src/signing_client/protocol_transport.rs:159

  2024-11-11T09:45:06.272212Z  INFO entropy_protocol::execute_protocol: Executing reshare
    at crates/protocol/src/execute_protocol.rs:343
    in entropy_tss::validator::api::new_reshare
    in entropy_tss::http-request with uuid: d4731f8e-5b02-4f75-8218-6a612164990a, uri: /validator/reshare, method: POST

  2024-11-11T09:45:06.272554Z  INFO entropy_protocol::execute_protocol: Executing reshare
    at crates/protocol/src/execute_protocol.rs:343
    in entropy_tss::validator::api::new_reshare
    in entropy_tss::http-request with uuid: 829e963e-68a0-4c46-b628-a42aa0c1bba4, uri: /validator/reshare, method: POST

  2024-11-11T09:45:06.272964Z  INFO entropy_protocol::execute_protocol: Executing reshare
    at crates/protocol/src/execute_protocol.rs:343
    in entropy_tss::validator::api::new_reshare
    in entropy_tss::http-request with uuid: badb679b-ba0f-4287-bc39-f65952c250f2, uri: /validator/reshare, method: POST

  2024-11-11T09:45:06.274708Z  INFO entropy_protocol::execute_protocol: Finished reshare
    at crates/protocol/src/execute_protocol.rs:361
    in entropy_tss::validator::api::new_reshare
    in entropy_tss::http-request with uuid: 829e963e-68a0-4c46-b628-a42aa0c1bba4, uri: /validator/reshare, method: POST

  2024-11-11T09:45:06.274719Z  INFO entropy_protocol::execute_protocol: Starting aux gen
    at crates/protocol/src/execute_protocol.rs:365
    in entropy_tss::validator::api::new_reshare
    in entropy_tss::http-request with uuid: 829e963e-68a0-4c46-b628-a42aa0c1bba4, uri: /validator/reshare, method: POST

  2024-11-11T09:45:10.401568Z  INFO entropy_protocol::execute_protocol: Finished reshare
    at crates/protocol/src/execute_protocol.rs:361
    in entropy_tss::validator::api::new_reshare
    in entropy_tss::http-request with uuid: d4731f8e-5b02-4f75-8218-6a612164990a, uri: /validator/reshare, method: POST

  2024-11-11T09:45:10.401590Z  INFO entropy_protocol::execute_protocol: Starting aux gen
    at crates/protocol/src/execute_protocol.rs:365
    in entropy_tss::validator::api::new_reshare
    in entropy_tss::http-request with uuid: d4731f8e-5b02-4f75-8218-6a612164990a, uri: /validator/reshare, method: POST

  2024-11-11T09:45:17.100457Z  WARN entropy_tss::signing_client::api: Websocket connection closed unexpectedly MessageAfterProtocolFinish
    at crates/threshold-signature-server/src/signing_client/api.rs:140
ameba23 commented 1 day ago

Had a look at this problem with @HCastano thismorning. He made a suggestion to try running the DKG with an additional reshare immediately after the DKG finishes. To see if it is something to do with the test conditions for the reshare test rather than the reshare protocol itself.

I tried doing this, but got into a muddle with new holders and old holders, and was having synedrion's no entry found for key error.

But if i run an extra aux gen protocol at the end of the DKG protocol, it does work.