entropyxyz / entropy-core

Protocol and cryptography development.
https://docs.entropy.xyz/
GNU Affero General Public License v3.0
11 stars 2 forks source link

TSS server's `test_store_share` occasionally fails in CI with keyshare mismatch #635

Closed ameba23 closed 2 months ago

ameba23 commented 9 months ago

Occasionally in CI test_store_share fails with:

thread 'user::tests::test_store_share' panicked at crates/threshold-signature-server/src/user/tests.rs:640:5:
assertion `left != right` failed

See the failure in CI here: https://app.circleci.com/pipelines/github/entropyxyz/entropy-core/3173/workflows/ebe689da-3660-4093-9c51-ddb1a8f57701/jobs/11114

This came up in this PR https://github.com/entropyxyz/entropy-core/pull/634

Looking at the test, this will always happen if the registration takes longer than 10 seconds to complete: https://github.com/entropyxyz/entropy-core/blob/0b8bda921ca90b579f090dea6e19cd5c809515ee/crates/threshold-signature-server/src/user/tests.rs#L622

So one way to address this would be to increase the timeout from 10 seconds to say 30 seconds.

HCastano commented 9 months ago

So one way to address this would be to increase the timeout from 10 seconds to say 30 seconds.

This seems like a bandaid fix to me, and it's only going to make our already slow test times even worse.

I guess the main thing to know here is why would the keyshares differ if registration takes longer than expected?

ameba23 commented 9 months ago

I guess the main thing to know here is why would the keyshares differ if registration takes longer than expected?

spawn_testing_validators puts an initial keyshare directly in the db. Then on successful registration it gets replaced, causing the keyshares before and after registration to differ in the happy case. But if registration is too slow the keyshares stay the same because registration has not yet completed so the old one is still there.

Probably we could make it clearer what is happenning if we don't pass the account ID to spawn_testing_validators, which means no initial keyshare would get stored. And then we can check whether registration was successful by checking whether or not there is a keyshare at all, rather than comparing before and after keyshares.

Still though - the problem is that we need to poll for a registration confirmation to know that registration has completed. So we need a timeout (in this case 10 seconds) to prevent us from polling forever when registration fails for whatever reason.

HCastano commented 2 months ago

This test was removed in https://github.com/entropyxyz/entropy-core/pull/1026.