entropyxyz / entropy-core

Protocol and cryptography development.
https://docs.entropy.xyz/
GNU Affero General Public License v3.0

Client-side benchmarking for signing on testnet #622

Open ameba23 opened 7 months ago

ameba23 commented 7 months ago

I have been doing some client-side benchmarks to see how long things will realistically take from the user's perspective.

I think there was a bit of worry that things are going to be slow when increasing the party size, as the benchmarks I initially posted on Discord showed that private mode signing (with 3 parties) was considerably slower.

The good news is that I was running the client in debug mode, and moving to release mode reduces the time taken to run the protocol on the client side quite a bit: it's around twice as fast.

I'm not going to focus on times for registering, because that is much less time-critical, and since we poll for register confirmations, the time is heavily dependent on how that polling is done.

So let's focus on signing.

Getting the signing committee (doing queries to the chain and figuring out which TSS servers are going to sign this particular message) takes around 340ms.

Signing in permissioned mode:

This takes around 2.95 seconds altogether. That includes a filesystem check for a keyshare file, because we initially don't know whether we are in private mode or not. Excluding that, and excluding any processing of the signature response, we are down to around 1.65 seconds. Bear in mind 340ms of that is spent getting the signing committee.

Excluding all the preparation, and starting the clock at the point of making http requests to TSS servers, and stopping it as soon as we get the first signature response, we are at 1.29 seconds.

Signing in private mode:

Here we are at around 5.31 seconds for the whole client process.

Excluding all the preparation, and starting the clock at the point of making http requests to TSS servers, and stopping it as soon as we get the first signature response, we are at 3.67 seconds.
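The "time to first signature response" measurement used for both modes can be pictured with a small simulation. This is only a sketch and not the actual client code: the TSS servers are replaced by threads that sleep for a made-up latency, and `time_to_first_response` is a hypothetical name.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

/// Simulates sending one request per server and returns the index of the
/// first server to respond, plus the time until that response arrived.
/// `latencies_ms` holds made-up per-server response times (assumption).
/// Panics if `latencies_ms` is empty, since no response can ever arrive.
fn time_to_first_response(latencies_ms: &[u64]) -> (usize, Duration) {
    let (tx, rx) = mpsc::channel();
    // Start the clock at the point where the requests go out.
    let start = Instant::now();
    for (i, &ms) in latencies_ms.iter().enumerate() {
        let tx = tx.clone();
        thread::spawn(move || {
            // Stand-in for network latency plus server-side signing.
            thread::sleep(Duration::from_millis(ms));
            // The receiver may already be gone; ignore that error.
            let _ = tx.send(i);
        });
    }
    drop(tx);
    // Stop the clock as soon as the first response is received.
    let first = rx.recv().expect("at least one server must respond");
    (first, start.elapsed())
}
```

Calling `time_to_first_response(&[300, 50, 200])` should report server 1 as the first responder, with an elapsed time of a little over 50ms. Later responses are ignored, which matches stopping the clock at the first signature.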

So much better than the initial benchmarks.

I should however add that, to avoid programs playing a role in this, I am using the simplest possible program, which always evaluates to true. So in reality things might well take a bit longer.

Also note that all these benchmarks were run in Europe, and I expect the Entropy nodes are all in US East.

HCastano commented 7 months ago

Great work so far!

A few thoughts come to mind:

And yes, all the servers are located in US East. This should be easy to change if we want nodes elsewhere though.

ameba23 commented 7 months ago

  • Can you elaborate on the methodology used to run the benchmarks? E.g., are there scripts or reproduction steps that we can use, what tool(s) are you using to measure execution times, etc.?

The test CLI reports the complete time taken for any command to run by calling std::time::Instant::now() and then printing Instant::elapsed(). I added similar timing logs internally in testing-utils::test_client, which I have not committed.
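As a minimal sketch of that kind of measurement (not the actual test CLI code), a small helper can wrap any closure with `Instant::now()` and `Instant::elapsed()`. The names `timed` and `run_command` are hypothetical, and `run_command` just sleeps in place of a real command.

```rust
use std::time::{Duration, Instant};

/// Runs `f` and returns its result together with the wall-clock time taken.
fn timed<T>(f: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = f();
    (result, start.elapsed())
}

/// Hypothetical stand-in for a CLI command such as signing (assumption):
/// it only sleeps for 20ms and returns a fixed string.
fn run_command() -> &'static str {
    std::thread::sleep(Duration::from_millis(20));
    "done"
}
```

With this, `let (output, took) = timed(run_command);` followed by `println!("{output} in {took:?}");` prints the elapsed time for the whole call. The same wrapper can be placed around smaller steps to get per-phase numbers.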

  • Have you tried running any of this against local networks? I'd be curious to see how much of the "execution" time might be due to latency across a network vs. actual execution

I will try that and see what difference it makes. There could also be slight differences in the time to run get_current_subgroup_signers due to there being 2 nodes rather than 4, but I doubt that will amount to much.

  • The reason for the above is that I'd also want to see how this scales with different numbers of validators, signing groups, etc., and this might be easier to run benches for locally instead of deploying infra (at least for now)

Definitely. I am really curious to see what happens when we increase the signing party size. Not just for performance but also for possible bugs.

  • As a future thing, it would be cool to run some of these benchmarks as part of CI. That way, if a PR changes anything significantly, we can flag it and address it

Yeah that would be cool. My proposal would be to add a time benchmark to the signing integration test at the point of calling the sign function, to see the complete process from the client perspective - here:

https://github.com/entropyxyz/entropy-core/blob/67dfff7224030ebec7314e6583035e0572c620d0/crates/threshold-signature-server/tests/sign.rs#L74

And for seeing how long the internal parts of the process take, I guess we can add some server-side logging, for example on the do_signing function or get_current_subgroup_signers.

I have no idea how we would actually turn this into actionable information in CI though. I have never done something like that before.
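One possible way to make a timing actionable, offered only as a sketch under assumptions, is to compare the measured duration against a stored baseline plus a tolerance and fail the test when it is exceeded. The function name `check_regression` and the tolerance value are hypothetical. Timings taken over a real network are noisy, so a check like this would probably only be meaningful against a local network.

```rust
use std::time::Duration;

/// Returns an error message when `measured` exceeds `baseline` by more than
/// `tolerance`, expressed as a fraction (0.2 means 20% slower is allowed).
fn check_regression(
    measured: Duration,
    baseline: Duration,
    tolerance: f64,
) -> Result<(), String> {
    // The slowest duration still considered acceptable.
    let limit = baseline.mul_f64(1.0 + tolerance);
    if measured > limit {
        Err(format!(
            "took {:?}, which exceeds the limit of {:?}",
            measured, limit
        ))
    } else {
        Ok(())
    }
}
```

In a test, `check_regression(elapsed, Duration::from_millis(1290), 0.2).unwrap()` would fail the CI run if the measured time exceeded the baseline by more than 20%. The 1.29 second baseline is taken from the permissioned-mode figure above purely for illustration.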