cryptimeleon / cryptimeleon.github.io

Documentation for the Cryptimeleon cryptography libraries.

Benchmarking paper #26

Open rheitjoh opened 3 years ago

rheitjoh commented 3 years ago

Partially rewrites the benchmarking page to address #21.

Also contains a rewrite of the group operation counting section to address the new bucket system.

feidens commented 3 years ago

I feel like what's written now still has this "I need to use these weird workarounds because of this list of issues that I'm now aware of" aura that I wanted to avoid. Yes, in the current version the reader will probably understand the reasons behind the workarounds, since they are well explained, but we're still not supplying a simple, coherent view of testing/counting that doesn't depend on the reader keeping minuscule implementation details of our library in mind (which makes people feel insecure about using it).

For example: I've never run into issues benchmarking with our library. Why? Because I usually benchmark complete processes between multiple parties (say, a blind signing protocol). I just run the protocol and serialize/deserialize whenever a party would send/receive (basically an actual run, just without opening a socket). In this scenario, there is no need to consider secret precomputation state or to artificially call computeSync() on things. It all just happens as it would in an actual application. As soon as the signer gets their verification bit, that's it. As soon as the receiver's signature object is there, it's done. No issue. Why do we keep this view from people? I feel like one paragraph à la "you can of course benchmark your full running application with mock sockets, and then disregard everything we explain about synthetic benchmarks here" would make lots of people immediately go "ah. Okay. So then let's simply do that".
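
To make this concrete, here is a rough sketch of what I mean (the SignatureScheme interface and its method names below are just placeholders for whatever scheme is being benchmarked, not our actual API):

```java
import java.nio.charset.StandardCharsets;

// Sketch of the "benchmark a full run" idea: time the whole exchange and serialize
// wherever a message would cross the network. SignatureScheme is a hypothetical
// placeholder interface, not the actual Cryptimeleon API.
public class FullRunBenchmark {

    // Hypothetical stand-in for whatever scheme you actually benchmark.
    interface SignatureScheme {
        byte[] sign(byte[] message);                      // signer side: returns the serialized signature
        boolean verify(byte[] message, byte[] signature); // verifier side: deserializes, then verifies
    }

    static double averageRoundTripMillis(SignatureScheme scheme, int iterations) {
        byte[] message = "benchmark message".getBytes(StandardCharsets.UTF_8);
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            // Signer: sign and serialize, as if writing the bytes to a socket.
            byte[] wireSignature = scheme.sign(message);
            // Verifier: "receive" the bytes, deserialize, verify. Once the boolean is there,
            // the iteration is done; no computeSync() or cache tricks needed.
            if (!scheme.verify(message, wireSignature))
                throw new IllegalStateException("verification failed");
        }
        return (System.nanoTime() - start) / (double) iterations / 1e6;
    }
}
```

The timing then naturally includes serialization and deserialization, exactly as a real deployment would.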

Just to add my two cents: why don't we just show straightforward benchmarking (like in @JanBobolz's example) and then add a section "Checking for problems: your numbers seem a bit off"? In that section we go into the things one has to consider, so the minuscule implementation details are explained there. The advantage of this approach is that the "normal" use case can concentrate on answering the question "how can I benchmark my scheme?", while the further details are postponed to the extra section.

rheitjoh commented 3 years ago

> For example: I've never run into issues benchmarking with our library. Why? Because I usually benchmark complete processes between multiple parties (say, a blind signing protocol). I just run the protocol and serialize/deserialize whenever a party would send/receive (basically an actual run, just without opening a socket). In this scenario, there is no need to consider secret precomputation state or to artificially call computeSync() on things. It all just happens as it would in an actual application. As soon as the signer gets their verification bit, that's it. As soon as the receiver's signature object is there, it's done. No issue. Why do we keep this view from people? I feel like one paragraph à la "you can of course benchmark your full running application with mock sockets, and then disregard everything we explain about synthetic benchmarks here" would make lots of people immediately go "ah. Okay. So then let's simply do that".

These considerations all rely on the user being able to decide whether the problems are relevant to them, which they cannot do if they are not educated on what exactly the issues are. How do you even differentiate whether a benchmark is "synthetic" or not? Is any benchmark involving serialization automatically safe? Do we just say "if you are benchmarking a complete application, then these issues don't apply to you" (and what is a complete application?)? I don't want to give the reader a free pass to just ignore our advice and (potentially) obtain inaccurate numbers.

JanBobolz commented 3 years ago

I'm not arguing that we shouldn't tell people what the issues are. I'm saying we shouldn't leave people with "these are some weird issues" and "here's how to work around them with serialization because, internally, this clears caches, verification results, etc.". For me, this does not convey a good intuition for how to write my tests. I guess blindly serializing and deserializing everything will do it?! Or is that too much? When do I need the workaround and when do I not?

I'm suggesting that we tell people clearly what kinds of assumptions they must not make (e.g., that .op()/.sign()/... only returns after the result has been fully computed, or that a signature object doesn't change state after verify(), etc.). But then I'd also tell them: "Look, this makes sense because you're not actually interested in measuring how fast sign() returns. You're interested in measuring how long it takes to get a serialized byte string signature that you can write to the network. So design your benchmarks to simulate an actual use case." And: "You're not actually interested in measuring how long it takes to verify the same signature object a hundred times. You want to know how long it takes from receiving a byte string until you get the boolean saying whether or not that was a valid signature."
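
Something like this is what I have in mind (again with placeholder types rather than our real API; the only point is what gets timed):

```java
// Placeholder types, only meant to illustrate what the benchmark should measure.
public class VerifyBenchmarkSketch {

    interface Verifier<SIG> {
        SIG deserializeSignature(byte[] wireSignature); // as if the bytes just arrived from the network
        boolean verify(byte[] message, SIG signature);
    }

    // Probably NOT what you want: verifying the same in-memory object a hundred times.
    // State cached during the first call can make the later calls unrealistically fast.
    static <SIG> void verifySameObjectRepeatedly(Verifier<SIG> verifier, byte[] message, SIG sig) {
        for (int i = 0; i < 100; i++)
            verifier.verify(message, sig);
    }

    // What the user actually cares about: the time from "byte string received" to
    // "boolean available", deserializing freshly in every iteration like a real verifier would.
    static <SIG> double millisFromBytesToBoolean(Verifier<SIG> verifier, byte[] message, byte[] wire) {
        long start = System.nanoTime();
        for (int i = 0; i < 100; i++) {
            SIG fresh = verifier.deserializeSignature(wire);
            verifier.verify(message, fresh);
        }
        return (System.nanoTime() - start) / 100.0 / 1e6;
    }
}
```

The second method answers the question the user actually has, so nobody needs to reason about cached verification state in the first place.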

I feel like this conveys the solution to the issue more clearly, i.e., it (hopefully) equips the reader with a mindset in which they can check their assumptions and design correct test cases. Or am I just thinking about this weirdly?