Introduce "buckets" to DebugGroup counting

JanBobolz commented 3 years ago

From experience with counting our uacs implementation, I want to propose the following:

We introduce the notion of "buckets" into DebugGroups. From the user's perspective, the flow should look roughly like

group.setBucket("user");
doStuff();
group.setBucket("provider");
doOtherStuff();

group.print("user"); //prints count of operations that happened in doStuff()
group.print("provider"); //prints count of operations that happened in doOtherStuff()

This would allow the user to effectively count operations done in interactive protocols.

More specifically/additionally:

- Counting becomes a static thing. I ran into the issue that I had two copies of DebugGroup (one was created during deserialization of an internal instance of PSSignatureScheme), so the count was split between two objects (I only noticed because the count seemed somewhat off). Ideally, this whole thing becomes a bit more robust w.r.t. serialization.
- My quick thought: A bucket is (an object with) a map Group -> (numSquarings, numExps, numPairings, ...), i.e. a bucket counts the number of operations per Group. Buckets are stored statically in DebugGroup (probably: static Map<String, Bucket> as a name -> bucket map). This takes care of the static counting thing from the previous bullet point "for free".
- To implement the two ways of counting, either we store two buckets for each bucket name (like type1bucket: Group -> Bucket and type2bucket: Group -> Bucket become static members of DebugGroup) or each bucket contains the two types of count.
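
A minimal sketch of the static name -> bucket map described above, under stated assumptions: `OpCounts`, `Bucket`, `BucketRegistry`, `ToyDebugGroup` and the initial bucket name `"default"` are all hypothetical and not part of the library, and groups are identified by plain strings to keep the example self-contained. It only illustrates why static storage keeps the count together when two group instances exist.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-group counters; the real set of counted operations may differ.
class OpCounts {
    long numOps;
    long numSquarings;
    long numExps;
}

// Hypothetical bucket: one set of counters per group (groups identified by name here).
class Bucket {
    private final Map<String, OpCounts> countsPerGroup = new HashMap<>();

    OpCounts forGroup(String groupName) {
        return countsPerGroup.computeIfAbsent(groupName, g -> new OpCounts());
    }
}

// Stand-in for the static state that would live in DebugGroup.
class BucketRegistry {
    private static final Map<String, Bucket> BUCKETS = new HashMap<>();
    private static String currentName = "default"; // assumed initial bucket name

    static void setBucket(String name) { currentName = name; }

    static Bucket get(String name) {
        return BUCKETS.computeIfAbsent(name, n -> new Bucket());
    }

    static Bucket current() { return get(currentName); }

    static void reset() { BUCKETS.clear(); currentName = "default"; }
}

// Toy group: every instance reports to the shared static registry.
class ToyDebugGroup {
    private final String name;

    ToyDebugGroup(String name) { this.name = name; }

    void op() { BucketRegistry.current().forGroup(name).numOps++; }
}

class BucketSketchDemo {
    public static void main(String[] args) {
        ToyDebugGroup original = new ToyDebugGroup("G1");
        ToyDebugGroup copy = new ToyDebugGroup("G1"); // e.g. created during deserialization

        BucketRegistry.setBucket("user");
        original.op();
        copy.op(); // lands in the same bucket as the call above

        BucketRegistry.setBucket("provider");
        original.op();

        System.out.println(BucketRegistry.get("user").forGroup("G1").numOps);     // prints 2
        System.out.println(BucketRegistry.get("provider").forGroup("G1").numOps); // prints 1
    }
}
```

Because the map is static, it does not matter which instance performed an operation; only the currently selected bucket name decides where it is counted.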

Additional suggestions for improvement:

The count output is somewhat difficult to read right now. I think it would be cool if we could have an "at a glance" tree-like structure:
```
BucketName
Pairings: 6
G1
    (Costly) Operations: 400
        Mul: 300 (290 of which happened during (multi-)exp)
        Square: 100 (100 of which happened during (multi-)exp)
    Inversions: 170 (168 of which happened during (multi-)exp)
    Exponentiations: 5
    Multiexponentiations (number of terms): [2, 2, 5]
    getRepresentation() calls: 2
G2
    ...
```
This also accounts for numbers people will probably want to put into their papers. For example, we often don't have the space to differentiate op and square, so we just show the sum of them. Inversions are basically free, so we'd usually just disregard them.
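
As a rough illustration of how such a tree could be rendered, here is a small sketch. `GroupStats` and `BucketReport` are hypothetical names, and only a subset of the proposed lines is produced (the "during (multi-)exp" annotations and the multiexponentiation list are left out).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical per-group numbers for one bucket.
class GroupStats {
    final long mul, square, inversions, exps;

    GroupStats(long mul, long square, long inversions, long exps) {
        this.mul = mul;
        this.square = square;
        this.inversions = inversions;
        this.exps = exps;
    }
}

class BucketReport {
    // Renders one bucket as an indented tree; "(Costly) Operations" is the sum of Mul and Square.
    static String render(String bucketName, long pairings, Map<String, GroupStats> perGroup) {
        StringBuilder sb = new StringBuilder();
        sb.append(bucketName).append('\n');
        sb.append("Pairings: ").append(pairings).append('\n');
        for (Map.Entry<String, GroupStats> e : perGroup.entrySet()) {
            GroupStats s = e.getValue();
            sb.append(e.getKey()).append('\n');
            sb.append("    (Costly) Operations: ").append(s.mul + s.square).append('\n');
            sb.append("        Mul: ").append(s.mul).append('\n');
            sb.append("        Square: ").append(s.square).append('\n');
            sb.append("    Inversions: ").append(s.inversions).append('\n');
            sb.append("    Exponentiations: ").append(s.exps).append('\n');
        }
        return sb.toString();
    }
}

class BucketReportDemo {
    public static void main(String[] args) {
        Map<String, GroupStats> perGroup = new LinkedHashMap<>(); // keeps G1, G2 order
        perGroup.put("G1", new GroupStats(300, 100, 170, 5));
        System.out.print(BucketReport.render("user", 6, perGroup));
    }
}
```

Putting the summed "(Costly) Operations" line above its parts means the number that typically ends up in a paper can be read off first, with the breakdown one level deeper.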

rheitjoh commented 3 years ago

Good idea! I agree with your implementation suggestions. Some additional technical considerations:

We will need some kind of default bucket that is used whenever the user does not want to explicitly specify a name. It would basically be used whenever no distinction between different protocol parties or the like is necessary. I would store that one separately from the hash map used for the named ones (to avoid having to choose an internal name that might conflict with a name chosen by the user). I would then have setBucket() and print() methods (without arguments) that enable the default bucket and print its results, respectively.

We also need to consider the (multi-)exponentiation algorithm selection features. These are specific to the LazyGroup instances and are not static and do not get serialized. For nicer interaction with the serialization it would probably be better if the chosen algorithms and window size options got serialized as well. This would increase the LazyGroup serialization size, however, which is not great. We could also store the selected algorithm and window size inside the bucket; then we can adjust them whenever the bucket is changed. This latter approach interacts with the bucket stuff the best, I think.

Also, do we want all the buckets to be serialized together with the DebugGroup? In your benchmark this is not necessary since the static variable is shared across all instances of the class in your JVM, but whenever the serialization is actually sent to another system, the static variable won't be present. I could see this coming up in some kind of integration test where you are counting operations and testing the networking aspects at the same time. I think serializing the buckets has some advantages: Imagine you have such a protocol. Then you are sending the DebugGroup between parties, and at the end the last party has all the combined results locally present (you don't need to combine the results from each party). You do have to consider what happens if a bucket is already present when deserializing a DebugGroup instance (e.g. the local party has bucket "user" and the deserialized instance has bucket "user"). I think the most sensible approach here would be to merge the two buckets, i.e. add up all the numbers. Then we get the desired effect of adding up all the numbers at the last party. The only problem may be the multi-exponentiation algorithm stuff if we store those in the bucket as well (how do you select between those? I would probably give precedence to the locally selected algorithm).
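
The merge-on-deserialization idea can be sketched as follows. This is only an illustration under assumptions: `MergeableBucket` and `BucketMerger` are hypothetical names, each bucket is reduced to a single operation count per group, and the algorithm and window size settings are ignored.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical bucket holding one operation count per group name.
class MergeableBucket {
    final Map<String, Long> opsPerGroup = new HashMap<>();

    void add(String groupName, long ops) {
        opsPerGroup.merge(groupName, ops, Long::sum);
    }

    // Adds all counts of the other bucket onto this one.
    void mergeFrom(MergeableBucket other) {
        for (Map.Entry<String, Long> e : other.opsPerGroup.entrySet()) {
            add(e.getKey(), e.getValue());
        }
    }
}

class BucketMerger {
    // Merges buckets that arrived with a deserialized group into the local ones, by name.
    static void mergeInto(Map<String, MergeableBucket> local,
                          Map<String, MergeableBucket> incoming) {
        for (Map.Entry<String, MergeableBucket> e : incoming.entrySet()) {
            local.computeIfAbsent(e.getKey(), n -> new MergeableBucket())
                 .mergeFrom(e.getValue());
        }
    }
}

class BucketMergeDemo {
    public static void main(String[] args) {
        Map<String, MergeableBucket> local = new HashMap<>();
        local.computeIfAbsent("user", n -> new MergeableBucket()).add("G1", 3);

        Map<String, MergeableBucket> incoming = new HashMap<>();
        incoming.computeIfAbsent("user", n -> new MergeableBucket()).add("G1", 2);
        incoming.computeIfAbsent("provider", n -> new MergeableBucket()).add("G1", 4);

        BucketMerger.mergeInto(local, incoming);
        System.out.println(local.get("user").opsPerGroup.get("G1"));     // prints 5
        System.out.println(local.get("provider").opsPerGroup.get("G1")); // prints 4
    }
}
```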

JanBobolz commented 3 years ago

Default bucket: very good idea. I'd say that one should probably always count (even when a bucket is selected manually), so that we always have a "total" count.

Storing the window sizes in the bucket feels strange. It's kind of cheating to change window sizes between invocations of stuff if the "real" (non-counting) code doesn't do that. I agree with the general issue, though. But this may be an unrelated issue. Changing window sizes for "normal" LazyGroups has the same issues if serialization is involved. Maybe that's another issue.

Bucket serialization: I'm unsure. For example: if I serialize my group and deserialize it on the same machine, the counts would double?! That feels dangerous, and the meaning is hard to convey. I'd say we keep the semantics to "everything that happens on this machine in DebugGroups will be counted".

Oh, one more thing: I think it would be cool if we could have a constructor for the DebugGroup that takes a BigInteger size. That would make it easier to model, for example, the size of the Mcl groups. I had to look up how the "security parameter" is interpreted.

rheitjoh commented 3 years ago

> Default bucket: very good idea. I'd say that one should probably always count (even when a bucket is selected manually), so that we always have a "total" count.

Interesting. My initial thought would be that the "total" one should be something separate from the default one. The reason being that then you can use the default bucket just like the other ones, i.e. for your protocol benchmark you might use the default bucket for the user and a named one for the provider. In terms of functionality it doesn't really matter, but I wonder which way is more intuitive: Separating them seems maybe easier to explain? E.g. "There is a default unnamed bucket which counts operations. You can switch to separate buckets using a unique name. Oh and there are also methods that automatically sum up all the counts across the buckets for you" vs "There is a default unnamed bucket which counts operations. You can switch to separate buckets using a unique name. The default bucket also includes the counts of operations across all named buckets.". The former seems a bit more intuitive to me since it reduces the responsibility of the default bucket, i.e. it is just another bucket, just without a name. For the latter you always need to remember that it counts total operations too (additional complexity).

> Bucket serialization: I'm unsure. For example: if I serialize my group and deserialize it on the same machine, the counts would double?!. Feels dangerous and hard to convey the meaning for this. I'd say we keep the semantics to "everything that happens on this machine in DebugGroups will be counted".

Right. I didn't consider that. I agree.

rheitjoh commented 3 years ago

I just noticed that having the default bucket be unnamed causes some issues with the printing and getter methods for the results. It's best if those take the name of the bucket you want to get the data from. That is hard to do for an unnamed bucket (and having separate getters for the default bucket results in too many methods). So I'm thinking I'll just call that bucket "default" and have it act as a normal bucket.

JanBobolz commented 3 years ago

I'd like a "total" bucket that is always actively counting. Maybe that's in addition to the "default" bucket.

My thinking was that if you never call setBucket(), "total" or "default" is just the same thing. If you do call setBucket (or whatever it's called), then in the end the output would just contain an additional "total" entry (in addition to your self-defined buckets). And that may be useful.

JanBobolz commented 3 years ago

In the "total bucket is the default bucket" world, it's probably even easier to explain (because the output explains it clearly by labelling the bucket "total"). It may be slightly less natural to have a "default" bucket output?!
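
A minimal sketch of the "total bucket is the default bucket" variant, assuming a hypothetical `TotalCountingRegistry` that tracks a single operation count per bucket name. None of these names come from the library.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry in which "total" is both the default bucket and always counting.
class TotalCountingRegistry {
    static final String TOTAL = "total";
    private static final Map<String, Long> OPS = new HashMap<>();
    private static String current = TOTAL;

    static void setBucket(String name) { current = name; }

    // Every operation is added to "total"; a manually selected bucket counts it as well.
    static void recordOp() {
        OPS.merge(TOTAL, 1L, Long::sum);
        if (!current.equals(TOTAL)) {
            OPS.merge(current, 1L, Long::sum);
        }
    }

    static long ops(String name) { return OPS.getOrDefault(name, 0L); }

    static void reset() { OPS.clear(); current = TOTAL; }
}

class TotalBucketDemo {
    public static void main(String[] args) {
        TotalCountingRegistry.recordOp();           // no bucket selected: counted in "total" only

        TotalCountingRegistry.setBucket("user");
        TotalCountingRegistry.recordOp();
        TotalCountingRegistry.recordOp();

        System.out.println(TotalCountingRegistry.ops("user"));  // prints 2
        System.out.println(TotalCountingRegistry.ops("total")); // prints 3
    }
}
```

If setBucket() is never called, "total" is the only entry in the output, which matches the case where no distinction between parties is needed.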

cryptimeleon / math

Introduce "buckets" to DebugGroup counting #123