Closed ErikP0 closed 3 years ago
I don't know for sure without seeing the details, but I'm not surprised that this is the case. Using bit vectors makes it more efficient to organise the memory, for example because the garbled circuit labels are stored in C++ vectors. In the first case, the vectors will all be of length 1, so more of them are needed and they will be more scattered in memory, whereas in the second case, all labels used in parallel are stored in one C++ vector.
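The effect can be sketched with plain Python integers standing in for the label vectors (a toy model of bit-slicing, not MP-SPDZ code; all names below are made up for illustration):

```python
# Toy illustration of bit-slicing: one word-level AND evaluates N
# independent AND gates at once, instead of N separate 1-bit operations
# on scattered length-1 registers.

N = 8  # number of parallel blocks (arbitrary for the demo)

# Per-bit representation: one value (0/1) per block -> N scattered ANDs.
a_bits = [1, 0, 1, 1, 0, 0, 1, 0]
b_bits = [1, 1, 0, 1, 0, 1, 1, 0]
per_bit = [x & y for x, y in zip(a_bits, b_bits)]

# Bit-sliced representation: bit i of one word is the bit of block i.
def pack(bits):
    """Pack a list of 0/1 values into a single integer, bit i = block i."""
    word = 0
    for i, b in enumerate(bits):
        word |= b << i
    return word

a_word = pack(a_bits)
b_word = pack(b_bits)
sliced = a_word & b_word  # one word-level AND covers all N gates

assert sliced == pack(per_bit)  # both representations agree
```

The point is that the sliced version keeps all N bits contiguous in one value, mirroring the single C++ vector of labels described above.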
Thank you, this makes sense. Is there a reason why the wire labels are structured like this in the virtual machine? Could the compiler, when it is merging the instructions of the same dependency level/round, also merge bit vectors together?
May I ask if there is a difference between using `sbits` and `sbitvec` then? Is `sbitvec` just a compile-time utility to group `sbits`? Can I expect a speed-up here?
> Is there a reason why the wire labels are structured like this in the virtual machine? Could the compiler, when it is merging the instructions of the same dependency level/round, also merge bit vectors together?
It wouldn't be impossible, but it would be much more involved, because you would have to find out whether the merged instructions are actually truly parallel or whether reordering would be necessary. What actually happens in MP-SPDZ is trying to catch parallelism at a higher level when using `-C`, and then creating parallel instructions from there.
> May I ask if there is a difference between using `sbits` and `sbitvec` then? Is `sbitvec` just a compile-time utility to group `sbits`? Can I expect a speed-up here?
No, it's indeed just a utility, which comes in handy when using `sbitintvec`, which allows easy computation of parallel integer operations.
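As a rough analogy for what such a grouping utility enables (a pure-Python toy, not the MP-SPDZ API; all names here are invented): packing N integers bit-slice-wise lets a single circuit-style ripple-carry pass, built only from XOR and AND on whole words, add all N pairs at once.

```python
# Toy bit-sliced integer addition: slice word w[j] holds bit j of every
# integer (bit i of w[j] = bit j of the i-th value), so one ripple-carry
# pass over the slices adds N pairs in parallel.

WIDTH = 8  # bit width of the toy integers (assumption for the demo)

def to_slices(values, width=WIDTH):
    """Transpose a list of ints into 'width' slice words."""
    return [sum(((v >> j) & 1) << i for i, v in enumerate(values))
            for j in range(width)]

def from_slices(slices, n, width=WIDTH):
    """Inverse transpose: recover the n integers from the slice words."""
    return [sum(((slices[j] >> i) & 1) << j for j in range(width))
            for i in range(n)]

def sliced_add(a, b, width=WIDTH):
    """Ripple-carry adder on slice words: XOR/AND only, like a circuit."""
    carry = 0
    out = []
    for j in range(width):
        out.append(a[j] ^ b[j] ^ carry)
        carry = (a[j] & b[j]) | (carry & (a[j] ^ b[j]))
    return out  # carry out is dropped, i.e. addition mod 2**width

xs = [3, 100, 255, 17]
ys = [4, 28, 1, 200]
res = from_slices(sliced_add(to_slices(xs), to_slices(ys)), len(xs))
assert res == [(x + y) % 256 for x, y in zip(xs, ys)]
```

The adder runs the same WIDTH gate-level steps regardless of how many values are packed, which is the kind of parallel integer computation referred to above.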
I see. Thanks again :+1:
Hello, I'm currently implementing the parallel encryption of N blocks with a block cipher using garbled circuits, which is supposed to run with the Yao virtual machine.

In the first implementation, every bit of a block is represented by the `sbit` type (and makes use of AND, XOR, etc.). The encryption function is implemented for a single block and then called in a loop for all N blocks.

The second, bit-sliced implementation uses `sbits.get_type(N)` as a bit with AND, XOR, etc. The idea here is that a bit of the i-th block is the i-th bit in the `sbits` type. The encryption function is then naturally called only once, with the larger register as argument.

If I understand the compiler correctly, both implementations entail the same number of AND, XOR, etc. gates, since the operation `sbits.get_type(N) AND sbits.get_type(N)` results in N AND gates; the same of course holds for N parallel invocations of `sbit AND sbit`.

What I observe in my experiments is that for large N, say N >= 100, the bit-sliced implementation is consistently much faster (by approximately a factor of 10) than the first implementation. I'm wondering why?
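For illustration, here is a plain-Python toy model of the two implementations (hypothetical names and a made-up one-round "cipher", no MP-SPDZ involved): both versions compute exactly the same gates, but the looped version issues N times as many instructions.

```python
# Toy model: a one-round "cipher" on 4-bit blocks, evaluated once per
# block in a loop and once bit-sliced. Counters show equal gate counts
# per block but far fewer vector instructions in the sliced version.

N = 100          # parallel blocks
WIDTH = 4        # toy block size
KEY = [1, 0, 1, 1]

and_instructions = {"loop": 0, "sliced": 0}

def round_per_bit(block):
    """Per-bit version: each AND touches a single 1-bit register."""
    out = []
    for j in range(WIDTH):
        and_instructions["loop"] += 1        # one 1-bit AND instruction
        out.append((block[j] & block[(j + 1) % WIDTH]) ^ KEY[j])
    return out

def round_sliced(slices):
    """Bit-sliced version: bit i of slices[j] is bit j of block i."""
    key_mask = [(1 << N) - 1 if k else 0 for k in KEY]
    out = []
    for j in range(WIDTH):
        and_instructions["sliced"] += 1      # one N-bit AND instruction
        out.append((slices[j] & slices[(j + 1) % WIDTH]) ^ key_mask[j])
    return out

blocks = [[(i >> j) & 1 for j in range(WIDTH)] for i in range(N)]
looped = [round_per_bit(b) for b in blocks]

slices = [sum(blocks[i][j] << i for i in range(N)) for j in range(WIDTH)]
sliced = round_sliced(slices)

# Same results, i.e. the same gates were computed...
assert looped == [[(sliced[j] >> i) & 1 for j in range(WIDTH)]
                  for i in range(N)]
# ...but the loop issued N times as many AND instructions.
assert and_instructions["loop"] == N * WIDTH
assert and_instructions["sliced"] == WIDTH
```

This only models instruction counts, not garbling cost itself, but it matches the intuition that per-instruction overhead (dispatch, scattered length-1 label vectors) dominates when each instruction carries a single bit.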