Closed ErikP0 closed 3 years ago
I don't know for sure without seeing the details, but I'm not surprised that this is the case. Using bit vectors makes it more efficient to organise the memory, for example because the garbled circuit labels are stored in C++ vectors. In the first case, the vectors will all be of length 1, so more of them are needed and they will be more scattered in memory, whereas in the second case, all labels used in parallel are stored in one C++ vector.
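The effect can be sketched with plain Python integers standing in for the label vectors (a toy model of bit-slicing, not MP-SPDZ code; all names below are made up for illustration):

```python
# Toy illustration of bit-slicing: one word-level AND evaluates N
# independent AND gates at once, instead of N separate 1-bit operations
# on scattered length-1 registers.

N = 8  # number of parallel blocks (arbitrary for the demo)

# Per-bit representation: one value (0/1) per block -> N scattered ANDs.
a_bits = [1, 0, 1, 1, 0, 0, 1, 0]
b_bits = [1, 1, 0, 1, 0, 1, 1, 0]
per_bit = [x & y for x, y in zip(a_bits, b_bits)]

# Bit-sliced representation: bit i of one word is the bit of block i.
def pack(bits):
    """Pack a list of 0/1 values into a single integer, bit i = block i."""
    word = 0
    for i, b in enumerate(bits):
        word |= b << i
    return word

a_word = pack(a_bits)
b_word = pack(b_bits)
sliced = a_word & b_word  # one word-level AND covers all N gates

assert sliced == pack(per_bit)  # both representations agree
```

The point is that the sliced version keeps all N bits contiguous in one value, mirroring the single C++ vector of labels described above.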
Thank you, this makes sense. Is there a reason why the wire labels are structured like this in the virtual machine? Could the compiler, when it is merging the instructions of the same dependency level/round, also merge bit vectors together?
May I ask if there is a difference between using `sbits` and `sbitvec` then? Is `sbitvec` just a compile-time utility to group `sbits`? Can I expect a speed-up here?
> Is there a reason why the wire labels are structured like this in the virtual machine? Could the compiler, when it is merging the instructions of the same dependency level/round, also merge bit vectors together?
It wouldn't be impossible, but it would be much more involved, because you would have to find out whether the merged instructions are actually truly parallel or whether reordering would be necessary. What actually happens in MP-SPDZ is trying to catch parallelism at a higher level when using `-C`, and then creating parallel instructions from there.
> May I ask if there is a difference between using `sbits` and `sbitvec` then? Is `sbitvec` just a compile-time utility to group `sbits`? Can I expect a speed-up here?
No, it's indeed just a utility, which comes in handy when using `sbitintvec`, which allows easy computation of parallel integer operations.
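As a rough analogy for what such a grouping utility enables (a pure-Python toy, not the MP-SPDZ API; all names here are invented): packing N integers bit-slice-wise lets a single circuit-style ripple-carry pass, built only from XOR and AND on whole words, add all N pairs at once.

```python
# Toy bit-sliced integer addition: slice word w[j] holds bit j of every
# integer (bit i of w[j] = bit j of the i-th value), so one ripple-carry
# pass over the slices adds N pairs in parallel.

WIDTH = 8  # bit width of the toy integers (assumption for the demo)

def to_slices(values, width=WIDTH):
    """Transpose a list of ints into 'width' slice words."""
    return [sum(((v >> j) & 1) << i for i, v in enumerate(values))
            for j in range(width)]

def from_slices(slices, n, width=WIDTH):
    """Inverse transpose: recover the n integers from the slice words."""
    return [sum(((slices[j] >> i) & 1) << j for j in range(width))
            for i in range(n)]

def sliced_add(a, b, width=WIDTH):
    """Ripple-carry adder on slice words: XOR/AND only, like a circuit."""
    carry = 0
    out = []
    for j in range(width):
        out.append(a[j] ^ b[j] ^ carry)
        carry = (a[j] & b[j]) | (carry & (a[j] ^ b[j]))
    return out  # carry out is dropped, i.e. addition mod 2**width

xs = [3, 100, 255, 17]
ys = [4, 28, 1, 200]
res = from_slices(sliced_add(to_slices(xs), to_slices(ys)), len(xs))
assert res == [(x + y) % 256 for x, y in zip(xs, ys)]
```

The adder runs the same WIDTH gate-level steps regardless of how many values are packed, which is the kind of parallel integer computation referred to above.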
I see. Thanks again :+1:
Hello, I'm currently implementing the parallel encryption of N blocks with a block cipher using garbled circuits, which is supposed to run with the Yao virtual machine.

In the first implementation, every bit of a block is represented by the `sbit` type (and makes use of AND, XOR, etc.). The encryption function is implemented for a single block and then called in a loop for all N blocks.

The second, bit-sliced implementation uses `sbits.get_type(N)` as a bit with AND, XOR, etc. The idea here is that a bit of the i-th block is the i-th bit in the `sbits` type. The encryption function is then naturally called only once, with the larger register as argument.

If I understand the compiler correctly, both implementations entail the same number of AND, XOR, etc. gates, since the operation `sbits.get_type(N) AND sbits.get_type(N)` results in N AND gates; the same of course holds for N parallel invocations of `sbit AND sbit`.

What I observe in my experiments is that for large N, say N >= 100, the bit-sliced implementation is consistently much faster (by approximately a factor of 10) than the first implementation. I'm wondering why?
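For illustration, here is a plain-Python toy model of the two implementations (hypothetical names and a made-up one-round "cipher", no MP-SPDZ involved): both versions compute exactly the same gates, but the looped version issues N times as many instructions.

```python
# Toy model: a one-round "cipher" on 4-bit blocks, evaluated once per
# block in a loop and once bit-sliced. Counters show equal gate counts
# per block but far fewer vector instructions in the sliced version.

N = 100          # parallel blocks
WIDTH = 4        # toy block size
KEY = [1, 0, 1, 1]

and_instructions = {"loop": 0, "sliced": 0}

def round_per_bit(block):
    """Per-bit version: each AND touches a single 1-bit register."""
    out = []
    for j in range(WIDTH):
        and_instructions["loop"] += 1        # one 1-bit AND instruction
        out.append((block[j] & block[(j + 1) % WIDTH]) ^ KEY[j])
    return out

def round_sliced(slices):
    """Bit-sliced version: bit i of slices[j] is bit j of block i."""
    key_mask = [(1 << N) - 1 if k else 0 for k in KEY]
    out = []
    for j in range(WIDTH):
        and_instructions["sliced"] += 1      # one N-bit AND instruction
        out.append((slices[j] & slices[(j + 1) % WIDTH]) ^ key_mask[j])
    return out

blocks = [[(i >> j) & 1 for j in range(WIDTH)] for i in range(N)]
looped = [round_per_bit(b) for b in blocks]

slices = [sum(blocks[i][j] << i for i in range(N)) for j in range(WIDTH)]
sliced = round_sliced(slices)

# Same results, i.e. the same gates were computed...
assert looped == [[(sliced[j] >> i) & 1 for j in range(WIDTH)]
                  for i in range(N)]
# ...but the loop issued N times as many AND instructions.
assert and_instructions["loop"] == N * WIDTH
assert and_instructions["sliced"] == WIDTH
```

This only models instruction counts, not garbling cost itself, but it matches the intuition that per-instruction overhead (dispatch, scattered length-1 label vectors) dominates when each instruction carries a single bit.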