data61 / MP-SPDZ

Versatile framework for multi-party computation

Custom preprocessing #1305

Closed waamm closed 1 month ago

waamm commented 9 months ago

Hello!

In order to properly benchmark the preprocessing and online phases of a certain comparison protocol for linear secret sharing scheme-based MPC protocols (see old draft here), I believe we need to customise some preprocessing code, but I have not been able to find in the current documentation how to do that.

For example, it would be very useful to know how to produce, during preprocessing, triples ([a],[b],[ab]) where [a] and [b] are both bits in the arithmetic domain.
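
Concretely, the invariant such a triple must satisfy can be sketched in plain Python (illustration only, not MP-SPDZ code; the modulus is an arbitrary assumption standing in for the arithmetic domain):

```python
import itertools

def make_bit_triple(a, b, modulus=2**64):
    # a and b are bits, but the triple lives in the arithmetic domain Z/mZ
    assert a in (0, 1) and b in (0, 1)
    return (a % modulus, b % modulus, (a * b) % modulus)

# For bits, the product coincides with the logical AND
for a, b in itertools.product((0, 1), repeat=2):
    ta, tb, tab = make_bit_triple(a, b)
    assert tab == (a & b)
```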

Judging by this line, it seems that I would need to create a new function in Fake-Offline.cpp to generate such triples. But I suspect I also need to make the compiler understand this new "data type"/"instruction"; any suggestion where to make the required edits?

Thank you for your help!

mkskeller commented 5 months ago

I don't think I can say much about code I cannot run myself. It seems you're running LTZ on an sbits element, which is not as intended.

waamm commented 5 months ago

When I copy-paste this code into non_linear.py and replace return LtzRing(c, k) by return LTS(a, 0), that's the output I get from ./compile.py -R 64 tf EzPC/Athos/Networks/SqueezeNetImgNet/graphDef.bin 1 trunc_pr. (Here I set BIT_LENGTH = EDABIT_BIT_LENGTH = 64 and HALF_PRIME = 2**63).

I am not sure why LTZ appears in the output here. If it helps, I was previously guessing that the underlying problem occurs when converting between secret edaBit bits and certain clear integers (when compiled inside the library but not for .mpc code?); print statements suggest the compiler stops at this line, and I think it's the + w3 part, hence the earlier code snippet.

mkskeller commented 5 months ago

LTZ is just the function that runs non_linear.ltz, so the problem seems to be at a higher level. I would also suggest testing the new code on smaller examples before moving on to a whole neural network.

waamm commented 5 months ago

Compiling sint(0) < sint(1) already fails (until I put the LTS code back into the .mpc file) with the same error (regardless of -R 64).

print statements suggest the compiler stops at this line, and I think it's the + w3 part, hence the earlier code snippet.

Never mind this, the compiler seems to get past that line. Maybe the problem involves "vectorisation"?

mkskeller commented 5 months ago

I don't think vectorization is the issue but rather the type of values. I would suggest adding debugging output and assertions to make sure all values are of the expected type.

waamm commented 5 months ago

Ah, somewhere in the codebase it is probably specified that non_linear.ltz must return an sint, whereas this code was returning an sbit. Whoops.

So I replaced the last line of LTS by the following:

dabit = sint.get_dabit()
w_masked = cint((w + dabit[1]).reveal())
return dabit[0] + w_masked - 2 * dabit[0] * w_masked
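
The last line relies on the standard identity for XOR in the arithmetic domain: for bits x and y, x XOR y = x + y - 2xy. A plain-Python sanity check of that identity (illustration only, not MP-SPDZ types):

```python
import itertools

def xor_arith(x, y):
    # XOR expressed with ring operations; valid whenever x and y are 0/1
    return x + y - 2 * x * y

for x, y in itertools.product((0, 1), repeat=2):
    assert xor_arith(x, y) == (x ^ y)
```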

Compiling tutorial yields the familiar NoneType error, until -O is added, whereas some other simple programs compile regardless (bankers_bonus, vickrey).

./compile.py -R 64 tf EzPC/Athos/Networks/SqueezeNetImgNet/graphDef.bin 1 trunc_pr now yields:

  File ".../Compiler/non_linear.py", line 241, in LTS
    w_masked = cint((w + dabit[1]).reveal())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../Compiler/types.py", line 200, in vectorized_init
    res = function(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../Compiler/types.py", line 1077, in __init__
    super(cint, self).__init__('c', val=val, size=size)
  File ".../Compiler/types.py", line 851, in __init__
    super(_arithmetic_register, self).__init__(*args, **kwargs)
  File ".../Compiler/types.py", line 209, in instruction_typed_operation
    res = operation(self, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../Compiler/types.py", line 800, in __init__
    raise CompilerError(
Compiler.exceptions.CompilerError: cannot convert '<class 'Compiler.GC.types.bits.get_type.<locals>.bitsn'>' to '<class 'Compiler.types.cint'>'
mkskeller commented 5 months ago

You can convert from sbit to sint simply by casting: sint(sbit(0)), so there is no need to do it manually. Furthermore, which version of MP-SPDZ are you using? The following code, modelled on your snippet, works as expected on the latest commit:

dabit = sint.get_dabit()
print_ln('%s', cint((sbit(1) + dabit[1]).reveal()))
waamm commented 5 months ago

which version of MP-SPDZ are you using

I've been using commit e93190f from 4 June, and I don't think I've made any other significant modifications.

Replacing

dabit = sint.get_dabit()
w_masked = cint((w + dabit[1]).reveal())
return dabit[0] + w_masked - 2 * dabit[0] * w_masked

by return sint(w), everything now seems to compile. A bit surprising since I presume this "shortcut" implements roughly the same circuit?

instead of returning an edaBit, i.e. a value in Z/mZ together with sharings of its bits in Z/2Z, it would additionally return certain products of those bits.

The structure containing those products is probably best represented as a dictionary whose keys are tuples like (1,0,1). Right now I see two steps. First insert this dictionary creation function (and the one it calls) here.

Then I probably need to modify this, in particular the last section of this line so that it accepts a certain (immutable) dictionary. Is that possible? What should replace itertools.repeat('sbw')?

mkskeller commented 5 months ago

by return sint(w), everything now seems to compile. A bit surprising since I presume this "shortcut" implements roughly the same circuit?

If you provide a minimal example, I'm happy to look into it. As said before, the minimal example I came up with does work with the latest version.

instead of returning an edaBit, i.e. a value in Z/mZ together with sharings of its bits in Z/2Z, it would additionally return certain products of those bits.

The structure containing those products is probably best represented as a dictionary whose keys are tuples like (1,0,1). Right now I see two steps. First insert this dictionary creation function (and the one it calls) here.

Then I probably need to modify this, in particular the last section of this line so that it accepts a certain (immutable) dictionary. Is that possible? What should replace itertools.repeat('sbw')?

Using a dictionary doesn't make sense to me. This would include the possibility that some keys are present and some are not. How is the virtual machine supposed to react to this? I would suggest adding the virtual machine functionality in parallel, which should inform the requirements of the bytecode format.

waamm commented 5 months ago

the minimal example I came up with does work with the latest version.

For which program? As before, for your minimal example the programs bankers_bonus and vickrey compile here, but tutorial needs -O, and ./compile.py -R 64 tf... yields that odd conversion error.

This would include the possibility that some keys are present and some are not.

No, a fixed set of keys would be used. (More precisely, given a fixed natural number $\texttt{ARITY}$, this returns an array of $\lceil \ell / \texttt{ARITY} \rceil$ dictionaries, whose keys are all of the "binary" tuples of size $\texttt{ARITY}$, where $\ell$ is the bit-length. Obviously that means you can turn the entire data structure into an array/tuple, but it would become rather messy and prone to bugs.)
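
The proposed structure can be sketched in plain Python, assuming a hypothetical gen_bitvectors helper that enumerates all binary tuples of a given size:

```python
import itertools
import math

ARITY = 3  # illustrative choice

def gen_bitvectors(arity):
    # All "binary" tuples of the given size: (0,0,0), (0,0,1), ...
    return list(itertools.product((0, 1), repeat=arity))

def make_product_tables(ell, arity=ARITY):
    # One dictionary per block of `arity` bits; ceil(ell / arity) blocks total.
    # Values are placeholders for the secret-shared bit products.
    n_blocks = math.ceil(ell / arity)
    keys = gen_bitvectors(arity)
    return [{k: None for k in keys} for _ in range(n_blocks)]

tables = make_product_tables(ell=64)
assert len(tables) == 22           # ceil(64 / 3)
assert len(tables[0]) == 2 ** ARITY
```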

mkskeller commented 5 months ago

the minimal example I came up with does work with the latest version.

For which program? As before, for your minimal example the programs bankers_bonus and vickrey compile here, but tutorial needs -O, and ./compile.py -R 64 tf... yields that odd conversion error.

A minimal example reduces the problem to the issue without any extra functionality. tf includes a lot of code that is probably unrelated to the issue, which makes it hard to analyze.

This would include the possibility that some keys are present and some are not.

No, a fixed set of keys would be used. (More precisely, given a fixed natural number ARITY, this returns an array of ⌈ℓ/ARITY⌉ dictionaries, whose keys are all of the "binary" tuples of size ARITY, where ℓ is the bit-length. Obviously that means you can turn the entire data structure into an array/tuple, but it would become rather messy and prone to bugs.)

Creating the bytecode does involve some serialization in any case. Either way, working on the parsing side in parallel should give you an idea of the optimal solution.

waamm commented 5 months ago

the minimal example I came up with does work with the latest version.

A minimal example reduces the problem to the issue without any extra functionality. tf includes a lot of code that is probably unrelated to the issue, which makes it hard to analyze.

Right - before by "minimal example" you meant your LTS code snippet, but now you're also asking for a minimal program. I am not familiar with the innards of tf (there is not much code inside tf.mpc itself) so it is not easy for me to perceive right now why it would behave so differently here from bankers_bonus and vickrey.

working on the parsing side in parallel should give you an idea of the optimal solution.

Can you please point me to the relevant files/lines for this edaBit modification?

Then I probably need to modify this, in particular the last section of this line so that it accepts a certain (immutable) dictionary. Is that possible? What should replace itertools.repeat('sbw')?

More concretely, I'm hoping I can replace itertools.repeat('sbw')) by itertools.repeat({key: 'sbw' for key in gen_bitvectors(ARITY)}). (I guess once parsing is modified I will find out whether this works.)

Also, what is the difference between sb and sbw?

mkskeller commented 5 months ago

Can you please point me to the relevant files/lines for this edaBit modification?

There are several contexts:

  1. The core parsing where data from the file is read into an Instruction object: https://github.com/data61/MP-SPDZ/blob/18e934f36d4e807ce368a3cd7469c63de4f404e0/Processor/Instruction.hpp#L398
  2. The allocation where the highest register number is reported: https://github.com/data61/MP-SPDZ/blob/18e934f36d4e807ce368a3cd7469c63de4f404e0/Processor/Instruction.hpp#L678
  3. The actual execution: https://github.com/data61/MP-SPDZ/blob/18e934f36d4e807ce368a3cd7469c63de4f404e0/GC/instructions.h#L88
  4. The last one actually just refers to the edabit function of the Processor class where the r variable populated in 1. is used: https://github.com/data61/MP-SPDZ/blob/18e934f36d4e807ce368a3cd7469c63de4f404e0/Processor/Processor.hpp#L204
  5. The above in turn refers to the Preprocessing class where the start variable populated in 1. is used: https://github.com/data61/MP-SPDZ/blob/18e934f36d4e807ce368a3cd7469c63de4f404e0/Protocols/ReplicatedPrep.hpp#L1258

Also, what is the difference between sb and sbw?

The w denotes that a register is written to, so the compiler knows the usage of a specific register.

mkskeller commented 4 months ago

Version 0.3.9 adds more documentation on adding instructions: https://mp-spdz.readthedocs.io/en/latest/add-instruction.html

waamm commented 4 months ago

Thanks, that's helpful!

Where is this part connected with the EDABIT instruction?

In here it says "The length is the first argument minus one", but in arg_format the first entry is an sw, so does this refer to a different list of arguments? Is that list specified somewhere?

mkskeller commented 4 months ago

Where is this part connected with the EDABIT instruction?

Here: https://github.com/data61/MP-SPDZ/blob/adef788433a416375cc934defea8cc20b4de9d10/Compiler/types.py#L2505

In here it says "The length is the first argument minus one", but in arg_format the first entry is an sw, so does this refer to a different list of arguments? Is that list specified somewhere?

It is the default for variable-argument instructions to prepend the number of arguments to follow; otherwise the parser wouldn't know where the instruction arguments end. This is done here: https://github.com/data61/MP-SPDZ/blob/adef788433a416375cc934defea8cc20b4de9d10/Compiler/instructions_base.py#L954
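
The resulting layout can be mimicked in plain Python: the parser reads the argument count first and then knows exactly how many arguments follow (a sketch of the idea, not the actual MP-SPDZ bytecode format):

```python
import struct

def encode_varargs(args):
    # Prepend the number of arguments, then the arguments themselves,
    # each as a 4-byte big-endian integer (illustrative encoding).
    return struct.pack('>i', len(args)) + b''.join(
        struct.pack('>i', a) for a in args)

def decode_varargs(data):
    # Read the count, then exactly that many arguments.
    (n,) = struct.unpack_from('>i', data, 0)
    return list(struct.unpack_from('>%di' % n, data, 4))

assert decode_varargs(encode_varargs([7, 8, 9])) == [7, 8, 9]
```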

waamm commented 4 months ago

Version 0.3.9 adds more documentation on adding instructions: https://mp-spdz.readthedocs.io/en/latest/add-instruction.html

Perhaps a comment (or a link) on how prefixsums is intended to be used might be helpful; the code

arr = sint(1, size=3)
res = sint(None, size=3)
prefixsums(res, arr)
print_ln('%s', res.reveal())

yields [1, 0, 0], whereas I was expecting [1, 2, 3]? Also, is there a reason why you chose the @vectorise approach for inputs/outputs here as opposed to one based on Python arrays?

Where is this part connected with the EDABIT instruction?

Here:

Right, followed by this I suppose.

It is the default for variable-argument instruction to prepend the number of arguments to follow, otherwise the parser wouldn't know where the instruction arguments end. This is done here:

https://github.com/data61/MP-SPDZ/blob/adef788433a416375cc934defea8cc20b4de9d10/Compiler/instructions_base.py#L954

Doesn't the += in the subsequent line ordinarily append rather than prepend? Also, that number is 4 bytes, but later an 8-byte integer is extracted here - which other 4 bytes are these?

mkskeller commented 4 months ago

Perhaps a comment (or a link) on how prefixsums is intended to be used might be helpful; the code

arr = sint(1, size=3)
res = sint(None, size=3)
prefixsums(res, arr)
print_ln('%s', res.reveal())

yields [1, 0, 0], whereas I was expecting [1, 2, 3]? Also, is there a reason why you chose the @vectorise approach for inputs/outputs here as opposed to one based on Python arrays?

prefixsums is a vector instruction, the base version of which only works for length 1. I have updated the documentation accordingly.
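
In other words, the vectorized form computes a running sum across the whole vector, matching itertools.accumulate in plain Python (an illustration of the intended semantics, not MP-SPDZ code):

```python
from itertools import accumulate

# For an input vector of three ones, the vector prefix sum is [1, 2, 3]
assert list(accumulate([1, 1, 1])) == [1, 2, 3]
```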

It is the default for variable-argument instructions to prepend the number of arguments to follow; otherwise the parser wouldn't know where the instruction arguments end. This is done here: https://github.com/data61/MP-SPDZ/blob/adef788433a416375cc934defea8cc20b4de9d10/Compiler/instructions_base.py#L954

Doesn't the += in the subsequent line ordinarily append rather than prepend? Also, that number is 4 bytes, but later an 8-byte integer is extracted here - which other 4 bytes are these?

I meant prepend to all other arguments. The 8-byte integer extracted encodes the instruction type and vector length: It is generated just before: https://github.com/data61/MP-SPDZ/blob/adef788433a416375cc934defea8cc20b4de9d10/Compiler/instructions_base.py#L952
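
A header word like this can be sketched in plain Python with a hypothetical 32/32 field split (the real bit layout is defined in instructions_base.py and may differ):

```python
import struct

def pack_header(size, opcode):
    # Hypothetical layout: vector size in the upper 32 bits,
    # opcode in the lower 32 bits of one big-endian 64-bit word.
    return struct.pack('>Q', (size << 32) | opcode)

def unpack_header(data):
    (word,) = struct.unpack('>Q', data)
    return word >> 32, word & 0xFFFFFFFF

assert unpack_header(pack_header(64, 0x51)) == (64, 0x51)
```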

waamm commented 4 months ago

I have update the documentation accordingly.

(Small typo: here it now says vprefixsum(...) rather than vprefixsums(...). )

Is this where the actual summing happens? Edit: answer is yes.

I meant prepend to all other arguments.

Ah yes this.

There are several contexts:

Is there a standard approach here to serialise the dictionary I was suggesting earlier?

  1. The above in turn refers to the Preprocessing class where the start variable populated in 1. is used: https://github.com/data61/MP-SPDZ/blob/18e934f36d4e807ce368a3cd7469c63de4f404e0/Protocols/ReplicatedPrep.hpp#L1258

Is there a reason why this is located in the file ReplicatedPrep.hpp?

mkskeller commented 4 months ago

(Small typo: here it now says vprefixsum(...) rather than vprefixsums(...). )

Thanks, fixed.

Is there a standard approach here to serialise the dictionary I was suggesting earlier?

No. The idea is to keep the structure as simple as possible. All instruction arguments are tuples/lists or concatenations thereof for parallel execution. The most complicated instruction is inputmixed because it combines integer inputs and floating-point inputs.
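
Given the fixed key set discussed earlier in the thread, one way to stay within that constraint is to flatten the dictionary into a plain tuple using a canonical key order before serialization, and rebuild it on the parsing side (a sketch, not MP-SPDZ code):

```python
import itertools

ARITY = 3  # illustrative choice
KEYS = sorted(itertools.product((0, 1), repeat=ARITY))  # canonical order

def flatten(table):
    # Dictionary with the fixed key set -> tuple in canonical key order
    assert set(table) == set(KEYS)
    return tuple(table[k] for k in KEYS)

def unflatten(values):
    # Inverse: rebuild the dictionary from the flat tuple
    return dict(zip(KEYS, values))

table = {k: i for i, k in enumerate(KEYS)}
assert unflatten(flatten(table)) == table
```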

  1. The above in turn refers to the Preprocessing class where the start variable populated in 1. is used: https://github.com/data61/MP-SPDZ/blob/18e934f36d4e807ce368a3cd7469c63de4f404e0/Protocols/ReplicatedPrep.hpp#L1258

Is there a reason why this is located in the file ReplicatedPrep.hpp?

The replicated protocol was the first to support live preprocessing.

waamm commented 4 months ago

There are several contexts:

I haven't deciphered the code in step 5 yet, but at first glance it doesn't seem to me that steps 1-4 need any modification if I'm just making the tuple a bit longer (with more sbits)?

Let's say for simplicity that get_edabit(size) should return a sint followed by size number of sbits, as usual, and then one more sbit which is the product of the first two sbits. Where in the preprocessing code should that product be computed? (As for the rest, I suppose n_bits here should be incremented.)

mkskeller commented 4 months ago

I haven't deciphered the code in step 5 yet, but at first glance it doesn't seem to me that steps 1-4 need any modification if I'm just making the tuple a bit longer (with more sbits)?

I think so.

Let's say for simplicity that get_edabit(size) should return a sint followed by size number of sbits, as usual, and then one more sbit which is the product of the first two sbits. Where in the preprocessing code should that product be computed? (As for the rest, I suppose n_bits here should be incremented.)

You will need to run bit multiplication in C++ as in Utils/binary-example.cpp.

waamm commented 4 months ago

You will need to run bit multiplication in C++ as in Utils/binary-example.cpp.

Sure, but I meant to ask in which edaBit preprocessing files?

mkskeller commented 4 months ago

I don't think it matters in which file you add your functionality, but it's probably easier to use an existing file so you don't get caught up with missing includes.

waamm commented 4 months ago

I still don't follow. I want ResNet to run with a modified comparison protocol (which I'll probably just put in non_linear.py), and this comparison protocol uses these modified edaBits, which for simplicity I was going to retrieve with get_edabit(...).

How will executing a comparison through ResNet know how to preprocess and retrieve these modified edaBits, through a file like Utils/binary-example.cpp?

mkskeller commented 4 months ago

I didn't mean to say that the virtual machine will run any code in binary-example.cpp. Instead, the example code should inform your modifications to compute the product of two secret bits in the virtual machine. You need to do this if you want to modify or create instructions. For the larger picture, it might be helpful to look at the following section of the documentation, which explains how the different parts fit together: https://mp-spdz.readthedocs.io/en/latest/journey.html. More concretely, the ResNet code will involve ReLU and MaxPool layers in ml.py, which in turn call comparison operators on sfix, which in turn call an MPC algorithm for comparison depending on a variety of factors. This is where non_linear.py comes in, as it is a crucial part of this process, with different classes for ring (modulo a power of two) and prime (modulo a large prime) computation. This filters down to a string of instructions (possibly including the ones for edaBits), which is then executed by the virtual machine (the C++ part).

waamm commented 4 months ago

This is where non_linear.py comes in, as it is a crucial part of this process, with different classes for ring (modulo a power of two) and prime (modulo a large prime) computation.

This part I understand - I was already able to get the modified comparison code running (with SqueezeNet), but too much of the computation was unnecessarily being done in the online phase.

Regardless, if I want get_edabit(...) to include the product of the first two sbits, computed in preprocessing along with the rest of the edaBit, I would assume that the easiest/cleanest approach is to do that computation (perhaps following the approach in binary-example.cpp) in the preprocessing files where edaBits are constructed/populated? (I did not have time today and will have a careful look at your links tomorrow.)

mkskeller commented 4 months ago

I think easiest and cleanest are not necessarily the same thing here. It's certainly easier to work on the Python level, and maybe cleaner to defer the sbit multiplication to the virtual machine. However, it might be that the resulting saving of moving to the virtual machine is not worth your effort as it's clearly more involved. That said, I cannot make this decision for you.

waamm commented 3 months ago

When get_edabit(...) is invoked in some MPC protocol, which (preprocessing) code is ultimately responsible for returning a populated edaBit? Perhaps that would be an easy place?

mkskeller commented 3 months ago

What do you mean by ultimately responsible? edabit generation is distributed across dozens of functions containing thousands of lines of code.

waamm commented 3 months ago

I meant something like this:

(i) Following the edaBits paper, in the setting of passive security an edaBit should be constructed in preprocessing by executing the protocol $\Pi_{\texttt{edaBits}}$, after which the resulting edaBit is stored somewhere. One place to make the desired change (namely, multiplying some of its bits) would be just after finishing $\Pi_{\texttt{edaBits}}$ and just before the edaBit is stored? (ii) In the setting of active security, a cut-and-choose procedure will take place before storage. In this case the desired change would happen after that, though ideally before the MPC protocol executes any (possibly batch-based) multiplication verification protocol?

In particular, I was hoping you could point me to the specific sections where such code is found?

mkskeller commented 3 months ago

In MP-SPDZ, the core function for preprocessing edabits is called buffer_edabits, which has a few instantiations depending on the protocol. In the case of some semi-honest protocols, the internal storage of a batch happens here: https://github.com/data61/MP-SPDZ/blob/ac31098bb5dd04ca81c863c55b67233846be5740/Protocols/ReplicatedPrep.hpp#L947 whereas for malicious protocols, the internal storage happens here: https://github.com/data61/MP-SPDZ/blob/ac31098bb5dd04ca81c863c55b67233846be5740/Protocols/Spdz2kPrep.hpp#L229C1-L236C6

waamm commented 3 months ago

Following Utils/binary-example.cpp as suggested, a first attempt was this:

Insert auto& protocol = *this->protocol; here, and here the following:

protocol.init_mul();
protocol.prepare_mul(bits.at(0)[0].get_bit(0), bits.at(1)[0].get_bit(0));
protocol.exchange();
auto c = protocol.finalize_mul();
checked.back().second.push_back(c);

Then make mascot yields errors like this:

./Protocols/Spdz2kPrep.hpp:228:41: error: no viable conversion from 'Share<gfp_<0, 1>>' to 'const GC::TinierShare<gf2n_mac_key>'
        checked.back().second.push_back(c);

Some other questions:

Why the + 5 here?

What are called wholes here are called sums in this file?

Is n_bits actually used here? What is its role?

What is the interpretation of bit_type::part_type::default_length? It seems to be either 1 or 64?

mkskeller commented 3 months ago

Following Utils/binary-example.cpp as suggested, a first attempt was this:

Insert auto& protocol = *this->protocol; here, and here the following:

protocol.init_mul();
protocol.prepare_mul(bits.at(0)[0].get_bit(0), bits.at(1)[0].get_bit(0));
protocol.exchange();
auto c = protocol.finalize_mul();
checked.back().second.push_back(c);

Then make mascot yields errors like this:

./Protocols/Spdz2kPrep.hpp:228:41: error: no viable conversion from 'Share<gfp_<0, 1>>' to 'const GC::TinierShare<gf2n_mac_key>'
        checked.back().second.push_back(c);

this->protocol refers to an instance for the arithmetic side. You can access an instance for the binary side with bit_proc.protocol.

Why the + 5 here?

This is to cover the overflow by adding the parties' inputs. 5 bits of overflow cover up to 31 parties, which seems a reasonable limit for this kind of protocol.
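
The arithmetic behind that bound: summing one k-bit value per party across n parties can overflow by at most ceil(log2(n)) bits, so 5 extra bits of headroom suffice for up to 31 parties (a quick plain-Python check):

```python
import math

def extra_bits(n_parties):
    # Adding n values of k bits each yields a sum below n * 2**k,
    # so at most ceil(log2(n)) bits of overflow beyond k bits.
    return math.ceil(math.log2(n_parties))

assert extra_bits(31) == 5   # 5 headroom bits cover 31 parties
assert extra_bits(33) == 6   # one more party range needs a 6th bit
```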

What are called wholes here are called sums in this file?

No particular reason. The wholes are the result of secret sharing summation.

Is n_bits actually used here? What is its role?

Yes. The framework allows processing up to 64 bits at once, and the parameter specifies the exact number for optimization.

What is the interpretation of bit_type::part_type::default_length? It seems to be either 1 or 64?

In some protocols, it makes sense to pack secret shares into machine words because every share is only one or two bits, whereas in other protocols it only makes sense to handle one bit at once. The latter is the case for protocols using MACs, which includes Tinier.

waamm commented 3 months ago

Is n_bits actually used here? What is its role?

Yes. The framework allows processing up to 64 bits at once, and the parameter specifies the exact number for optimization.

Is the aim of binary-example.cpp stated somewhere? I presume it does a binary computation in certain protocols; for say Tinier this optimisation is entirely ignored, but not for say Rep3?

Following Utils/binary-example.cpp as suggested, a first attempt was this: Insert auto& protocol = *this->protocol; here, and here the following:

protocol.init_mul();
protocol.prepare_mul(bits.at(0)[0].get_bit(0), bits.at(1)[0].get_bit(0));
protocol.exchange();
auto c = protocol.finalize_mul();
checked.back().second.push_back(c);

Then make mascot yields errors like this:

./Protocols/Spdz2kPrep.hpp:228:41: error: no viable conversion from 'Share<gfp_<0, 1>>' to 'const GC::TinierShare<gf2n_mac_key>'
        checked.back().second.push_back(c);

this->protocol refers to an instance for the arithmetic side. You can access an instance for the binary side with bit_proc.protocol.

It seems counterintuitive but I suppose this is the line I need?

protocol.prepare_mul(bits.at(0)[0].get_bit(0), bits.at(0)[1].get_bit(0))

If not, can you explain the interpretation for the three indices, in particular how to obtain the first n bits of one sum out of bits? Why does this loop run size+2 times, where size is the number of bits of the edaBit? I tried several versions and didn't get it to work; the last bit is not the product of the first two.

mkskeller commented 3 months ago

Is n_bits actually used here? What is its role?

Yes. The framework allows processing up to 64 bits at once, and the parameter specifies the exact number for optimization.

Is the aim of binary-example.cpp stated somewhere? I presume it does a binary computation in certain protocols; for say Tinier this optimisation is entirely ignored, but not for say Rep3?

binary-example.cpp computes the AND (multiplication in GF(2)) of a number of inputs from two parties. Optimization is maybe the wrong word: without the parameter the framework would compute 64 ANDs even when fewer are actually required.

It seems counterintuitive but I suppose this is the line I need?

protocol.prepare_mul(bits.at(0)[0].get_bit(0), bits.at(0)[1].get_bit(0))

Can you give more context?

If not, can you explain the interpretation for the three indices, in particular how to obtain the first n bits of one sum out of bits?

bits is a vector of vectors of the most atomic share type, which is one bit plus MAC in the case of Tinier. In that case the outer vector corresponds to the elements of sum, and the inner vector to the bits corresponding to one element of sum. In short the first n bits of the i-th sum element are in bits[i], which is a vector of length at least n.

Why does this loop run size+2 times, where size is the number of bits of the edaBit? I tried several versions and didn't get it to work; the last bit is not the product of the first two.

The loop runs longer because the edaBits are still "loose" at this time, that is the overflow from the summation hasn't been removed yet. This happens in the sanitize function.

waamm commented 3 months ago

Can you give more context?

I meant this:

Let's say for simplicity that get_edabit(size) should return a sint followed by size number of sbits, as usual, and then one more sbit which is the product of the first two sbits. Where in the preprocessing code should that product be computed?

I think easiest and cleanest are not necessarily the same thing here. It's certainly easier to work on the Python level, and maybe cleaner to defer the sbit multiplication to the virtual machine. However, it might be that the resulting saving of moving to the virtual machine is not worth your effort as it's clearly more involved. That said, I cannot make this decision for you.

How would you have done the above in the Python level?

The loop runs longer because the edaBits are still "loose" at this time, that is the overflow from the summation hasn't been removed yet. This happens in the sanitize function.

Ah I should have placed my modifications a few lines later, precisely as you were already indicating earlier. But I now suspect that somewhere at the start of the edaBit generation process (or in the function calling it) I will have to replace n_bits by n_bits - 1 (because I had to increment it here) - do you happen to know which line(s) this is? Edit: I think this line might suffice, it seems to work.

For the next step it may be necessary to pad the edaBit with a few sbit(0)s - how does one construct those in this low-level interface? I've tried GC::TinierShare<gf2n_short>::constant(0, bit_proc.P.my_num(), bit_MC.get_alphai()); but it yields conversion errors, whereas GC::TinierShare<gf2n_mac_key>::constant(0, bit_proc.P.my_num(), bit_MC.get_alphai()); yields SECURITY BUG: insufficient checking. It seems to work when I simply remove that check, but I presume this is undesirable? Is gf2n_mac_key indeed the right field for these bits?

Do you have any idea why using ElementType = std::decay_t<decltype(x.second)::value_type> somewhere around here yields template argument for template type parameter must be a type; did you forget 'typename'??

In some of my older code cbit(1) & edabit[1][0] works, but in other code (including the current commit) it yields Compiler.exceptions.CompilerError: cannot convert cb0(1)/cb0(1) from <class 'Compiler.GC.types.cbit'> to <class 'Compiler.GC.types.bits.get_type.<locals>.bitsn'>; any idea what is causing this?

mkskeller commented 2 months ago

Let's say for simplicity that get_edabit(size) should return a sint followed by size number of sbits, as usual, and then one more sbit which is the product of the first two sbits. Where in the preprocessing code should that product be computed?

How would you have done the above in the Python level?

The product of two sbits can be computed using a & b.
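
This works because, over {0, 1}, logical AND and multiplication agree, so a & b is exactly the GF(2) product. A plain-Python check over all bit pairs:

```python
import itertools

# AND coincides with multiplication whenever both operands are bits
for a, b in itertools.product((0, 1), repeat=2):
    assert (a & b) == a * b
```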

For the next step it may be necessary to pad the edaBit with a few sbit(0)s - how does one construct those in this low-level interface? I've tried GC::TinierShare<gf2n_short>::constant(0, bit_proc.P.my_num(), bit_MC.get_alphai()); but it yields conversion errors, whereas GC::TinierShare<gf2n_mac_key>::constant(0, bit_proc.P.my_num(), bit_MC.get_alphai()); yields SECURITY BUG: insufficient checking. It seems to work when I simply remove that check, but I presume this is undesirable? Is gf2n_mac_key indeed the right field for these bits?

You can use the default constructor of TinierShare to get a zero sharing.

Do you have any idea why using ElementType = std::decay_t<decltype(x.second)::value_type> somewhere around here yields template argument for template type parameter must be a type; did you forget 'typename'??

I'm not familiar with this construction; maybe you just need to add typename before decltype.

In some of my older code cbit(1) & edabit[1][0] works, but in other code (including the current commit) it yields Compiler.exceptions.CompilerError: cannot convert cb0(1)/cb0(1) from <class 'Compiler.GC.types.cbit'> to <class 'Compiler.GC.types.bits.get_type.<locals>.bitsn'>; any idea what is causing this?

Can you give a minimal example of this? The following does not produce any errors for me: cbit(1) & sint.get_edabit(10)[1][0]

waamm commented 2 months ago

Let's say for simplicity that get_edabit(size) should return a sint followed by size number of sbits, as usual, and then one more sbit which is the product of the first two sbits. Where in the preprocessing code should that product be computed?

How would you have done the above in the Python level?

The product of two sbits can be computed using a & b.

Yes, my question is how to do that in preprocessing - I think I can do it at the C++ level by modifying things e.g. here, but I'm curious how to do it at the "Python level" as you were suggesting earlier:

It's certainly easier to work on the Python level, and maybe cleaner to defer the sbit multiplication to the virtual machine. However, it might be that the resulting saving of moving to the virtual machine is not worth your effort as it's clearly more involved. That said, I cannot make this decision for you.


For the next step it may be necessary to pad the edaBit with a few sbit(0)s - how does one construct those in this low-level interface? I've tried GC::TinierShare<gf2n_short>::constant(0, bit_proc.P.my_num(), bit_MC.get_alphai()); but it yields conversion errors, whereas GC::TinierShare<gf2n_mac_key>::constant(0, bit_proc.P.my_num(), bit_MC.get_alphai()); yields SECURITY BUG: insufficient checking. It seems to work when I simply remove that check, but I presume this is undesirable? Is gf2n_mac_key indeed the right field for these bits?

You can use the default constructor of TinierShare to get a zero sharing.

Oh you mean TinierShare()? I can't get that to work atm, where in the code is it implemented?

In some of my older code cbit(1) & edabit[1][0] works, but in other code (including the current commit) it yields Compiler.exceptions.CompilerError: cannot convert cb0(1)/cb0(1) from <class 'Compiler.GC.types.cbit'> to <class 'Compiler.GC.types.bits.get_type.<locals>.bitsn'>; any idea what is causing this?

Can you give a minimal example of this? The following does not produce any errors for me: cbit(1) & sint.get_edabit(10)[1][0]

Let's say you put something like return sint(cbit(1) & sint.get_edabit(10)[1][0]) here or here and then compute a comparison between two sints inside some .mpc file?

mkskeller commented 2 months ago

Let's say for simplicity that get_edabit(size) should return a sint followed by size number of sbits, as usual, and then one more sbit which is the product of the first two sbits. Where in the preprocessing code should that product be computed?

How would you have done the above in the Python level?

The product of two sbits can be computed using a & b.

Yes, my question is how to do that in preprocessing - I think I can do it at the C++ level by modifying things e.g. here, but I'm curious how to do it at the "Python level" as you were suggesting earlier:

It's certainly easier to work on the Python level, and maybe cleaner to defer the sbit multiplication to the virtual machine. However, it might be that the resulting saving of moving to the virtual machine is not worth your effort as it's clearly more involved. That said, I cannot make this decision for you.

Maybe there's a misunderstanding here. The Python level is oblivious to preprocessing, but you can use timers to differentiate the benchmark between the two phases.

For the next step it may be necessary to pad the edaBit with a few sbit(0)s - how does one construct those in this low-level interface? I've tried GC::TinierShare<gf2n_short>::constant(0, bit_proc.P.my_num(), bit_MC.get_alphai()); but it yields conversion errors, whereas GC::TinierShare<gf2n_mac_key>::constant(0, bit_proc.P.my_num(), bit_MC.get_alphai()); yields SECURITY BUG: insufficient checking. It seems to work when I simply remove that check, but I presume this is undesirable? Is gf2n_mac_key indeed the right field for these bits?

You can use the default constructor of TinierShare to get a zero sharing.

Oh you mean TinierShare()? I can't get that to work atm, where in the code is it implemented?

This is defined in GC/TinierShare.h. What do you mean by not getting that to work?

In some of my older code cbit(1) & edabit[1][0] works, but in other code (including the current commit) it yields Compiler.exceptions.CompilerError: cannot convert cb0(1)/cb0(1) from <class 'Compiler.GC.types.cbit'> to <class 'Compiler.GC.types.bits.get_type.<locals>.bitsn'>; any idea what is causing this?

Can you give a minimal example of this? The following does not produce any errors for me: cbit(1) & sint.get_edabit(10)[1][0]

Let's say you put something like return sint(cbit(1) & sint.get_edabit(10)[1][0]) here or here and then compute a comparison between two sints inside some .mpc file?

Now I can see the issue. It happens because the CISC optimization requires code to be flexible regarding the vector length, which is not the case for the AND of cbit (fixed length 1) and sbits (flexible length). You can fix this by switching off the CISC optimization (using ./compile.py -O) or by using the ~ operator to invert sbits instead of AND with 1.

waamm commented 2 months ago

Let's say for simplicity that get_edabit(size) should return a sint followed by size number of sbits, as usual, and then one more sbit which is the product of the first two sbits. Where in the preprocessing code should that product be computed?

How would you have done the above in the Python level?

The product of two sbits can be computed using a & b.

Yes, my question is how to do that in preprocessing - I think I can do it at the C++ level by modifying things e.g. here, but I'm curious how to do it at the "Python level" as you were suggesting earlier:

It's certainly easier to work on the Python level, and maybe cleaner to defer the sbit multiplication to the virtual machine. However, it might be that the resulting saving of moving to the virtual machine is not worth your effort as it's clearly more involved. That said, I cannot make this decision for you.

Maybe there's a misunderstanding here. The Python level is oblivious to preprocessing, but you can use timers to differentiate the benchmark between the two phases.

The aim would be to e.g. run ResNet50 with a custom comparison protocol, whilst separating the customised preprocessing and online phases. If using tf.mpc to execute ResNet50, how should one go about overloading the default comparison protocol and also placing such timers?

For the next step it may be necessary to pad the edaBit with a few sbit(0)s - how does one construct those in this low-level interface? I've tried GC::TinierShare<gf2n_short>::constant(0, bit_proc.P.my_num(), bit_MC.get_alphai()); but it yields conversion errors, whereas GC::TinierShare<gf2n_mac_key>::constant(0, bit_proc.P.my_num(), bit_MC.get_alphai()); yields SECURITY BUG: insufficient checking. It seems to work when I simply remove that check, but I presume this is undesirable? Is gf2n_mac_key indeed the right field for these bits?

You can use the default constructor of TinierShare to get a zero sharing.

Oh you mean TinierShare()? I can't get that to work atm, where in the code is it implemented?

This is defined in GC/TinierShare.h. What do you mean by not getting that to work?

GC::TinierShare() yields no viable constructor or deduction guide for deduction of template arguments of 'TinierShare'

mkskeller commented 2 months ago

The aim would be to e.g. run ResNet50 with a custom comparison protocol, whilst separating the customised preprocessing and online phases. If using tf.mpc to execute ResNet50, how should one go about overloading the default comparison protocol and also placing such timers?

You could time the protocol using constant preprocessing and then benchmark the preprocessing separately.

GC::TinierShare() yields no viable constructor or deduction guide for deduction of template arguments of 'TinierShare'

You need to define the data type used for the MAC sharing: GC::TinierShare<gf2n_mac_key>(). Alternatively, you can use the implicit definition T::bit_type::part_type.

waamm commented 2 months ago

The aim would be to e.g. run ResNet50 with a custom comparison protocol, whilst separating the customised preprocessing and online phases. If using tf.mpc to execute ResNet50, how should one go about overloading the default comparison protocol and also placing such timers?

You could time the protocol using constant preprocessing and then benchmark the preprocessing separately.

As far as I can tell, this approach to the problem I sketched would only make sense if you can retrieve customised (and stored) preprocessing material through the Python interface? Can you please provide a sketch of what that would look like?

GC::TinierShare() yields no viable constructor or deduction guide for deduction of template arguments of 'TinierShare'

You need to define the data type used for the MAC sharing: GC::TinierShare<gf2n_mac_key>(). Alternatively, you can use the implicit definition T::bit_type::part_type.

Oh, the data type specified there for these secret-shared bits is only for the MAC sharing component? In what context would that be useful?

Anyway, I am still getting SECURITY BUG: insufficient checking. How would you suggest fixing this?

mkskeller commented 2 months ago

The aim would be to e.g. run ResNet50 with a custom comparison protocol, whilst separating the customised preprocessing and online phases. If using tf.mpc to execute ResNet50, how should one go about overloading the default comparison protocol and also placing such timers?

You could time the protocol using constant preprocessing and then benchmark the preprocessing separately.

As far as I can tell, this approach to the problem I sketched would only make sense if you can retrieve customised (and stored) preprocessing material through the Python interface? Can you please provide a sketch of what that would look like?

I disagree. While not perfect, using constant preprocessing can give a good approximation of the cost (as long as it's not all zero or the like, which might lead to unrealistic speed-ups).

GC::TinierShare() yields no viable constructor or deduction guide for deduction of template arguments of 'TinierShare'

You need to define the data type used for the MAC sharing: GC::TinierShare<gf2n_mac_key>(). Alternatively, you can use the implicit definition T::bit_type::part_type.

Oh, the data type specified there for these secret-shared bits is only for the MAC sharing component? In what context would that be useful?

The template parameter specifies the type of MAC. This is useful in case you want to change the number of bits in the MAC.

Anyway, I am still getting SECURITY BUG: insufficient checking. How would you suggest fixing this?

You can avoid this by running the Check method of any MAC_Check/MC/opening instance you're using at the end.

waamm commented 2 months ago

The aim would be to e.g. run ResNet50 with a custom comparison protocol, whilst separating the customised preprocessing and online phases. If using tf.mpc to execute ResNet50, how should one go about overloading the default comparison protocol and also placing such timers?

You could time the protocol using constant preprocessing and then benchmark the preprocessing separately.

As far as I can tell, this approach to the problem I sketched would only make sense if you can retrieve customised (and stored) preprocessing material through the Python interface? Can you please provide a sketch of what that would look like?

I disagree. While not perfect, using constant preprocessing can give a good approximation of the cost (as long as it's not all zero or the like, which might lead to unrealistic speed-ups).

I don't follow, I was asking for some kind of sketch or example if you believe this approach is feasible - how does one retrieve customised preprocessing material through the Python interface?

GC::TinierShare() yields no viable constructor or deduction guide for deduction of template arguments of 'TinierShare'

You need to define the data type used for the MAC sharing: GC::TinierShare<gf2n_mac_key>(). Alternatively, you can use the implicit definition T::bit_type::part_type.

Oh, the data type specified there for these secret-shared bits is only for the MAC sharing component? In what context would that be useful?

The template parameter specifies the type of MAC. This is useful in case you want to change the number of bits in the MAC.

Right, and when would you want that number to be different from the number of bits in the non-MAC part of a TinierShare?

mkskeller commented 2 months ago

The aim would be to e.g. run ResNet50 with a custom comparison protocol, whilst separating the customised preprocessing and online phases. If using tf.mpc to execute ResNet50, how should one go about overloading the default comparison protocol and also placing such timers?

You could time the protocol using constant preprocessing and then benchmark the preprocessing separately.

As far as I can tell, this approach to the problem I sketched would only make sense if you can retrieve customised (and stored) preprocessing material through the Python interface? Can you please provide a sketch of what that would look like?

I disagree. While not perfect, using constant preprocessing can give a good approximation of the cost (as long as it's not all zero or the like, which might lead to unrealistic speed-ups).

I don't follow, I was asking for some kind of sketch or example if you believe this approach is feasible - how does one retrieve customised preprocessing material through the Python interface?

I meant to avoid the need for retrieving customised preprocessing material through the Python interface. Instead, I would suggest benchmarking two things:

  1. The computation based on some "fake" preprocessing, that is, define a fixed preprocessing instance and use it over and over. This gives you a rough benchmark of the online phase.
  2. Generating a batch of true instances of the preprocessing. This gives you a benchmark for the offline phase.
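The split can be illustrated with a plaintext sketch (plain Python with stand-in stubs, not MP-SPDZ code; all function names here are hypothetical, and the bit triple stands in for whatever preprocessing the comparison actually consumes):

```python
import random
import time

def make_fake_instance():
    # Constant (insecure) stand-in for one preprocessing instance,
    # e.g. a bit triple (a, b, a*b), reused for every operation.
    return (1, 0, 0)

def make_true_instance():
    # Stand-in for genuinely sampling one preprocessing instance.
    a, b = random.randrange(2), random.randrange(2)
    return (a, b, a & b)

def online_phase(instance, n):
    # Stand-in for the online computation consuming preprocessing.
    a, b, ab = instance
    return sum((a ^ b ^ ab) for _ in range(n))

# 1. Online benchmark: reuse one fake instance over and over.
fake = make_fake_instance()
t0 = time.perf_counter()
online_phase(fake, 100_000)
online_time = time.perf_counter() - t0

# 2. Offline benchmark: generate a batch of true instances.
t0 = time.perf_counter()
batch = [make_true_instance() for _ in range(100_000)]
offline_time = time.perf_counter() - t0
```

In an actual .mpc program the same separation can be expressed with MP-SPDZ's start_timer/stop_timer library calls around the respective parts.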

GC::TinierShare() yields no viable constructor or deduction guide for deduction of template arguments of 'TinierShare'

You need to define the data type used for the MAC sharing: GC::TinierShare<gf2n_mac_key>(). Alternatively, you can use the implicit definition T::bit_type::part_type.

Oh, the data type specified there for these secret-shared bits is only for the MAC sharing component? In what context would that be useful?

The template parameter specifies the type of MAC. This is useful in case you want to change the number of bits in the MAC.

Right, and when would you want that number to be different from the number of bits in the non-MAC part of a TinierShare?

TinierShare always contains just one bit in the non-MAC part, but the MAC has to be much longer to achieve reasonable security. 40 bits is the minimum accepted by the community in my view.
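As a rough plausibility check (a plaintext calculation under the standard assumption that an additive forgery against an l-bit MAC over GF(2^l) goes undetected with probability 2^-l; forgery_probability is a hypothetical helper name):

```python
def forgery_probability(mac_bits):
    # Success probability of a single undetected MAC forgery,
    # assuming a 2^-l cheating bound for an l-bit MAC.
    return 2.0 ** -mac_bits

assert forgery_probability(1) == 0.5    # a 1-bit MAC is trivially weak
assert forgery_probability(40) < 1e-12  # 2^-40 is about 9.1e-13
```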