[Closed] waamm closed this issue 1 month ago
I don't think I can say much about code I cannot run myself. It seems you're running LTZ on an `sbits` element, which is not as intended.
When I copy-paste this code into `non_linear.py` and replace `return LtzRing(c, k)` by `return LTS(a, 0)`, that's the output I get from `./compile.py -R 64 tf EzPC/Athos/Networks/SqueezeNetImgNet/graphDef.bin 1 trunc_pr`. (Here I set `BIT_LENGTH = EDABIT_BIT_LENGTH = 64` and `HALF_PRIME = 2**63`.)
I am not sure why `LTZ` appears in the output here. If it helps, I was previously guessing that the underlying problem occurs when converting between secret edaBit bits and certain clear integers (when compiled inside the library but not for `.mpc` code?); print statements suggest the compiler stops at this line, and I think it's the `+ w3` part, hence the earlier code snippet.
`LTZ` is just the function that runs `non_linear.ltz`, so the problem seems to be at a higher level. I would also suggest testing the new code on smaller examples before moving on to a whole neural network.
Compiling `sint(0) < sint(1)` already fails (until I put the LTS code back into the `.mpc` file) with the same error (regardless of `-R 64`).
> print statements suggest the compiler stops at this line, and I think it's the `+ w3` part, hence the earlier code snippet.

Never mind this, the compiler seems to get past that line. Maybe the problem involves "vectorisation"?
I don't think vectorization is the issue; rather, the type of values. I would suggest adding debugging output and assertions to make sure all values are of the expected type.
Ah, somewhere in the codebase it is probably specified that `non_linear.ltz` must return an `sint`, whereas this code was returning an `sbit`. Whoops.
So I replaced the last line of `LTS` by the following:

```python
dabit = sint.get_dabit()
w_masked = cint((w + dabit[1]).reveal())
return dabit[0] + w_masked - 2 * dabit[0] * w_masked
```
Compiling `tutorial` yields the familiar `NoneType` error, until `-O` is added, whereas some other simple programs compile regardless (`bankers_bonus`, `vickrey`).
`./compile.py -R 64 tf EzPC/Athos/Networks/SqueezeNetImgNet/graphDef.bin 1 trunc_pr` now yields:
```
File ".../Compiler/non_linear.py", line 241, in LTS
    w_masked = cint((w + dabit[1]).reveal())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../Compiler/types.py", line 200, in vectorized_init
    res = function(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../Compiler/types.py", line 1077, in __init__
    super(cint, self).__init__('c', val=val, size=size)
File ".../Compiler/types.py", line 851, in __init__
    super(_arithmetic_register, self).__init__(*args, **kwargs)
File ".../Compiler/types.py", line 209, in instruction_typed_operation
    res = operation(self, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../Compiler/types.py", line 800, in __init__
    raise CompilerError(
Compiler.exceptions.CompilerError: cannot convert '<class 'Compiler.GC.types.bits.get_type.<locals>.bitsn'>' to '<class 'Compiler.types.cint'>'
```
You can convert from `sbit` to `sint` simply by casting: `sint(sbit(0))`, so there is no need to do it manually. Furthermore, which version of MP-SPDZ are you using? The following code modelled on your snippet works as expected on the latest commit:

```python
dabit = sint.get_dabit()
print_ln('%s', cint((sbit(1) + dabit[1]).reveal()))
```
> which version of MP-SPDZ are you using?

I've been using commit e93190f from 4 June, and I don't think I've made any other significant modifications.
Replacing

```python
dabit = sint.get_dabit()
w_masked = cint((w + dabit[1]).reveal())
return dabit[0] + w_masked - 2 * dabit[0] * w_masked
```

by `return sint(w)`, everything now seems to compile. A bit surprising, since I presume this "shortcut" implements roughly the same circuit?
Instead of returning an edaBit, i.e. a value in Z/mZ together with sharings of its bits in Z/2Z, it would additionally return certain products of those bits. The structure containing those products is probably best represented as a dictionary whose keys are tuples like `(1,0,1)`. Right now I see two steps. First, insert this dictionary creation function (and the one it calls) here. Then I probably need to modify this, in particular the last section of this line, so that it accepts a certain (immutable) dictionary. Is that possible? What should replace `itertools.repeat('sbw')`?
> by `return sint(w)`, everything now seems to compile. A bit surprising since I presume this "shortcut" implements roughly the same circuit?

If you provide a minimal example, I'm happy to look into it. As said before, the minimal example I came up with does work with the latest version.
> Instead of returning an edaBit, i.e. a value in Z/mZ together with sharings of its bits in Z/2Z, it would additionally return certain products of those bits. The structure containing those products is probably best represented as a dictionary whose keys are tuples like `(1,0,1)`.

Using a dictionary doesn't make sense to me. This would include the possibility that some keys are present and some are not. How is the virtual machine supposed to react to this? I would suggest that you add the virtual machine functionality in parallel, which should inform the requirements in the bytecode format.
> the minimal example I came up with does work with the latest version.

For which program? As before, for your minimal example the programs `bankers_bonus` and `vickrey` compile here, but `tutorial` needs `-O`, and `./compile.py -R 64 tf...` yields that odd conversion error.

> This would include the possibility that some keys are present and some are not.

No, a fixed set of keys would be used. (More precisely, given a fixed natural number ARITY, this returns an array of ⌈ℓ/ARITY⌉ dictionaries, whose keys are all of the "binary" tuples of size ARITY, where ℓ is the bit-length. Obviously that means you can turn the entire data structure into an array/tuple, but it would become rather messy and prone to bugs.)
> For which program? As before, for your minimal example the programs `bankers_bonus` and `vickrey` compile here, but `tutorial` needs `-O`, and `./compile.py -R 64 tf...` yields that odd conversion error.

A minimal example reduces the problem to the issue without any extra functionality. `tf` includes a lot of code that is probably unrelated to the issue, which makes it hard to analyze.

> No, a fixed set of keys would be used. (More precisely, given a fixed natural number ARITY, this returns an array of ⌈ℓ/ARITY⌉ dictionaries, whose keys are all of the "binary" tuples of size ARITY, where ℓ is the bit-length. Obviously that means you can turn the entire data structure into an array/tuple, but it would become rather messy and prone to bugs.)

Creating the bytecode does involve some serialization in any case. Either way, working on the parsing side in parallel should give you an idea of the optimal solution.
> A minimal example reduces the problem to the issue without any extra functionality. `tf` includes a lot of code that is probably unrelated to the issue, which makes it hard to analyze.

Right; before, by "minimal example" you meant your LTS code snippet, but now you're also asking for a minimal program. I am not familiar with the innards of `tf` (there is not much code inside `tf.mpc` itself), so it is not easy for me to perceive right now why it would behave so differently here from `bankers_bonus` and `vickrey`.

> working on the parsing side in parallel should give you an idea of the optimal solution.

Can you please point me to the relevant files/lines for this edaBit modification?
> Then I probably need to modify this, in particular the last section of this line so that it accepts a certain (immutable) dictionary. Is that possible? What should replace `itertools.repeat('sbw')`?

More concretely, I'm hoping I can replace `itertools.repeat('sbw')` by `itertools.repeat({key: 'sbw' for key in gen_bitvectors(ARITY)})`. (I guess once parsing is modified I will find out whether this works.)

Also, what is the difference between `sb` and `sbw`?
> Can you please point me to the relevant files/lines for this edaBit modification?

There are several contexts:

> Also, what is the difference between `sb` and `sbw`?

The `w` denotes that a register is written to, so the compiler knows the usage of a specific register.

Version 0.3.9 adds more documentation on adding instructions: https://mp-spdz.readthedocs.io/en/latest/add-instruction.html
Where is this part connected with the `EDABIT` instruction?

In here it says "The length is the first argument minus one", but in `arg_format` the first entry is `answ`, so does this refer to a different list of arguments? Is that list specified somewhere?
It is the default for variable-argument instructions to prepend the number of arguments to follow; otherwise the parser wouldn't know where the instruction arguments end. This is done here: https://github.com/data61/MP-SPDZ/blob/adef788433a416375cc934defea8cc20b4de9d10/Compiler/instructions_base.py#L954
> Version 0.3.9 adds more documentation on adding instructions: https://mp-spdz.readthedocs.io/en/latest/add-instruction.html

Perhaps a comment (or a link) on how `prefixsums` is intended to be used might be helpful; the code

```python
arr = sint(1, size=3)
res = sint(None, size=3)
prefixsums(res, arr)
print_ln('%s', res.reveal())
```

yields `[1, 0, 0]`, whereas I was expecting `[1, 2, 3]`? Also, is there a reason why you chose the `@vectorise` approach for inputs/outputs here as opposed to one based on Python arrays?
> Where is this part connected with the `EDABIT` instruction?

Here:

Right, followed by this I suppose.

> It is the default for variable-argument instructions to prepend the number of arguments to follow, otherwise the parser wouldn't know where the instruction arguments end. This is done here:

Doesn't the `+=` in the subsequent line ordinarily append rather than prepend? Also, that number is 4 bytes, but later an 8-byte integer is extracted here; which other 4 bytes are these?
> Perhaps a comment (or a link) on how `prefixsums` is intended to be used might be helpful; the code yields `[1, 0, 0]`, whereas I was expecting `[1, 2, 3]`?

`prefixsums` is a vector instruction, the base version of which only works for length 1. I have updated the documentation accordingly.
> Doesn't the `+=` in the subsequent line ordinarily append rather than prepend? Also, that number is 4 bytes but later an 8-byte integer is extracted here; which other 4 bytes are these?

I meant prepend to all other arguments. The 8-byte integer extracted encodes the instruction type and vector length; it is generated just before: https://github.com/data61/MP-SPDZ/blob/adef788433a416375cc934defea8cc20b4de9d10/Compiler/instructions_base.py#L952
> I have updated the documentation accordingly.

(Small typo: here it now says `vprefixsum(...)` rather than `vprefixsums(...)`.)

Is this where the actual summing happens? Edit: the answer is yes.

> I meant prepend to all other arguments.

Ah yes, this.
> There are several contexts:

Is there a standard approach here to serialise the dictionary I was suggesting earlier?

> The above in turn refers to the Preprocessing class where the start variable populated in 1. is used: https://github.com/data61/MP-SPDZ/blob/18e934f36d4e807ce368a3cd7469c63de4f404e0/Protocols/ReplicatedPrep.hpp#L1258

Is there a reason why this is located in the file `ReplicatedPrep.hpp`?
> (Small typo: here it now says `vprefixsum(...)` rather than `vprefixsums(...)`.)

Thanks, fixed.

> Is there a standard approach here to serialise the dictionary I was suggesting earlier?

No. The idea is to keep the structure as simple as possible. All instruction arguments are tuples/lists or concatenations thereof for parallel execution. The most complicated instruction is `inputmixed`, because it combines integer inputs and floating-point inputs.
> Is there a reason why this is located in the file `ReplicatedPrep.hpp`?

The replicated protocol was the first to support live preprocessing.
> There are several contexts:

I haven't deciphered the code in step 5 yet, but at first glance it doesn't seem to me that steps 1-4 need any modification if I'm just making the tuple a bit longer (with more `sbit`s)?

Let's say for simplicity that `get_edabit(size)` should return an `sint` followed by `size` number of `sbit`s, as usual, and then one more `sbit` which is the product of the first two `sbit`s. Where in the preprocessing code should that product be computed? (As for the rest, I suppose `n_bits` here should be incremented.)
> I haven't deciphered the code in step 5 yet, but at first glance it doesn't seem to me that steps 1-4 need any modification if I'm just making the tuple a bit longer (with more `sbit`s)?

I think so.
> Let's say for simplicity that `get_edabit(size)` should return an `sint` followed by `size` number of `sbit`s, as usual, and then one more `sbit` which is the product of the first two `sbit`s. Where in the preprocessing code should that product be computed? (As for the rest, I suppose `n_bits` here should be incremented.)

You will need to run bit multiplication in C++ as in `Utils/binary-example.cpp`.
> You will need to run bit multiplication in C++ as in `Utils/binary-example.cpp`.

Sure, but I meant to ask: in which edaBit preprocessing files?

I don't think it matters which file you add your functionality to, but it's probably easier to use an existing file so you don't get caught up with missing includes.
I still don't follow. I want ResNet to run with a modified comparison protocol (which I'll probably just put in `non_linear.py`), and this comparison protocol uses these modified edaBits, which for simplicity I was going to retrieve with `get_edabit(...)`.

How will executing a comparison through ResNet know how to preprocess and retrieve these modified edaBits, through a file like `Utils/binary-example.cpp`?
I didn't mean to say that the virtual machine will run any code in `binary-example.cpp`. Instead, the example code should inform your modifications to compute the product of two secret bits in the virtual machine. You need to do this if you want to modify or create new instructions. For the larger picture, it might be helpful to look at the following section in the documentation, which explains how the different parts fit together: https://mp-spdz.readthedocs.io/en/latest/journey.html. More concretely, the ResNet code will involve ReLU and MaxPool layers in `ml.py`, which in turn call comparison operators on `sfix`, which in turn call an MPC algorithm for comparison depending on a variety of factors. This is where `non_linear.py` comes in, as it is a crucial part of this process, with different classes for ring (modulo a power of two) and prime (modulo a large prime) computation. This filters down to a string of instructions (possibly including the ones for edaBits), which is then executed by the virtual machine (the C++ part).
> This is where `non_linear.py` comes in, as it is a crucial part of this process, with different classes for ring (modulo a power of two) and prime (modulo a large prime) computation.

This part I understand; I was already able to get the modified comparison code running (with SqueezeNet), but too much of the computation was unnecessarily being done in the online phase.

Regardless, if I want `get_edabit(...)` to include the product of the first two `sbit`s, computed in preprocessing along with the rest of the edaBit, I would assume that the easiest/cleanest approach is to do that computation (perhaps following the approach in `binary-example.cpp`) in the preprocessing files where edaBits are constructed/populated? (I did not have time today and will have a careful look at your links tomorrow.)
I think easiest and cleanest are not necessarily the same thing here. It's certainly easier to work on the Python level, and maybe cleaner to defer the sbit multiplication to the virtual machine. However, it might be that the resulting saving of moving to the virtual machine is not worth your effort as it's clearly more involved. That said, I cannot make this decision for you.
When `get_edabit(...)` is invoked in some MPC protocol, which (preprocessing) code is ultimately responsible for returning a populated edaBit? Perhaps that would be an easy place?

What do you mean by "ultimately responsible"? edaBit generation is distributed across dozens of functions containing thousands of lines of code.
I meant something like this:

(i) Following the edaBits paper, in the setting of passive security an edaBit should be constructed in preprocessing by executing the protocol $\Pi_{\texttt{edaBits}}$, after which the resulting edaBit is stored somewhere. One place to make the desired change (namely, multiplying some of its bits) would be just after finishing $\Pi_{\texttt{edaBits}}$ and just before the edaBit is stored? (ii) In the setting of active security, before storage a cut-and-choose procedure will take place. In this case the desired change would happen after that, though ideally before the MPC protocol executes any (possibly batch-based) multiplication verification protocol?

In particular, I was hoping you could point me to the specific sections where such code is found?
In MP-SPDZ, the core function for preprocessing edaBits is called `buffer_edabits`, which has a few instantiations depending on the protocol. In the case of some semi-honest protocols, the internal storage of a batch happens here: https://github.com/data61/MP-SPDZ/blob/ac31098bb5dd04ca81c863c55b67233846be5740/Protocols/ReplicatedPrep.hpp#L947

whereas for malicious protocols, the internal storage happens here: https://github.com/data61/MP-SPDZ/blob/ac31098bb5dd04ca81c863c55b67233846be5740/Protocols/Spdz2kPrep.hpp#L229C1-L236C6
Following `Utils/binary-example.cpp` as suggested, a first attempt was this:

Insert `auto& protocol = *this->protocol;` here, and here the following:

```cpp
protocol.init_mul();
protocol.prepare_mul(bits.at(0)[0].get_bit(0), bits.at(1)[0].get_bit(0));
protocol.exchange();
auto c = protocol.finalize_mul();
checked.back().second.push_back(c);
```

Then `make mascot` yields errors like this:

```
./Protocols/Spdz2kPrep.hpp:228:41: error: no viable conversion from 'Share<gfp_<0, 1>>' to 'const GC::TinierShare<gf2n_mac_key>'
        checked.back().second.push_back(c);
```
Some other questions:

- Why the `+ 5` here?
- Why are what are called `wholes` here called `sums` in this file?
- Is `n_bits` actually used here? What is its role?
- What is the interpretation of `bit_type::part_type::default_length`? It seems to be either 1 or 64?
> Following `Utils/binary-example.cpp` as suggested, a first attempt was this: insert `auto& protocol = *this->protocol;` here, and here the code calling `prepare_mul`. Then `make mascot` yields errors like: no viable conversion from 'Share<gfp_<0, 1>>' to 'const GC::TinierShare<gf2n_mac_key>'.

`this->protocol` refers to an instance for the arithmetic side. You can access an instance for the binary side with `bit_proc.protocol`.
> Why the `+ 5` here?

This is to cover the overflow from adding the parties' inputs. 5 bits of overflow cover up to 31 parties, which seems a reasonable limit for this kind of protocol.
> Why are what are called `wholes` here called `sums` in this file?

No particular reason. The wholes are the result of secret sharing summation.
> Is `n_bits` actually used here? What is its role?

Yes. The framework allows processing up to 64 bits at once, and the parameter specifies the exact number for optimization.

> What is the interpretation of `bit_type::part_type::default_length`? It seems to be either 1 or 64?

In some protocols it makes sense to pack secret shares into machine words, because every share is only one or two bits, whereas in other protocols it only makes sense to handle one bit at once. The latter is the case for protocols using MACs, which includes Tinier.
> Yes. The framework allows processing up to 64 bits at once, and the parameter specifies the exact number for optimization.

Is the aim of `binary-example.cpp` stated somewhere? I presume it does a binary computation in certain protocols; for say `Tinier` this optimisation is entirely ignored, but not for say `Rep3`?
> `this->protocol` refers to an instance for the arithmetic side. You can access an instance for the binary side with `bit_proc.protocol`.

It seems counterintuitive, but I suppose this is the line I need?

`protocol.prepare_mul(bits.at(0)[0].get_bit(0), bits.at(0)[1].get_bit(0))`

If not, can you explain the interpretation of the three indices, in particular how to obtain the first n bits of one `sum` out of `bits`? Why does this loop run `size+2` times, where `size` is the number of bits of the edaBit? I tried several versions and didn't get it to work; the last bit is not the product of the first two.
> Is the aim of `binary-example.cpp` stated somewhere? I presume it does a binary computation in certain protocols; for say `Tinier` this optimisation is entirely ignored, but not for say `Rep3`?

`binary-example.cpp` computes the AND (multiplication in GF(2)) of a number of inputs from two parties. Optimization is maybe the wrong word; without the parameter the framework would compute 64 ANDs even when fewer are actually required.
> It seems counterintuitive but I suppose this is the line I need? `protocol.prepare_mul(bits.at(0)[0].get_bit(0), bits.at(0)[1].get_bit(0))`

Can you give more context?

> If not, can you explain the interpretation of the three indices, in particular how to obtain the first n bits of one `sum` out of `bits`?

`bits` is a vector of vectors of the most atomic share type, which is one bit plus a MAC in the case of Tinier. In that case the outer vector corresponds to the elements of `sum`, and the inner vector to the bits corresponding to one element of `sum`. In short, the first n bits of the i-th `sum` element are in `bits[i]`, which is a vector of length at least n.

> Why does this loop run `size+2` times, where `size` is the number of bits of the edaBit? I tried several versions and didn't get it to work; the last bit is not the product of the first two.

The loop runs longer because the edaBits are still "loose" at this time, that is, the overflow from the summation hasn't been removed yet. This happens in the `sanitize` function.
> Can you give more context?

I meant this:

> > Let's say for simplicity that `get_edabit(size)` should return an `sint` followed by `size` number of `sbit`s, as usual, and then one more `sbit` which is the product of the first two `sbit`s. Where in the preprocessing code should that product be computed?
>
> I think easiest and cleanest are not necessarily the same thing here. It's certainly easier to work on the Python level, and maybe cleaner to defer the sbit multiplication to the virtual machine. However, it might be that the resulting saving of moving to the virtual machine is not worth your effort as it's clearly more involved. That said, I cannot make this decision for you.

How would you have done the above at the Python level?
> The loop runs longer because the edaBits are still "loose" at this time, that is, the overflow from the summation hasn't been removed yet. This happens in the `sanitize` function.

Ah, I should have placed my modifications a few lines later, precisely as you were already indicating earlier. But I now suspect that somewhere at the start of the edaBit generation process (or in the function calling it) I will have to replace `n_bits` by `n_bits - 1` (because I had to increment it here); do you happen to know which line(s) this is? Edit: I think this line might suffice; it seems to work.

For the next step it may be necessary to pad the edaBit with a few `sbit(0)`s; how does one construct those in this low-level interface? I've tried `GC::TinierShare<gf2n_short>::constant(0, bit_proc.P.my_num(), bit_MC.get_alphai());` but it yields conversion errors, whereas `GC::TinierShare<gf2n_mac_key>::constant(0, bit_proc.P.my_num(), bit_MC.get_alphai());` yields `SECURITY BUG: insufficient checking`. It seems to work when I simply remove that check, but I presume this is undesirable? Is `gf2n_mac_key` indeed the right field for these bits?

Do you have any idea why `using ElementType = std::decay_t<decltype(x.second)::value_type>` somewhere around here yields `template argument for template type parameter must be a type; did you forget 'typename'?`?

In some of my older code `cbit(1) & edabit[1][0]` works, but in other code (including the current commit) it yields `Compiler.exceptions.CompilerError: cannot convert cb0(1)/cb0(1) from <class 'Compiler.GC.types.cbit'> to <class 'Compiler.GC.types.bits.get_type.<locals>.bitsn'>`; any idea what is causing this?
> Let's say for simplicity that `get_edabit(size)` should return an `sint` followed by `size` number of `sbit`s, as usual, and then one more `sbit` which is the product of the first two `sbit`s. Where in the preprocessing code should that product be computed?
>
> How would you have done the above at the Python level?

The product of two `sbit`s can be computed using `a & b`.
> For the next step it may be necessary to pad the edaBit with a few `sbit(0)`s; how does one construct those in this low-level interface? I've tried `GC::TinierShare<gf2n_short>::constant(0, bit_proc.P.my_num(), bit_MC.get_alphai());` but it yields conversion errors, whereas `GC::TinierShare<gf2n_mac_key>::constant(0, bit_proc.P.my_num(), bit_MC.get_alphai());` yields `SECURITY BUG: insufficient checking`. Is `gf2n_mac_key` indeed the right field for these bits?

You can use the default constructor of `TinierShare` to get a zero sharing.
> Do you have any idea why `using ElementType = std::decay_t<decltype(x.second)::value_type>` somewhere around here yields `template argument for template type parameter must be a type; did you forget 'typename'?`?

I'm not familiar with this construction; maybe you just need to add `typename` before `decltype`.
> In some of my older code `cbit(1) & edabit[1][0]` works, but in other code (including the current commit) it yields `Compiler.exceptions.CompilerError: cannot convert cb0(1)/cb0(1) from <class 'Compiler.GC.types.cbit'> to <class 'Compiler.GC.types.bits.get_type.<locals>.bitsn'>`; any idea what is causing this?

Can you give a minimal example of this? The following does not produce any errors for me: `cbit(1) & sint.get_edabit(10)[1][0]`
> The product of two `sbit`s can be computed using `a & b`.

Yes, my question is how to do that in preprocessing. I think I can do it at the C++ level by modifying things e.g. here, but I'm curious how to do it at the "Python level" as you were suggesting earlier:

> It's certainly easier to work on the Python level, and maybe cleaner to defer the sbit multiplication to the virtual machine. However, it might be that the resulting saving of moving to the virtual machine is not worth your effort as it's clearly more involved. That said, I cannot make this decision for you.
> You can use the default constructor of `TinierShare` to get a zero sharing.

Oh, you mean `TinierShare()`? I can't get that to work at the moment; where in the code is it implemented?
> Can you give a minimal example of this? The following does not produce any errors for me: `cbit(1) & sint.get_edabit(10)[1][0]`

Let's say you put something like `return sint(cbit(1) & sint.get_edabit(10)[1][0])` here or here and then compute a comparison between two `sint`s inside some `.mpc` file?
> Yes, my question is how to do that in preprocessing. I think I can do it at the C++ level by modifying things e.g. here, but I'm curious how to do it at the "Python level" as you were suggesting earlier.

Maybe there's a misunderstanding here. The Python level is oblivious to preprocessing, but you can use timers to differentiate the benchmark between the two phases.
> Oh, you mean `TinierShare()`? I can't get that to work at the moment; where in the code is it implemented?

This is defined in `GC/TinierShare.h`. What do you mean by not getting that to work?
> In some of my older code `cbit(1) & edabit[1][0]` works, but in other code (including the current commit) it yields `Compiler.exceptions.CompilerError: cannot convert cb0(1)/cb0(1) from <class 'Compiler.GC.types.cbit'> to <class 'Compiler.GC.types.bits.get_type.<locals>.bitsn'>`; any idea what is causing this?
>
> Can you give a minimal example of this? The following does not produce any errors for me: `cbit(1) & sint.get_edabit(10)[1][0]`
>
> Let's say you put something like `return sint(cbit(1) & sint.get_edabit(10)[1][0])` here or here and then compute a comparison between two `sint`s inside some `.mpc` file?

I now can see the issue. It happens because the CISC optimization requires code to be flexible regarding the vector length, which is not the case for the AND of `cbit` (fixed length 1) and `sbits` (flexible length). You can fix this by switching off the CISC optimization (using `./compile.py -O`) or using the `~` operator to invert sbits instead of AND with 1.
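For what it's worth, the `~` route works because inverting a single bit is the same as XOR with 1. This can be checked with plain Python integers (where `~` is two's-complement, so the result is masked to the bit width):

```python
# In Python, ~x is two's-complement negation, so masking to the bit
# width recovers the pure bitwise inversion that sbits' ~ performs.
width = 1
mask = (1 << width) - 1
for b in (0, 1):
    assert (~b) & mask == b ^ 1  # inverting one bit == XOR with 1
print("inversion identity holds")
```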
> Let's say for simplicity that `get_edabit(size)` should return a `sint` followed by `size` number of `sbit`s, as usual, and then one more `sbit` which is the product of the first two `sbit`s. Where in the preprocessing code should that product be computed?
>
> How would you have done the above in the Python level?
>
> The product of two `sbits` can be computed using `a & b`.
>
> Yes, my question is how to do that in preprocessing - I think I can do it at the C++ level by modifying things e.g. here, but I'm curious how to do it at the "Python level" as you were suggesting earlier:
>
> It's certainly easier to work on the Python level, and maybe cleaner to defer the sbit multiplication to the virtual machine. However, it might be that the resulting saving of moving to the virtual machine is not worth your effort as it's clearly more involved. That said, I cannot make this decision for you.
>
> Maybe there's a misunderstanding here. The Python level is oblivious to preprocessing, but you can use timers to differentiate the benchmark between the two phases.

The aim would be to e.g. run ResNet50 with a custom comparison protocol, whilst separating the customised preprocessing and online phases. If using `tf.mpc` to execute ResNet50, how should one go about overloading the default comparison protocol and also placing such timers?
> For the next step it may be necessary to pad the edaBit with a few `sbit(0)`s - how does one construct those in this low-level interface? I've tried `GC::TinierShare<gf2n_short>::constant(0, bit_proc.P.my_num(), bit_MC.get_alphai());` but it yields conversion errors, whereas `GC::TinierShare<gf2n_mac_key>::constant(0, bit_proc.P.my_num(), bit_MC.get_alphai());` yields `SECURITY BUG: insufficient checking`. It seems to work when I simply remove that check, but I presume this is undesirable? Is `gf2n_mac_key` indeed the right field for these bits?
>
> You can use the default constructor of `TinierShare` to get a zero sharing.
>
> Oh you mean `TinierShare()`? I can't get that to work atm, where in the code is it implemented?
>
> This is defined in `GC/TinierShare.h`. What do you mean by not getting that to work?

`GC::TinierShare()` yields `no viable constructor or deduction guide for deduction of template arguments of 'TinierShare'`
> The aim would be to e.g. run ResNet50 with a custom comparison protocol, whilst separating the customised preprocessing and online phases. If using `tf.mpc` to execute ResNet50, how should one go about overloading the default comparison protocol and also placing such timers?

You could time the protocol using constant preprocessing and then benchmark the preprocessing separately.
> `GC::TinierShare()` yields `no viable constructor or deduction guide for deduction of template arguments of 'TinierShare'`

You need to define the data type used for the MAC sharing: `GC::TinierShare<gf2n_mac_key>()`. Alternatively, you can use the implicit definition `T::bit_type::part_type`.
> The aim would be to e.g. run ResNet50 with a custom comparison protocol, whilst separating the customised preprocessing and online phases. If using `tf.mpc` to execute ResNet50, how should one go about overloading the default comparison protocol and also placing such timers?
>
> You could time the protocol using constant preprocessing and then benchmark the preprocessing separately.

As far as I can tell, this approach to the problem I sketched would only make sense if you can retrieve customised (and stored) preprocessing material through the Python interface? Can you please provide a sketch of what that would look like?
> `GC::TinierShare()` yields `no viable constructor or deduction guide for deduction of template arguments of 'TinierShare'`
>
> You need to define the data type used for the MAC sharing: `GC::TinierShare<gf2n_mac_key>()`. Alternatively, you can use the implicit definition `T::bit_type::part_type`.

Oh, the data type specified there for these secret-shared bits is only for the MAC sharing component? In what context would that be useful?

Anyway, I am still getting `SECURITY BUG: insufficient checking`. How would you suggest one fixes this?
> The aim would be to e.g. run ResNet50 with a custom comparison protocol, whilst separating the customised preprocessing and online phases. If using `tf.mpc` to execute ResNet50, how should one go about overloading the default comparison protocol and also placing such timers?
>
> You could time the protocol using constant preprocessing and then benchmark the preprocessing separately.
>
> As far as I can tell, this approach to the problem I sketched would only make sense if you can retrieve customised (and stored) preprocessing material through the Python interface? Can you please provide a sketch of what that would look like?

I disagree. While not perfect, using constant preprocessing can give a good approximation of the cost (as long as it's not all zero or the like, which might lead to unrealistic speed-ups).
> `GC::TinierShare()` yields `no viable constructor or deduction guide for deduction of template arguments of 'TinierShare'`
>
> You need to define the data type used for the MAC sharing: `GC::TinierShare<gf2n_mac_key>()`. Alternatively, you can use the implicit definition `T::bit_type::part_type`.
>
> Oh, the data type specified there for these secret-shared bits is only for the MAC sharing component? In what context would that be useful?

The template parameter specifies the type of MAC. This is useful in case you want to change the number of bits in the MAC.
> Anyway, I am still getting `SECURITY BUG: insufficient checking`. How would you suggest one fixes this?

You can avoid this by running the `Check` method of any MAC_Check/MC/opening instance you're using at the end.
> The aim would be to e.g. run ResNet50 with a custom comparison protocol, whilst separating the customised preprocessing and online phases. If using `tf.mpc` to execute ResNet50, how should one go about overloading the default comparison protocol and also placing such timers?
>
> You could time the protocol using constant preprocessing and then benchmark the preprocessing separately.
>
> As far as I can tell, this approach to the problem I sketched would only make sense if you can retrieve customised (and stored) preprocessing material through the Python interface? Can you please provide a sketch of what that would look like?
>
> I disagree. While not perfect, using constant preprocessing can give a good approximation of the cost (as long as it's not all zero or the like, which might lead to unrealistic speed-ups).

I don't follow, I was asking for some kind of sketch or example if you believe this approach is feasible - how does one retrieve customised preprocessing material through the Python interface?
> `GC::TinierShare()` yields `no viable constructor or deduction guide for deduction of template arguments of 'TinierShare'`
>
> You need to define the data type used for the MAC sharing: `GC::TinierShare<gf2n_mac_key>()`. Alternatively, you can use the implicit definition `T::bit_type::part_type`.
>
> Oh, the data type specified there for these secret-shared bits is only for the MAC sharing component? In what context would that be useful?
>
> The template parameter specifies the type of MAC. This is useful in case you want to change the number of bits in the MAC.

Right, and when would you want that number to be different from the number of bits in the non-MAC part of a `TinierShare`?
> The aim would be to e.g. run ResNet50 with a custom comparison protocol, whilst separating the customised preprocessing and online phases. If using `tf.mpc` to execute ResNet50, how should one go about overloading the default comparison protocol and also placing such timers?
>
> You could time the protocol using constant preprocessing and then benchmark the preprocessing separately.
>
> As far as I can tell, this approach to the problem I sketched would only make sense if you can retrieve customised (and stored) preprocessing material through the Python interface? Can you please provide a sketch of what that would look like?
>
> I disagree. While not perfect, using constant preprocessing can give a good approximation of the cost (as long as it's not all zero or the like, which might lead to unrealistic speed-ups).
>
> I don't follow, I was asking for some kind of sketch or example if you believe this approach is feasible - how does one retrieve customised preprocessing material through the Python interface?

I meant to avoid the need for retrieving customised preprocessing material through the Python interface. Instead, I would suggest to benchmark two things:
> `GC::TinierShare()` yields `no viable constructor or deduction guide for deduction of template arguments of 'TinierShare'`
>
> You need to define the data type used for the MAC sharing: `GC::TinierShare<gf2n_mac_key>()`. Alternatively, you can use the implicit definition `T::bit_type::part_type`.
>
> Oh, the data type specified there for these secret-shared bits is only for the MAC sharing component? In what context would that be useful?
>
> The template parameter specifies the type of MAC. This is useful in case you want to change the number of bits in the MAC.
>
> Right, and when would you want that number to be different from the number of bits in the non-MAC part of a `TinierShare`?

`TinierShare` always contains just one bit in the non-MAC part, but the MAC has to be much longer to achieve reasonable security. 40 bits is the minimum accepted by the community in my view.
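To illustrate why the MAC length matters independently of the one-bit payload, here is a toy linear MAC over a prime field (an assumption for simplicity; TinierShare's actual MAC lives in a binary field such as GF(2^40) or larger, and `p`, `alpha`, `value` below are illustrative names):

```python
import random

# Toy linear MAC: a sharing of bit x carries a MAC share of alpha * x
# for a global key alpha. A cheater who changes x must also produce the
# matching MAC, which without alpha succeeds only with probability ~1/p.
p = (1 << 61) - 1                # Mersenne prime standing in for the MAC space
alpha = random.randrange(1, p)   # global MAC key, unknown to any single party
value = 1                        # the shared bit
mac = alpha * value % p

# Honest opening: the MAC matches.
assert mac == alpha * value % p

# A cheater flips the bit to 0 but cannot update the MAC without alpha:
# alpha * 0 = 0 differs from mac = alpha != 0, so the check fails.
forged_value = 0
assert mac != alpha * forged_value % p
print("forgery detected")
```

The size of the field (here ~2^61) is what bounds the forgery probability, which is why the MAC part is much longer than the single payload bit.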
Hello!

In order to properly benchmark the preprocessing and online phase of a certain comparison protocol for linear secret sharing scheme-based MPC protocols (see old draft here), I believe we need to customise some preprocessing code, but I have not been able to find in the current documentation how to do that.

For example, it would be very useful to know how to produce in preprocessing triples ([a],[b],[ab]) where [a] and [b] are both bits in the arithmetic domain.

Judging by this line, it seems that I would need to create a new function in `Fake-Offline.cpp` to generate such triples. But I suspect I also need to make the compiler understand this new "data type"/"instruction"; any suggestion where to make the required edits?

Thank you for your help!
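For concreteness, here is a plain-Python sketch of what trusted-dealer ("fake offline") generation of such bit triples amounts to. The helper names and the choice of additive sharing mod 2^64 are illustrative assumptions, not the actual Fake-Offline.cpp code:

```python
import random

MOD = 2**64  # arithmetic domain, e.g. matching compilation with -R 64

def additive_shares(x, n_parties):
    """Split x into n additive shares mod MOD (hypothetical helper)."""
    shares = [random.randrange(MOD) for _ in range(n_parties - 1)]
    shares.append((x - sum(shares)) % MOD)
    return shares

def fake_bit_triple(n_parties=2):
    """Generate shares of ([a], [b], [ab]) with a, b bits in the
    arithmetic domain, as a trusted dealer in fake preprocessing would."""
    a = random.randrange(2)
    b = random.randrange(2)
    return (additive_shares(a, n_parties),
            additive_shares(b, n_parties),
            additive_shares(a * b, n_parties))

sa, sb, sab = fake_bit_triple()
a = sum(sa) % MOD
b = sum(sb) % MOD
assert a in (0, 1) and b in (0, 1)
assert sum(sab) % MOD == a * b
print("triple consistent")
```

A real implementation would additionally attach MAC shares to each value and write the shares to the per-party preprocessing files in the format the virtual machine expects.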