Closed by waamm 1 month ago
I can see two easy ways of achieving this:
If you really want to add an instruction, you have to add it in at least the following places:

- `instructions.py`
- `Instruction::execute()`
- `Instruction::parse_operands()`
- `Dtype`

and use the same infrastructure as regular triples.

Many thanks, this was very helpful! It seems to me that (2) would not work since ordinary triples are also needed for this protocol, but (1) sounds perfect. That option did not occur to me since I was thinking the compiler would make an effort (for such MPC protocols) to reduce circuit depth as much as possible. So I guess you're saying that this should work:
```python
def preprocessing():
    ...

start_timer(1)

def online():
    ...

stop_timer(1)
```
The only way to check whether or not two "independent" multiplication gates are indeed executed in the same communication round is by inspecting the compiled program?
> Many thanks, this was very helpful! It seems to me that (2) would not work since ordinary triples are also needed for this protocol, but (1) sounds perfect. That option did not occur to me since I was thinking the compiler would make an effort (for such MPC protocols) to reduce circuit depth as much as possible. So I guess you're saying that this should work:
Yes, because timer operations imply a break, that is, no circuit optimization is done across the timer boundary.
> The only way to check whether or not two "independent" multiplication gates are indeed executed in the same communication round is by inspecting the compiled program?
For a set of specific gates, yes. For a more birds-eye view, you can look at the number of virtual machine rounds output by the compiler and see if it roughly matches your expectations. A virtual machine round is one round of any operation optimized by the compiler, which includes any sort of multiplication.
A `.reveal()` in `shamir` takes 2 rounds but in `mal-shamir` it takes 1 round, is that correct?
```python
x = sint(5)
start_timer(1)
x.reveal()
stop_timer(1)
```
In the default configuration, yes. `shamir` uses a star-based opening protocol that scales better with the number of parties at the expense of an extra round. There is the option to use direct communication, more similar to `mal-shamir`, by using the `--direct` command-line argument.
Oh, that sounds like the "king node" approach of Damgård-Nielsen? I'm curious why it would be limited to the opening part of the protocol, and to dishonest-majority protocols - is there a reference for this particular approach?
It's certainly related but the approach is generic. With any secret sharing scheme you can do reconstruction by sending all shares to one party and then sending the result back to all parties (O(n)) instead of every party sending their share to every other party (O(n^2)). However, there might be issues with a malicious king node, which in some protocols can be solved in other ways, see for example: https://eprint.iacr.org/2012/642

Regarding generalisation, some multiplication protocols are based on opening (like Damgård-Nielsen), so the properties of the opening protocol filter through to the multiplication protocol, but this isn't the case for all protocols. Regarding the restriction to dishonest-majority protocols, it's just that it wasn't always implemented for Shamir secret sharing; the documentation is just outdated. Thank you for bringing this up.
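For intuition, here is a plain-Python sketch (illustrative only, not MP-SPDZ internals) of the two opening strategies for additive secret sharing mod a prime, counting messages and rounds for the star-based ("king node") versus direct all-to-all opening:

```python
# Plain-Python sketch comparing the two opening strategies for
# additive secret sharing mod a prime p.
import random

p = 2**61 - 1  # modulus of the secret-sharing scheme

def share(secret, n):
    """Split `secret` into n additive shares mod p."""
    shares = [random.randrange(p) for _ in range(n - 1)]
    shares.append((secret - sum(shares)) % p)
    return shares

def open_star(shares):
    """Star opening: all parties send their share to a 'king' node,
    which reconstructs and broadcasts -- O(n) messages, but 2 rounds."""
    n = len(shares)
    messages, rounds = 2 * (n - 1), 2
    return sum(shares) % p, messages, rounds

def open_direct(shares):
    """Direct opening: every party sends its share to every other party
    -- O(n^2) messages, but a single round."""
    n = len(shares)
    messages, rounds = n * (n - 1), 1
    return sum(shares) % p, messages, rounds
```

Both variants reconstruct the same value; the trade-off is purely messages versus rounds, which is what `--direct` toggles.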
Inside a large circuit, I now have an array `arr` of secret-shared `sint` values which are used in multiplications and additions later in the circuit. Depending on some random values of an array `arr2` that are revealed (becoming `cint`), some of the secret-shared values in `arr` must be zero, and hence the corresponding multiplications can be skipped. I was hoping that inserting this code might work, but I realise that might be naive?
```python
for i in range(len(arr)):
    @if_(arr2[i] == val)
    def _():
        arr[i] = cint(0)
```
Bandwidth indeed drops, but by an amount which does not appear to be random (yet it should have been), and the round complexity is going up (with both `shamir` and `mascot`).
I'll try to isolate the problem better, but I thought perhaps you already have some idea of what is going on? This at least reproduces the bandwidth phenomenon:
```python
x = [sint.get_random_bit().reveal() for i in range(10)]
y = [sint(i) for i in range(10)]
z = [sint(i) for i in range(10)]

start_timer(1)
for i in range(10):
    @if_(x[i] == 1)
    def _():
        y[i] = cint(0)
for i in range(10):
    (y[i] * z[i]).reveal()
stop_timer(1)
```
You cannot mix run-time branching and Python lists. The example will set all values in `y` to 0 independently of the condition. As a general rule, only use `Array` with run-time branching.
Made some changes, but I still don't understand why, for the following code, the round complexity with `shamir` is larger than 7? (The line `y[i] = cint(0)` is probably incorrect, but the result is the same with `y[i] = sint(0)`.)
```python
size = 40
x = Array(size, sint)
y = Array(size, sint)
x2 = Array(size, cint)
for i in range(size):
    x[i] = sint.get_random_bit()
    x2[i] = x[i].reveal()
    y[i] = sint(2)
zeroes = Array(size, cint)
for i in range(size):
    @if_(x2[i] == 0)
    def _():
        zeroes[i] = cint(1)

start_timer(1)
while size > 1:
    size = size // 2
    @for_range_parallel(size, size)
    def _(i):
        @if_e(zeroes[2*i] == 1)
        def _():
            y[i] = cint(0)
        @else_
        def _():
            y[i] = y[i] + y[i] * y[2*i]
z = y[20].reveal()
print_ln("%s", z)
stop_timer(1)
```
The conditional `@if_e(zeroes[2*i] == 1)` prevents the parallelization, so the multiplication `y[i] * y[2*i]` is executed `size` times consecutively.
Is there a straightforward way around that?
A straightforward way, at some bandwidth expense, is to use `if_else` instead of the `@if_e`: https://mp-spdz.readthedocs.io/en/latest/Compiler.html#Compiler.types.sint.if_else

A more involved way saving some bandwidth is to determine a reasonable upper bound on the number of multiplications for every round, set up an array to be multiplied using `@if_e`, then run the multiplications in parallel, and then post-process with more conditionals. The crux is that the multiplications cannot be inside the conditional.
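To illustrate the first suggestion: `if_else` replaces control flow with an arithmetic selection, so the choices are data and can be batched. In plain integers (a sketch of the principle, not the MP-SPDZ implementation):

```python
def if_else(c, a, b):
    """Branch-free selection: returns a if c == 1, else b (c in {0, 1}).
    In MPC this costs one multiplication but no data-dependent branch,
    so many independent selections fit in one communication round."""
    return c * a + (1 - c) * b

# All selections are independent, so they can run in parallel instead of
# `size` sequential conditional rounds.
zeroes = [1, 0, 1, 0]
y = [2, 2, 2, 2]
selected = [if_else(z, 0, yi + yi * yi) for z, yi in zip(zeroes, y)]
# selected == [0, 6, 0, 6]
```

The extra bandwidth comes from always paying for the multiplication inside the selection, even when the condition would have skipped it.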
Many thanks for those suggestions, but I'm not entirely sure I follow so here's a simplified version of the problem:
```python
y = [sint.get_random() for i in range(size)]
x_clear = [sint.get_random_bit().reveal() for i in range(size)]

start_timer(1)
for i in range(size):
    y[i] = y[i] * y[i] * x_clear[i]
stop_timer(1)
```
Thus there's a 50% chance that `y[i]` is `0` (or rather, `[0]`) and the multiplication `y[i] * y[i]` does not have to be executed. But I would say there is no "reasonable upper bound" available, other than `size` itself; is it possible to obtain a 50% reduction in bandwidth here, using the methods you just described?
I think you can only get the bandwidth reduction at the cost of more rounds as in the earlier examples.
That's unfortunate, but many thanks again for your effort.
Following this code, I'd now like to separately benchmark the online and offline phases, running a protocol `f(x, y, prep_material)` thousands or millions of times in multiple threads. Here `prep_material` is an array of (arrays of) preprocessed secret values (random values, random bits, edaBits, values obtained by multiplying or adding some of these to each other, etc.). Something like this:
```python
n = 1024
n_threads = 8
l = 1

prep_materials = []
for i in range(n):
    prep_materials.append(preprocessing())

res = sint.Array(n)
start_timer(1)
@multithread(n_threads, n)
def _(base, m):
    @for_range(l)
    def _(i):
        f(sint(1, size=m), sint(2, size=m),
          prep_materials[base:base+m]).store_in_mem(base)
stop_timer(1)
```
One immediate problem here is: `TypeError: slice indices must be integers or None or have an __index__ method`
What would I need to change to make such code work?
You need to use an array for `prep_materials` and `get_vector()` instead of Python slicing.
By which you mean a `MultiArray` or `Tensor`? For an `f` requiring 2 edabits, I just tried something like this:
```python
edabit_values = sint.Tensor([n,2,1])
edabit_bits = sint.Tensor([n,2,bit_length])

def preprocessing():
    edabit0, edabit1 = [sint.get_edabit(edabits_size, True) for i in range(2)]
    return [edabit0[0], edabit1[0]], [edabit0[1], edabit1[1]]

for i in range(n):
    edabit_values[i], edabit_bits[i] = preprocessing()

start_timer(1)
@multithread(n_threads, n)
def _(base, m):
    print("m = ", m)
    @for_range(l)
    def _(i):
        f(sint(1, size=m), sint(2, size=m),
          edabit_values.get_vector(base, m),
          edabit_bits.get_vector(base, m)).store_in_mem(base)
stop_timer(1)
```
```python
@vectorize
def f(x, y, edabit_values, edabit_bits):
    a = edabit_values[1]
```
But here the final line `a = edabit_values[1]` produces an `IndexError: list index out of range`.
I think the easiest would be to only use `Array`.
Not sure I follow - each instance of this `f` requires 2 edabits, so that's $2 \cdot bitlength$ sbits and 2 sints; are you saying I should produce $2 \cdot bitlength + 2$ separate `Array`s?
That's what I meant, but you can actually use `get_part()` as well: https://mp-spdz.readthedocs.io/en/latest/Compiler.html#Compiler.types.MultiArray.get_part
Ah, I think you mean `get_part_vector()`? That part now seems to work (though I just realised that I wrote `edabit_values = sint.Tensor([n,2,1])` above but that should've probably been `edabit_values = sint.Tensor([n,2])` instead, and I'm not sure the compiler noticed?). However, a bit further down some code analogous to
```python
@vectorize
def f(x, y, edabit_values, edabit_bits):
    a = edabit_values[1]
    for i in range(bit_length):
        b = edabit_bits[0][i]
```
fails to compile, due to another `IndexError` in the final line (referring to the `[i]` portion).
No, I mean `get_part()`, because it returns a `MultiArray` of the same dimension, just partial along the first dimension.
That yields `raise CompilerError('index out of range')` in the `a = edabit_values[1]` line again.
Please post the full code.
Here's a shortened version (which I hope is easier to work with, otherwise I'll post a longer version):
```python
bit_length = 64
edabits_size = bit_length

def preprocessing():
    edabit0, edabit1 = [sint.get_edabit(edabits_size, True) for i in range(2)]
    return [edabit0[0], edabit1[0]], [edabit0[1], edabit1[1]]

@vectorize
def f(x, y, edabit_values, edabit_bits):
    a = edabit_values[1] - x
    b = a.reveal()
    return (a, f2(b, edabit_bits[0]))

@vectorize
def f2(b, eda):
    b_bits = cint.bit_decompose(b)
    return [eda[i].bit_xor(b_bits[i]) for i in range(bit_length)]

n = 1024
n_threads = 8
l = 1

res = sint.Array(n)
edabit_values = sint.Tensor([n,2])
edabit_bits = sint.Tensor([n,2,bit_length])
for i in range(n):
    edabit_values[i], edabit_bits[i] = preprocessing()

start_timer(2)
@multithread(n_threads, n)
def _(base, m):
    print("m = ", m)
    @for_range(l)
    def _(i):
        f(sint(1, size=m), sint(2, size=m),
          edabit_values.get_part(base, m),
          edabit_bits.get_part(base, m)).store_in_mem(base)
stop_timer(2)
```
Because `m=1`, `edabit_values.get_part(base, m)` has dimension (1, 2), so 1 is out of bounds.
Hmm, but the point is to increase `m` and "vectorise" this? How should I write that?
I don't understand the relevant documentation: it says "Distribute the computation of `n_items` to `n_threads` threads", then sets `n_threads = 8`, but then it says "in three different threads"?
That is indeed a typo, but my previous comment referred to the code example you posted originally. The changed code example doesn't produce the out-of-index error.
Yes, instead now there's a `Compiler.exceptions.VectorMismatch: Different vector sizes of operands: 2/128`
So in that code $m = 128 = 1024 / 8$? Does that mean that `edabit_values.get_part(base, m)` has dimension (m, 2)? Doesn't that conflict with the attempt to "vectorise" this code?
> So in that code m=128=1024/8? Does that mean that `edabit_values.get_part(base, m)` has dimension (m,2)?
Yes.
> Doesn't that conflict with the attempt to "vectorise" this code?
What do you mean?
Sorry, I phrased that badly. What I meant to say is that it is my impression that when the vectorised `f` receives the two vectors `sint(1, size=m)` and `sint(2, size=m)` of size `m`, it seems to me that it acts on them entrywise, as if only two individual elements were passed along? Then the edabit_values input should have dimension (m, 2) and the edabit_bits input dimension (m, 2, bit_length), but inside `f` they should appear as arrays of dimension 2 and (2, bit_length)?
I see what you mean. `@vectorize` is a relatively simple approach to make sure that code that works with single `sint` etc. also works with vectors thereof. It does not handle arrays or tensors in any way.
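A toy model of this behaviour (simplified and hypothetical, not the real decorator): vectorization maps a scalar function over same-length vectors of basic values, which is why tensor-shaped preprocessing arguments fall outside its scope.

```python
def vectorize(f):
    """Toy elementwise vectorization: lists of equal length are processed
    entry by entry, scalars are broadcast. Structured auxiliary data that
    is not a plain vector is NOT split up per entry."""
    def wrapper(*args):
        lists = [a for a in args if isinstance(a, list)]
        if not lists:
            return f(*args)
        n = len(lists[0])
        return [f(*(a[i] if isinstance(a, list) else a for a in args))
                for i in range(n)]
    return wrapper

@vectorize
def add(x, y):
    return x + y
```

Under this model, `add([1, 2, 3], [4, 5, 6])` works entrywise, but a rank-2 argument per call would have to be flattened into separate vectors first, matching the advice to use plain `Array`s.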
I can see two easy ways of achieving this:

- Compute the triples in the online phase and separate out the benchmarking using timers.
Is there a straightforward way to make that approach work with, say, the EzPC ResNet code? I was thinking of overloading/replacing the comparison calls inside `non_linear.py` and then executing such code, but I believe the issue of separating the offline and online phases remains.
You could just start and stop a timer within the comparison call replacement, so I still don't see an issue.
I probably wasn't clear - the idea would be to record the change in the total online time (and preprocessing bandwidth) required to execute something like ResNet50, by changing the comparison protocols in say this line.
I still don't understand how to make that work - where should I put the timer(s)? It seems to me that what you're describing would yield many tiny measurements instead? (Also I would need to figure out how to store and reference the preprocessing material?)
Also, do you have any other benchmark suggestions? One of the primary aims behind measuring the required online time of such a heavy workload is to determine whether any change in computational complexity has a significant impact.
I don't think I misunderstood. You could (and probably should) do the preprocessing in batches, just like it's done in the virtual machines, roughly as follows:

```
if no preprocessing left:
    start timer
    do preprocessing of a batch of triples
    stop timer
use preprocessing
```

That way you can reduce the costs associated with the preprocessing, even the timer calls. I don't see an issue in storing the preprocessing, just use the usual containers.
All that said, I don't want to stop you from implementing it in C++ as the rest, I just think it might be easier in Python if you can implement the preprocessing in Python.
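The batching pattern above can be sketched in a few lines of plain Python (names like `TriplePool` are made up for illustration; in MP-SPDZ you would use `start_timer`/`stop_timer` and the usual containers instead of these stand-ins):

```python
class Timer:
    """Dummy stand-in for start_timer/stop_timer; counts activations."""
    def __init__(self):
        self.starts = 0
    def start(self):
        self.starts += 1
    def stop(self):
        pass

class TriplePool:
    """Keep a batch of preprocessed triples in store; replenish (under
    the preprocessing timer) only when the pool runs empty."""
    def __init__(self, batch_size, generate_batch, timer):
        self.batch_size = batch_size
        self.generate_batch = generate_batch
        self.timer = timer
        self.pool = []

    def get(self):
        if not self.pool:
            self.timer.start()
            self.pool = self.generate_batch(self.batch_size)
            self.timer.stop()
        return self.pool.pop()

# Usage: 5 triples with a batch size of 4 trigger exactly 2 batch runs,
# so the timer overhead is amortized over the batch.
timer = Timer()
pool = TriplePool(4, lambda n: [(i, i + 1, i * (i + 1)) for i in range(n)],
                  timer)
triples = [pool.get() for _ in range(5)]
```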
I am very glad to hear that this seems doable, but my background is not that technical and consequently I'm not yet familiar with the virtual machines at this level - are you aware of a similar MP-SPDZ coding example somewhere?
So instead of `LtzRing(a, k)` I want to use `NewLtz(a, k, preprocessing_for_one_invocation)`, probably by switching them inside `non_linear.py`. Now given some program like ResNet50 which has comparisons in it, how am I supposed to feed `NewLtz` its required preprocessing material?
(Yes, the Python code is ready; e.g., this function involving some multiplications is part of it. Similarly, the Rabbit protocol can be significantly improved in this regard by moving the bit addition protocol of edaBits to preprocessing.)
> I am very glad to hear that this seems doable, but my background is not that technical and consequently I'm not yet familiar with the virtual machines at this level - are you aware of a similar MP-SPDZ coding example somewhere?
An example of what?
> So instead of `LtzRing(a, k)` I want to use `NewLtz(a, k, preprocessing_for_one_invocation)`, probably by switching them inside `non_linear.py`. Now given some program like ResNet50 which has comparisons in it, how am I supposed to feed `NewLtz` its required preprocessing material?
What do you mean by feed? The pseudo-code above involves keeping a batch of preprocessing in store as well as a counter and replenishing it whenever it's empty. The same principle is used in the C++ code.
I meant that the pseudocode you wrote probably should not go into `non_linear.py` or the `.mpc` file itself; where in the codebase is this C++ preprocessing code, and in particular the preprocessing counter? I presume that is where I should put (an appropriate version of) the Python preprocessing code?
No, that's not what I meant. It's just a design principle that can be used anywhere. One application in C++ is for triples here, where the counter is simply the size of the C++ vector: https://github.com/data61/MP-SPDZ/blob/a44132e5095f84ed5fda3e27c100bf2d6e462243/Protocols/ReplicatedPrep.hpp#L221C1-L226C6
I'm thinking now that instead of adding new preprocessing material from scratch, it's probably easier for me and will suffice (for now) to extend the existing edaBit generation protocol, as follows: instead of returning an edaBit, i.e. a value in Z/mZ together with sharings of its bits in Z/2Z, it would additionally return certain products of those bits.
I'll try that tomorrow - following the manual, I could time the online phase of ResNet50 using "insecure preprocessing"? For this, would modifying `plain_edabits` suffice? For the impact on preprocessing bandwidth, I would create one program which retrieves a bunch of edaBits, and another which subsequently performs those bit multiplications.
I don't see a reason why it wouldn't work.
Adding `program.use_edabit(True)` to `tf.mpc` doesn't seem to have an effect for SqueezeNet; is that because edaBits are already used by default or because this SqueezeNet does not have operations like ReLUs?

Also, for testing I added some `print_ln` statements to `non_linear.py`. The command `./compile.py -R 64 tf EzPC/Athos/Networks/SqueezeNetImgNet/graphDef.bin 1 trunc_pr split` then yields `Compile with '-O'`, but that doesn't seem to work regardless of where I place `-O`?
> Adding `program.use_edabit(True)` to `tf.mpc` doesn't seem to have an effect for SqueezeNet; is that because edaBits are already used by default or because this SqueezeNet does not have operations like ReLUs?
If you compile with `split`, it uses local share conversion instead of edaBits, as described in https://eprint.iacr.org/2018/403
> Also, for testing I added some `print_ln` statements to `non_linear.py`. The command `./compile.py -R 64 tf EzPC/Athos/Networks/SqueezeNetImgNet/graphDef.bin 1 trunc_pr split` then yields `Compile with '-O'`, but that doesn't seem to work regardless of where I place `-O`?
Thank you for raising this. You should find that ef82a68aa9 fixes it.
Thanks!
When I put something like this in a `.mpc` file, it seems to work:
```python
T = cint(1)
b = cint(2)
c = cbit(T < b)
```
But when I put it inside `non_linear.py` (and import `cbit`), I get errors like these:
```
  File ".../Compiler/non_linear.py", line 228, in LTS
    c = cbit(T < b)
        ^^^^^^^^^^^
  File ".../Compiler/GC/types.py", line 143, in __init__
    self.load_other(value)
  File ".../Compiler/GC/types.py", line 162, in load_other
    n_convs = min(other.size, n_units)
              ^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '<' not supported between instances of 'int' and 'NoneType'
```
Also, `print(T < b)` yields `ciinf`, is that intended?
This is probably due to the same optimization causing issues earlier. It tries to generate code without concrete vector lengths, hence the appearance of `None` and `inf`. You can try with `-O` or remove the `@instructions_base.cisc` decorator from `LTZ` in `comparison.py`.
Previous code worked after your fix, but now similarly this code

```python
a = sint.get_edabit(64, True)[1][0]
b = (0 < 1)
c = a + b
```

compiles when put in a `.mpc` file, but not when similarly placed inside `non_linear.py`:
```
  File ".../MP-SPDZ/Compiler/non_linear.py", line 199, in ltz
    return LtzRing(c, k)
           ^^^^^^^^^^^^^
  File ".../MP-SPDZ/Compiler/comparison.py", line 100, in LtzRing
    tmp = a - r_prime
          ~~^~~~~~~~~
  File ".../MP-SPDZ/Compiler/types.py", line 220, in read_mem_operation
    return operation(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../MP-SPDZ/Compiler/GC/types.py", line 521, in __add__
    other = self.conv(other)
            ^^^^^^^^^^^^^^^^
  File ".../MP-SPDZ/Compiler/GC/types.py", line 54, in conv
    res.load_other(other)
  File ".../MP-SPDZ/Compiler/GC/types.py", line 514, in load_other
    super(sbits, self).load_other(other)
  File ".../MP-SPDZ/Compiler/GC/types.py", line 178, in load_other
    self.mov(self, sbitvec(other, self.n).elements()[0])
    ^^^^^^^^^^^^^^^^^^^^^^
  File ".../MP-SPDZ/Compiler/GC/types.py", line 882, in __init__
    c = ((elements - r) << (l - length)).reveal()
        ~~~~~~~~~~~~~~~^^~~~~~~~~~~~~~
  File ".../MP-SPDZ/Compiler/types.py", line 2820, in __lshift__
    return self * util.pow2_value(other, bit_length, security)
           ~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
TypeError: unsupported operand type(s) for *: 'sint' and 'float'
```
First, the code above is somewhat trivial because the comparison is between two Python integers (`0 < 1`). Second, it might be that `(l - length)` is negative because the `-R` parameter is too low, so I would recommend giving a higher number there, but it's hard to be sure without seeing the actual code.
I'm getting the same error for `(cint(0) < cint(1))`.

The actual code to be inserted is here (which does compile inside of a `.mpc` file); the above snippet is a simplified version of a part of `LTS`. But the error is now very different indeed:
```
Writing to Programs/Bytecode/tf-EzPC_Athos_Networks_SqueezeNetImgNet_graphDef.bin-1-trunc_pr-multithread-2.bc
Traceback (most recent call last):
  File ".../MP-SPDZ/Compiler/instructions_base.py", line 975, in check_args
    ArgFormats[f].check(arg)
  File ".../MP-SPDZ/Compiler/instructions_base.py", line 764, in check
    raise ArgumentError(arg, "Wrong register type '%s', expected '%s'" % \
Compiler.exceptions.ArgumentError: (sb47398529(12769)(817216), "Wrong register type 'sb', expected 's'")

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".../MP-SPDZ/./compile.py", line 41, in <module>
    main(compiler)
  File ".../MP-SPDZ/./compile.py", line 36, in main
    compilation(compiler)
  File ".../MP-SPDZ/./compile.py", line 19, in compilation
    prog = compiler.compile_file()
           ^^^^^^^^^^^^^^^^^^^^^^^
  File ".../MP-SPDZ/Compiler/compilerLib.py", line 454, in compile_file
    exec(compile(infile.read(), infile.name, "exec"), self.VARS)
  File "Programs/Source/tf.mpc", line 36, in <module>
    opt.forward(1, keep_intermediate=False)
  File ".../MP-SPDZ/Compiler/../Compiler/ml.py", line 200, in wrapper
    res = function(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../MP-SPDZ/Compiler/../Compiler/ml.py", line 2278, in forward
    layer.forward(batch=self.batch_for(layer, batch),
  File ".../MP-SPDZ/Compiler/../Compiler/ml.py", line 265, in forward
    self._forward(batch)
  File ".../MP-SPDZ/Compiler/../Compiler/ml.py", line 1048, in _forward
    @multithread(self.n_threads, len(batch) * n_per_item)
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../MP-SPDZ/Compiler/library.py", line 1084, in decorator
    tape = prog.new_tape(f, (0,), 'multithread')
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../MP-SPDZ/Compiler/program.py", line 315, in new_tape
    function(*args)
  File ".../MP-SPDZ/Compiler/library.py", line 1066, in f
    return loop_body(base, thread_rounds + inc)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../MP-SPDZ/Compiler/../Compiler/ml.py", line 1050, in _
    self.Y.assign_vector(self.f_part(base, size), base)
                         ^^^^^^^^^^^^^^^^^^^^^^^
  File ".../MP-SPDZ/Compiler/../Compiler/ml.py", line 1099, in f_part
    c = x > 0
        ^^^^^
  File ".../MP-SPDZ/Compiler/types.py", line 141, in vectorized_operation
    res = operation(self, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../MP-SPDZ/Compiler/types.py", line 4475, in __gt__
    return self.v.greater_than(other.v, self.k, self.kappa)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../MP-SPDZ/Compiler/types.py", line 220, in read_mem_operation
    return operation(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../MP-SPDZ/Compiler/types.py", line 228, in type_check
    return operation(self, other, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../MP-SPDZ/Compiler/types.py", line 141, in vectorized_operation
    res = operation(self, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../MP-SPDZ/Compiler/types.py", line 2725, in __gt__
    comparison.LTZ(res, other - self,
  File ".../MP-SPDZ/Compiler/comparison.py", line 84, in LTZ
    movs(s, program.non_linear.ltz(a, k, kappa))
  File ".../MP-SPDZ/Compiler/instructions_base.py", line 408, in maybe_gf2n_instruction
    return instruction(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../MP-SPDZ/Compiler/instructions_base.py", line 317, in maybe_vectorized_instruction
    return Vectorized_Instruction(size, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../MP-SPDZ/Compiler/instructions_base.py", line 281, in __init__
    super(Vectorized_Instruction, self).__init__(*args, **kwargs)
  File ".../MP-SPDZ/Compiler/instructions_base.py", line 930, in __init__
    self.check_args()
  File ".../MP-SPDZ/Compiler/instructions_base.py", line 977, in check_args
    raise CompilerError('Invalid argument %d "%s" to instruction: %s'
Compiler.exceptions.CompilerError: Invalid argument 1 "sb47398529(12769)(817216)" to instruction: vmovs 817216, s1634432(817216), sb47398529(12769)(817216)
Wrong register type 'sb', expected 's'
```
Hello!
In order to properly benchmark the preprocessing and online phases of a certain comparison protocol for linear secret sharing scheme-based MPC protocols (see old draft here), I believe we need to customise some preprocessing code, but I have not been able to find in the current documentation how to do that.

For example, it would be very useful to know how to produce in preprocessing triples ([a], [b], [ab]) where [a] and [b] are both bits in the arithmetic domain.

Judging by this line, it seems that I would need to create a new function in `Fake-Offline.cpp` to generate such triples. But I suspect I also need to make the compiler understand this new "data type"/"instruction"; any suggestion where to make the required edits?

Thank you for your help!
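As a plain-Python sketch of the data in question (illustrative only; a real generator would live in `Fake-Offline.cpp` and write MP-SPDZ's share format): an insecure "fake offline" bit-triple is a set of shares of (a, b, ab) with a, b ∈ {0, 1} embedded in the arithmetic domain.

```python
import random

p = 2**61 - 1  # arithmetic domain Z/pZ

def share(x, n):
    """Insecure dealer-based additive sharing of x mod p among n parties."""
    shares = [random.randrange(p) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % p)
    return shares

def fake_bit_triple(n_parties):
    """Generate shares of a triple ([a], [b], [ab]) where a and b are
    bits shared in the arithmetic domain -- the kind of data a new
    fake-offline generator would have to produce."""
    a = random.randrange(2)
    b = random.randrange(2)
    return share(a, n_parties), share(b, n_parties), share(a * b, n_parties)

def reconstruct(shares):
    return sum(shares) % p
```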