Getting different results during execution of the same MPC program in the replicated secret sharing setting

Mikerah commented 6 months ago

In #472, the suggestion was to add sfix.round_nearest = True at the beginning of the higher level program. However, since then, I'm now writing a more complicated program across several files and it uses the rep4 secret-sharing scheme. I have tried adding sfix.round_nearest = True to all of the files and tried it in only the file that's being run that imports the other files and the results are still inconsistent between runs of the same program.

What am I missing?

mkskeller commented 6 months ago

Do you mean the results are slightly off or wildly off? In the latter case, it could be a range issue similar to #1376. In the former case, please post a minimal example.

Mikerah commented 6 months ago

I've already set set_precision as specified in that issue. My program runs in the rep4 setting where R is of size 107-bits

mkskeller commented 6 months ago

So are the results slightly off (like 0.001 vs 0.002) or wildly off (1 vs 1e6)?

Mikerah commented 6 months ago

Wildly off. For example: Profit: [[7.77986e+06]] vs [[8.27596e+06]]

mkskeller commented 6 months ago

In that case, I would keep increasing the precision. For example, just double both values in set_precision(). #476 refers to slight errors only.

Mikerah commented 6 months ago

If I increase the precision, wouldn't I also need to increase the size of the ring?

mkskeller commented 6 months ago

Yes, that's correct.

Mikerah commented 6 months ago

I forgot how to calculate the size of the ring based on k and f. Essentially, I need to ensure that 2*k is at most the size of the ring but I still get the following error when trying to recompile my circuit with f and k doubled:

  File "/home/mikerah/Documents/HashCloak/Projects/mpc-cfmm-solver/MP-SPDZ/Compiler/types.py", line 4855, in coerce
    assert res.k == self.k
           ^^^^^^^^^^^^^^^
AssertionError

mkskeller commented 6 months ago

That error might stem from the fact that cfix doesn't have the same precision as sfix.

Mikerah commented 6 months ago

They are the same. I made k=80 and f=16 for both cfix.set_precision and sfix.set_precision

mkskeller commented 6 months ago

Can you post the full strack trace then?

Mikerah commented 6 months ago

When running the command ./compile.py -l -R 107 ../arbitrage.mpc, I get the following stack trace:

Default bit length for compilation: 106
Default security parameter for compilation: 40
Compiling file ../arbitrage.mpc
Traceback (most recent call last):
  File "/home/mikerah/Documents/HashCloak/Projects/mpc-cfmm-solver/MP-SPDZ/./compile.py", line 41, in <module>
    main(compiler)
  File "/home/mikerah/Documents/HashCloak/Projects/mpc-cfmm-solver/MP-SPDZ/./compile.py", line 36, in main
    compilation(compiler)
  File "/home/mikerah/Documents/HashCloak/Projects/mpc-cfmm-solver/MP-SPDZ/./compile.py", line 19, in compilation
    prog = compiler.compile_file()
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mikerah/Documents/HashCloak/Projects/mpc-cfmm-solver/MP-SPDZ/Compiler/compilerLib.py", line 452, in compile_file
    exec(compile(infile.read(), infile.name, "exec"), self.VARS)
  File "../arbitrage.mpc", line 49, in <module>
    router.route(l_bfgs_b_opt, price_vector=price_vector)
  File "<string>", line 116, in route
  File "<string>", line 103, in optimize
  File "/home/mikerah/Documents/HashCloak/Projects/mpc-cfmm-solver/MP-SPDZ/Compiler/library.py", line 843, in decorator
    @while_do(lambda x: x + n_opt_loops_reg <= n_loops, regint(0))
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mikerah/Documents/HashCloak/Projects/mpc-cfmm-solver/MP-SPDZ/Compiler/library.py", line 1285, in decorator
    while_loop(loop_body, condition, *args)
  File "/home/mikerah/Documents/HashCloak/Projects/mpc-cfmm-solver/MP-SPDZ/Compiler/library.py", line 1265, in while_loop
    if_statement(pre_condition, lambda: do_while(loop_fn, g=g))
  File "/home/mikerah/Documents/HashCloak/Projects/mpc-cfmm-solver/MP-SPDZ/Compiler/library.py", line 1406, in if_statement
    if_fn()
  File "/home/mikerah/Documents/HashCloak/Projects/mpc-cfmm-solver/MP-SPDZ/Compiler/library.py", line 1265, in <lambda>
    if_statement(pre_condition, lambda: do_while(loop_fn, g=g))
                                        ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mikerah/Documents/HashCloak/Projects/mpc-cfmm-solver/MP-SPDZ/Compiler/library.py", line 1326, in do_while
    condition = _run_and_link(loop_fn, g)
                ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mikerah/Documents/HashCloak/Projects/mpc-cfmm-solver/MP-SPDZ/Compiler/library.py", line 1293, in _run_and_link
    res = function()
          ^^^^^^^^^^
  File "/home/mikerah/Documents/HashCloak/Projects/mpc-cfmm-solver/MP-SPDZ/Compiler/library.py", line 1259, in loop_fn
    result = loop_body(arg)
             ^^^^^^^^^^^^^^
  File "/home/mikerah/Documents/HashCloak/Projects/mpc-cfmm-solver/MP-SPDZ/Compiler/library.py", line 854, in _
    state = reducer(tuplify(loop_body(j)), state)
                            ^^^^^^^^^^^^
  File "<string>", line 105, in _
  File "/home/mikerah/Documents/HashCloak/Projects/mpc-cfmm-solver/MP-SPDZ/Compiler/types.py", line 141, in vectorized_operation
    res = operation(self, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mikerah/Documents/HashCloak/Projects/mpc-cfmm-solver/MP-SPDZ/Compiler/types.py", line 4465, in __gt__
    other = self.coerce(other)
            ^^^^^^^^^^^^^^^^^^
  File "/home/mikerah/Documents/HashCloak/Projects/mpc-cfmm-solver/MP-SPDZ/Compiler/types.py", line 220, in read_mem_operation
    return operation(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/mikerah/Documents/HashCloak/Projects/mpc-cfmm-solver/MP-SPDZ/Compiler/types.py", line 4855, in coerce
    assert res.k == self.k
           ^^^^^^^^^^^^^^^
AssertionError

Mikerah commented 6 months ago

What's interesting is that when I set f=16 and k=40, it compiles. When, I change either f or k, I get the above error.

Mikerah commented 6 months ago

I found the issue. I forgot to comment out the equivalent calls to set_precision in optimizers.mpc. I instead just set both invocations to f=32 and k=80 with a ring size of 195 and I still get inconsistent results. Would the advice here be to be on increasing the ring size and precision until I get the same results across executions?

mkskeller commented 6 months ago

If the results are still wildly off, that's one way. Another way would be to check intermediate values to narrow down where the divergence happens.

Mikerah commented 6 months ago

When checking the intermediate values, assuming there's no direct bugs, what would you suggest doing? Still increase the range? I've done that up to a ring size of 1024 and the process on my machine gets killed.

In general, are there no techniques in secure computation to allow for such precision? As MPC is used quite a bit for ML applications, I would expect this to be a common problem.

mkskeller commented 6 months ago

When checking the intermediate values, assuming there's no direct bugs, what would you suggest doing? Still increase the range? I've done that up to a ring size of 1024 and the process on my machine gets killed.

Checking intermediate values would give insight as to what range is actually required and where the computation would benefit from adjusting.

In general, are there no techniques in secure computation to allow for such precision? As MPC is used quite a bit for ML applications, I would expect this to be a common problem.

Which precision do you mean? I've found that f=16,k=31 is enough even for more complex deep learning tasks, see https://eprint.iacr.org/2022/933.

Mikerah commented 6 months ago

After a few hours, I've found something interesting about my code in relation to the outputs in various functions. If I comment out L_BFGS_BOptimizer.optimize, then the code gives consistent results when the precision is f=16 and k=31. However, once I uncomment optimize, then that's when the issues start and it affects the output in the rest of the code. I've narrowed it down to the main update (starts here of optimize. Starting from there, the output results start to be wildly different between iterations and I have yet to narrow down the range. But at a glance the numbers honestly, don't seem large so I'm still at a lost of what good ranges would be. Here are some example executions:

Using statistical security parameter 40
Trying to run 64-bit computation
[[4329.74, 27988.8], [2368.63, -8039.56]]
[[-24260.1, 19522.2], [-2979.14, 14257.2]]
[[27299.5, -2031.32], [-20469.7, -12252.4]]
[[1, 0], [0, 1]]
[[0, 0], [0, 0]]
[[-2500.35, -16107.2], [9159.24, -23711.9]]
[[1, 0], [0, 1]]
[[0, 0], [0, 0]]
[[6070.46, 9989.41], [9814.18, -25067.3]]
[[-3938.77, 18219.1], [2080.95, -3109.64]]
[[20750.5, 2141.4], [-2066.79, -4040.86]]
[[6801.88, 28233.2], [2533.15, 5320.74]]
[[1, 0], [0, 1]]
[[0, 0], [0, 0]]
[[-21421.6, 1743.23], [-8654.83, 20033.5]]
[[15137.8, 15948.7], [11996.4, -20013]]
[[-4025.26, -31703.6], [-5787.8, 32412.2]]
[[26404.4, 20888.7], [-15392.8, 16292.8]]
[[1, 0], [0, 1]]
[[0, 0], [0, 0]]
[[21671.4, -18241.4], [-4878.67, -6858.27]]
[[1, 0], [0, 1]]
[[0, 0], [0, 0]]
[[23246.7, 18162.2], [-18514.2, 16334.4]]
[[1, 0], [0, 1]]
[[0, 0], [0, 0]]
[[-439.036, 10515], [-25915.5, -1255.66]]
[[1, 0], [0, 1]]
[[0, 0], [0, 0]]
[[5576.91, 917.894], [-11530.8, -12547.1]]
price_vector: [16639.9, 23050.8]
[[0], [-47.6217]]
Profit: [[16395.2]]

Using statistical security parameter 40
Trying to run 64-bit computation
[[7599.38, -8376.31], [3599.4, 5357.3]]
[[-11271, -18312.2], [-1825.87, -5235.26]]
[[-10036.5, -13284.7], [2538, 5691.42]]
[[1, 0], [0, 1]]
[[0, 0], [0, 0]]
[[32.6907, -19721.1], [-13343.7, 12263.7]]
[[1, 0], [0, 1]]
[[0, 0], [0, 0]]
[[-10055, 16798.9], [6695.91, 11847.5]]
[[-20698.4, -6082.78], [-87.027, -13027.9]]
[[-4379.98, 17108.3], [-11178.3, -8606.01]]
[[15646.8, -23821.2], [-764.083, 23224.5]]
[[1, 0], [0, 1]]
[[0, 0], [0, 0]]
[[-13766.4, 10561.4], [2044.33, -1618.03]]
[[3665, -19380.1], [-17975.8, 11016]]
[[-25190.9, 9750.92], [9003.74, -11092.8]]
[[31032.2, 7135.23], [12426.8, 1077.96]]
[[1, 0], [0, 1]]
[[0, 0], [0, 0]]
[[-10066.5, 5714.1], [10654.4, 6935.87]]
[[1, 0], [0, 1]]
[[0, 0], [0, 0]]
[[23425.3, -13923], [-11277.8, -1578.73]]
[[1, 0], [0, 1]]
[[0, 0], [0, 0]]
[[93.389, 11499.6], [27688, -5348.07]]
[[1, 0], [0, 1]]
[[0, 0], [0, 0]]
[[19626.5, -19096.7], [26417.5, 13963.7]]
price_vector: [16639.9, 23050.8]
[[0], [-47.6217]]
Profit: [[16395.2]]

mkskeller commented 6 months ago

There is a difference already in the very first value, so you need more outputs. The idea is that you can see at which point the outputs change, so you can whittle the issue down to a particular point in the calculation.

Mikerah commented 6 months ago

Another thing that I'm noticing as well is that when I increase f and k, the values increase in size which is not behavior I'm expecting. Is that normal?

mkskeller commented 6 months ago

This is at the core of the meaning of f and k. As said in the documentation, the maximum number representable is 2**(k-f-1), so for the default k=31 and f=16 that would be 16,384. Thus, I would expect all numbers to be within [-16384,16384]. It's exactly the fact that the numbers are all over the place within the range that makes me think that the range is too small for some intermediate values.

Mikerah commented 6 months ago

I've spent the weekend trying to figure out why increasing the range doesn't seem to be working for my program. In particular, I've read a few papers that implement SGD and Adaptive SGD in an MPC setting and those typically have f=16 or 32 and k=32 or 128. I have tried these values but it appears that the ranges in my program are still unstable.

I'm currently trying to grok the SGD implementation in MP-SPDZ to see if I can attempt to do truncation of intermediate values of the matrix multiplications that are done in my program. Do you think this approach would work?

mkskeller commented 6 months ago

What do you mean by "grok"?

Mikerah commented 6 months ago

Grok is another word for "trying to understand"

mkskeller commented 6 months ago

I see. That might help but I don't think that SGD in MP-SPDZ deviates that much from a straightforward implementation, so I'm not sure if you're going to find any special tricks.

Mikerah commented 6 months ago

I was thinking it might help because the SGD implementation has a similar structure to my BFGS implementation. Mainly, in that, both algorithms are mainly iterative. During the weekend, I've discovered that the instability of values starts occurring when I increase the number of iterations in the main for loop in optimize. Particularly, here (there's an optional parameter called n_iter that is set to 10).

I'm looking to try using reduce_after_mul in order to fix this but if I understand the sfix type implementation correctly, reduce_after_mul is already used?

mkskeller commented 6 months ago

Yes, sfix operations usually take care of truncation. Increasing iterations causing problems might indicate divergence of the training.

Mikerah commented 6 months ago

By divergence of the training, do you mean that increasing the number of iterations indicates that the algorithm is not converging to a minimum?

mkskeller commented 6 months ago

Yes

Mikerah commented 6 months ago

Closing since I have enough evidence to see that the training is indeed diverging. Thank you

data61 / MP-SPDZ

Getting different results during execution of the same MPC program in the replicated secret sharing setting #1389