Reveal_to is very slow - Githubissues

siberianhuskies commented 2 years ago

Hi, I'm not sure if I'm doing or understanding something wrong, but it seems like reveal_to(id) is very slow compared to reveal().

If have the following simple example:

parallelization = 3
state_length = 3
N = regint(3)

##########################
# Get input from parties #
##########################
start_timer(1)

state = Array(state_length, sint)

@for_range_parallel(parallelization, N)
def f(i):
  state[i] = sint.get_input_from(i)

stop_timer(1)

#################################
# Get hiding value from parties #
#################################

start_timer(2)

hider = Array(state_length, sint)

@for_range_parallel(parallelization, N)
def f(i):
  hider[i] = sint.get_input_from(i)

stop_timer(2)

#######################
# Add hider to output #
#######################

start_timer(3)

out_hidden = Array(state_length, sint)

@for_range_parallel(parallelization, N)
def f(i):
    out_hidden[i] = state[i] + hider[i]

print_ln("ALL HIDDEN %s", out_hidden.reveal())
stop_timer(3)

#################
# Normal output #
#################

start_timer(4)

### Only way to output to parties https://github.com/data61/MP-SPDZ/issues/241
for i in range(SOMPF_n_parties):
   state[i].reveal_to(i)

# Same speed
#@for_range_parallel(parallelization, N)
#def f(i):
#   print_ln_to(i, "state %s", state[i].reveal_to(i))

stop_timer(4)

I used semi-honest Shamir's secret sharing, and the corrsponding timeing values with three parties are:

Time = 0.020408 seconds 
Time1 = 0.00545086 seconds 
Time2 = 0.000363267 seconds 
Time3 = 0.000162798 seconds 
Time4 = 0.0134766 seconds

(The timings are even worse for more parties)

The example first reads some input (1), then an additional hiding value (2), adds the hiding value to the input and reveals that result to all parties (3), and outputs a single entry of the state to each party (4).

Running only (1) and (4), or only (1), (2) and (3) yields roughly the same measured times of each Timer1-4. Executing (1)+(2)+(3) takes roughly 0.006 seconds, whereas executing only (1)+(4) takes roughly 0.012 seconds.

Obviously the output based on reveal_to(i) (Timer4) is magnitudes slower than "reading an additional value, use it to hide the state, and then output the complete state" (Timer2+Timer3, but same security). I would have assumed option (4) should be faster, than (2)+(3).

Also, even if adding out_hidden[i].reveal().raw_output() in the loop of (3) assuming maybe the writing to the file takes so much time, only increases the time marginally.

Am I doing something wrong? Or can you please explain if this is expected behavior.

mkskeller commented 2 years ago

This is because private outputs require masks that are generated in batches. The default batch size is 10000, which is a considerable cost compared to the rest. You can reduce with the -b argument to the virtual machines.

siberianhuskies commented 2 years ago

OK, thanks. Setting -b 3 yields indeed the expected behavior. Now reveal_to(i) is much faster and Timer4 takes only 0.0003 seconds.

data61 / MP-SPDZ

Reveal_to is very slow #377