Closed — enricobottazzi closed this 1 month ago
Hi @enricobottazzi, that's a really interesting observation! While this might not be directly related to your original question, I noticed that you have both online and offline time reported. I'm wondering if there's a way to benchmark the performance of both phases simultaneously?
@aaallami Yes, you can do that by passing the flag -v as a runtime argument:
OPTIONS:
-b, --batch-size ARG Size of preprocessing batches (default: 10000)
-B, --bucket-size ARG Batch size for sacrifice (3-5, default: 4)
-d, --direct Direct communication instead of star-shaped
(only for dishonest-majority protocols)
-D, --disk-memory ARG Use directory on disk for memory (container
data structures) instead of RAM
-ext-server, --external-server Use external server. Default is to coordinate
through player 0.
-E, --trunc-error ARG Probabilistic truncation error (2^-x, default:
40)
-f, --file-prep-per-thread Preprocessing from files by thread (use with
pipes)
-F, --file-preprocessing Preprocessing from files
-h, --hostname ARG Host where Server.x or party 0 is running to
coordinate startup (default: localhost).
Ignored if --ip-file-name is used.
-ip, --ip-file-name ARG Filename containing list of party ip addresses.
Alternative to --hostname and running Server.x
for startup coordination.
-I, --interactive Interactive mode in the main thread (default:
disabled)
-IF, --input-file ARG Prefix for input file path (default:
Player-Data/Input). Text input will be read
from {prefix}-P{id}-{thread_id} and binary
input from {prefix}-Binary-P{id}-{thread_id}
-lg2, --lg2 ARG Bit length of GF(2^n) field (default: 128;
options are 4, 5, 6, 7, 8, 9, 10, 11, 12, 14,
15, 16, 28, 40, 63, 64, 128)
-m, --memory ARG Where to obtain memory, old|empty (default:
empty)
old: reuse previous memory in
Memory-<type>-P<i>
empty: create new empty memory
-mp, --my-port ARG Port to listen on (default: port number base +
player number)
-o, --options ARG1[,ARGn] Further options
-OF, --output-file ARG Prefix for output file path (default: output to
stdout for party 0 (silent otherwise unless
interactive mode is active). Output will be
written to {prefix}-P{id}-{thread_id}. Use '.'
for stdout on all parties.
-p, --player ARG This player's number (required if not given
before program name)
-pn, --portnumbase ARG Port number base to attempt to start
connections from (default: 5000)
-Q, --bits-from-squares Compute random bits from squares
-R, --ring ARG Number of integer bits (default: 64)
-S, --security ARG Statistical security parameter (default: 40)
-u, --unencrypted Unencrypted communication.
-v, --verbose Verbose output, in particular more data on
communication
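For example, in a two-party run (a sketch only — it assumes a MASCOT build and a compiled program named my_prog; both names are illustrative):

```shell
# terminal 1: party 0, with -v for verbose timing and communication data
./mascot-party.x -v -p 0 my_prog
# terminal 2: party 1
./mascot-party.x -v -p 1 my_prog
```

With -v, the output includes more detail on communication alongside the online and offline timing figures.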
This is fantastic, thank you so much for your response!
Regarding the original question: the running time of your benchmarks is so low that the noise might be as big as the signal, i.e., the differences from run to run are as large as the overall time. Do you also observe this in longer benchmarks?
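The short-benchmark noise effect is easy to reproduce outside MP-SPDZ. A minimal pure-Python illustration (the task and sample count are my own choices):

```python
# Timing the same tiny computation repeatedly shows run-to-run
# variation that can be comparable to the measurement itself.
import time
import statistics

def tiny_task():
    # a sub-millisecond workload, analogous to a very short benchmark
    return sum(i * i for i in range(1000))

samples = []
for _ in range(50):
    start = time.perf_counter()
    tiny_task()
    samples.append(time.perf_counter() - start)

mean = statistics.mean(samples)
stdev = statistics.stdev(samples)
print(f"mean {mean:.2e}s, stdev {stdev:.2e}s")
```

With longer workloads, the relative spread shrinks and the measurement becomes meaningful.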
Hi Marcel, yes, with larger values of n I still observe a slowdown in runtime when using @for_range_opt or @for_range compared to a traditional Python for loop, but this comes with a much smaller compiled circuit size.
Is this expected behaviour? Or is it possible to get the best of both worlds?
Here is a more detailed overview of the compared performance in terms of runtime and compiled circuit size.
Have you tried adjusting the optimization with for_range_opt (either via --budget or the argument to for_range_opt)? The default is 1000, but 100,000 usually still leads to an acceptable compilation time. It lets you adjust the trade-off in the optimization. I would also appreciate a stand-alone example for further analysis.
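For instance, the budget can be set per loop in the program source (a sketch only — it needs the MP-SPDZ compiler, and n and the loop body are placeholders):

```python
# higher budget: more unrolling before falling back to a run-time loop
@for_range_opt(n, budget=100000)
def _(i):
    ...  # loop body
```

or globally at compile time:

```shell
./compile.py --budget 100000 my_prog
```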
Hi Marcel,
As part of my circuit, I have a subroutine that contains three nested loops (n is set to 100). Note that every component is public here, so there's really no communication between the computing parties. Isolating just this component of the circuit, compilation with the flags --edabit -Z 3 -R 64 -l produces a huge number of instructions (I stopped the compilation after the warning: Compilation produced more than 10 million instructions. Consider using './compile.py -l' or replacing for loops with @for_range_opt: https://mp-spdz.readthedocs.io/en/latest/Compiler.html#Compiler.library.for_range_opt). As soon as I convert the for loops to for_range_opt, the compilation is super fast (it doesn't even get to "Compiled 100000 lines"). However, comparing the runtime of the online phase, for example setting n=15, I notice a slowdown in execution time:
Spent 0.000638 seconds on the online phase and 3.5e-05 seconds on the preprocessing/offline phase.
Spent 0.006642 seconds on the online phase and 1.1e-05 seconds on the preprocessing/offline phase.
Do you know if there's any trick to speed up the compilation without sacrificing runtime?
Thanks
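A minimal stand-alone reproduction of the pattern described (hypothetical, since the original subroutine isn't shown — a sketch of an MP-SPDZ program requiring its compiler; n, table, and the loop body are placeholders):

```python
n = 100
table = Matrix(n, n, cint)  # public values only, so no party communication

# variant 1: plain Python loops — fully unrolled at compile time,
# giving a large compiled circuit but no run-time loop overhead
for i in range(n):
    for j in range(n):
        for k in range(n):
            table[i][j] += cint(k)

# variant 2: @for_range_opt — compiled as run-time loops,
# giving a small circuit but some loop-management overhead at run time
@for_range_opt(n)
def _(i):
    @for_range_opt(n)
    def _(j):
        @for_range_opt(n)
        def _(k):
            table[i][j] += cint(k)
```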