Visa-Research / volepsi

Efficient Private Set Intersection based on VOLE
MIT License

Benchmark Refs #8

Closed stone-wlg closed 1 year ago

stone-wlg commented 1 year ago

Env

Without Silver

| Lines (r/s) | Size (r/s) | Load (r/s) | PSI | Output | Total |
|---|---|---|---|---|---|
| 1M | 31MB | 473ms | 3016ms | 342ms | 3831ms (~4s) |
| 10M | 315MB | 4607ms | 30408ms | 2314ms | 37329ms (~40s) |
| 100M | 3.1GB | 49423ms | 654821ms | 27533ms | 731777ms (~800s) |
| 1M/10M | 31MB/315MB | 443ms/4720ms | 5328ms | 168ms | 10216ms (~10s) |
| 10M/1M | 315MB/32MB | 4636ms/470ms | 27978ms | 862ms | 33476ms (~40s) |
| 10M/100M | 315MB/3.1GB | 4607ms/51200ms | 64723ms | 1973ms | 117896ms (~120s) |
| 100M/10M | 3.1GB/315MB | 46850ms/4670ms | 603774ms | 11285ms | 661909ms (~700s) |

With Silver

| Lines (r/s) | Size (r/s) | Load (r/s) | PSI | Output | Total |
|---|---|---|---|---|---|
| 10M | 315MB | 4711ms | 8409ms | 1812ms | 14932ms (~15s) |
| 100M | 3.1GB | 49704ms | 154925ms | 40401ms | 245030ms (~250s) |
| 10M/100M | 315MB/3.1GB | 4754ms/49035ms | 39963ms | 2386ms | 91384ms (~100s) |
| 100M/10M | 3.1GB/315MB | 46862ms/4786ms | 105070ms | 10750ms | 162682ms (~170s) |
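
For a rough sense of scale, the totals above can be turned into end-to-end throughput with simple arithmetic on the table values (a small sketch; the numbers are copied from the Total column above):

```python
# Rough end-to-end throughput implied by the Total column above.
runs = {
    "10M, without Silver":  (10_000_000,  37.329),   # 37329 ms
    "100M, without Silver": (100_000_000, 731.777),  # 731777 ms
    "10M, with Silver":     (10_000_000,  14.932),   # 14932 ms
    "100M, with Silver":    (100_000_000, 245.030),  # 245030 ms
}
for name, (items, seconds) in runs.items():
    print(f"{name}: ~{items / seconds / 1e6:.2f}M items/s")
```
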
ladnir commented 1 year ago

Is there a question here? Also you have to use silver to get the timings in the paper.

ladnir commented 1 year ago

And there is benchmarking support which doesn't go through the file-based stuff, if you are just after perf.

stone-wlg commented 1 year ago

Is there a question here? Also you have to use silver to get the timings in the paper.

It is not a question, just a reference. I had no idea where else to record this, so I put it here.

ladnir commented 1 year ago

K, well, your reference appears quite slow, so I'm not sure how good of a reference it is...

stone-wlg commented 1 year ago

K, well, your reference appears quite slow, so I'm not sure how good of a reference it is...

Do you have standard benchmark refs?

ladnir commented 1 year ago

Well, there's the paper to start with, and those numbers were obtained via the benchmark in the frontend/perf.cpp file. If you run frontend with no arguments you can see instructions on how to run it. Something like frontend -perf -psi.

ladnir commented 1 year ago

Also, you have to add -useSilver to use the Silver lpn encoder (faster but has less conservative security guarantees).

stone-wlg commented 1 year ago

Also, you have to add -useSilver to use the Silver lpn encoder (faster but has less conservative security guarantees).

OK, but it is still experimental.

stone-wlg commented 1 year ago

Well, there's the paper to start with, and those numbers were obtained via the benchmark in the frontend/perf.cpp file. If you run frontend with no arguments you can see instructions on how to run it. Something like frontend -perf -psi.

  1. Got it, I will try it.
  2. Is there a difference when using the frontend directly to test? e.g. ./frontend -in ./sender.csv -r 0 -server 1 -ip localhost:1212
  3. I did not add -useSilver, hence it was quite slow? Yes, it is much faster with the -useSilver option.
ladnir commented 1 year ago

That's correct, silver is experimental. Overall the psi construction has provable security assuming you have a secure vole (based on lpn).

For implementing the vole you have two options: lpn with quasi-cyclic codes or lpn with Silver codes. The former is an error correcting code with provable linear minimum distance. Given this, there is a well-accepted assumption that lpn is secure, although even this is still less understood than, say, the security of the kkrt psi protocol.

Then there is the Silver error correcting code which is conjectured to have linear minimum distance but is still considered experimental until more analysis is done.

Silver is a lot faster than quasi-cyclic codes. Also, the current implementation of quasi-cyclic codes is suboptimal, which doesn't help, ha. Also, foreshadowing a bit, there's some upcoming work that should get the best of both. Stay tuned :)

The difference between the file-based example and the benchmark is the networking and, generally, making sure that it's measuring the right thing. The benchmark communicates via main memory (basically just memcpy's the data) as opposed to also counting the network latency. That may or may not be desirable depending on what you're wanting to measure.

More generally, your timings are quite slow even considering the fact that you use quasi-cyclic codes and tcp localhost networking. On my machine I think I get 7 seconds for 10M, and that's a laptop with an i7 and 8GB of ram. I'm surprised it's taking 60 seconds. Seems like something is wrong.
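
As a toy illustration of that distinction (this is not volePSI code, just plain Python): pushing a ~31 MB buffer, about the size of the 1M-row input file in the table above, through an in-process queue versus a localhost TCP socket shows how much of a file/TCP-based measurement can be spent outside the protocol itself.

```python
# Toy comparison: in-memory hand-off vs. localhost TCP for the same payload.
import socket, threading, time
from queue import Queue

payload = b"x" * (31 * 1024 * 1024)  # ~31 MB, about the size of the 1M-row input file above

# In-memory "channel": one copy of the buffer, handed to the other party.
q = Queue()
t0 = time.time()
q.put(payload)
_ = bytes(q.get())                   # force a copy, memcpy-style
print(f"in-memory hand-off: {(time.time() - t0) * 1000:.1f} ms")

# Localhost TCP: the same bytes through the loopback stack.
def drain(listener):
    conn, _ = listener.accept()
    remaining = len(payload)
    while remaining > 0:
        chunk = conn.recv(1 << 20)
        if not chunk:
            break
        remaining -= len(chunk)
    conn.close()

listener = socket.create_server(("127.0.0.1", 0))
t = threading.Thread(target=drain, args=(listener,))
t.start()
t0 = time.time()
client = socket.create_connection(listener.getsockname())
client.sendall(payload)
client.close()
t.join()
print(f"localhost TCP:      {(time.time() - t0) * 1000:.1f} ms")
```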

stone-wlg commented 1 year ago

That's correct, silver is experimental. Overall the psi construction has provable security assuming you have a secure vole (based on lpn).

For implementing the vole you have two options: lpn with quasi-cyclic codes or lpn with Silver codes. The former is an error correcting code with provable linear minimum distance. Given this, there is a well-accepted assumption that lpn is secure, although even this is still less understood than, say, the security of the kkrt psi protocol.

Then there is the Silver error correcting code which is conjectured to have linear minimum distance but is still considered experimental until more analysis is done.

Silver is a lot faster than quasi-cyclic codes. Also, the current implementation of quasi-cyclic codes is suboptimal, which doesn't help, ha. Also, foreshadowing a bit, there's some upcoming work that should get the best of both. Stay tuned :)

The difference between the file-based example and the benchmark is the networking and, generally, making sure that it's measuring the right thing. The benchmark communicates via main memory (basically just memcpy's the data) as opposed to also counting the network latency. That may or may not be desirable depending on what you're wanting to measure.

More generally, your timings are quite slow even considering the fact that you use quasi-cyclic codes and tcp localhost networking. On my machine I think I get 7 seconds for 10M, and that's a laptop with an i7 and 8GB of ram. I'm surprised it's taking 60 seconds. Seems like something is wrong.

How many threads did you use?

ladnir commented 1 year ago

One

stone-wlg commented 1 year ago

One

This is the receiver log with 10M:

```
reading set... 4685ms
connecting as client at address localhost:1212 1ms
Validating set sizes... 1ms
running PSI... 30182ms
Writing output to ./data/out-1.csv 1937ms
```

stone-wlg commented 1 year ago

One

This is the receiver log with 10M:

```
reading set... 4685ms
connecting as client at address localhost:1212 1ms
Validating set sizes... 1ms
running PSI... 30182ms
Writing output to ./data/out-1.csv 1937ms
```

Maybe the CPU is different and some features are not supported.

stone-wlg commented 1 year ago

```
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                64
On-line CPU(s) list:   0-63
Thread(s) per core:    1
Core(s) per socket:    64
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2683 v4 @ 2.10GHz
Stepping:              1
CPU MHz:               2099.998
BogoMIPS:              4199.99
Virtualization:        VT-x
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              4096K
L3 cache:              16384K
NUMA node0 CPU(s):     0-63
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology eagerfpu pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat spec_ctrl intel_stibp
```
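
On the "some features are not supported" point above: one quick check is whether the instruction-set extensions this kind of code typically relies on (AES-NI, carry-less multiply, AVX2, SSE4.1) are actually visible inside the container. They all appear in the flag list above; a minimal sketch to verify from inside the environment (assumes Linux and /proc/cpuinfo):

```python
# Minimal check for the CPU flags listed in the lscpu output above.
wanted = ["aes", "pclmulqdq", "avx2", "sse4_1", "sse4_2", "bmi2"]

flags = set()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            break

for flag in wanted:
    print(f"{flag:10s} {'present' if flag in flags else 'MISSING'}")
```
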

ladnir commented 1 year ago

Yeah, seems like a nice machine...

What happens when you run the psi benchmark?

ladnir commented 1 year ago

-perf -psi -n 10000000 -t 4

stone-wlg commented 1 year ago

Yeah, seems like a nice machine...

What happens when you run the psi benchmark?

I made a Docker image, but I cannot push it to Docker Hub right now; I will share the link with you later.

stone-wlg commented 1 year ago

-perf -psi -n 10000000 -t 4

Could you show the full command with options? sender: xxx receiver: xxx

ladnir commented 1 year ago

Sorry, that wasn't quite right.

-perf -psi -nn 23 -v -useSilver -t 10

This will run both the sender and receiver 10 times with set sizes 2^23=8,388,608 on a single thread. Both parties share a single thread so, in reality, the protocol could run faster if both parties had their own thread. Maybe I should change this... Regardless, this will print something like:

nt 1 fakeBase 0 n 8388608
Label     Time (ms)  diff (ms)
__________________________________
end          5344.6   5344.570  **********
begin        5410.5     65.886  *****
end         10354.4   4943.956  **********
begin       10416.9     62.452  *****
end         15439.6   5022.710  **********
begin       15500.7     61.090  *****
end         20429.4   4928.777  **********
begin       20493.0     63.555  *****
end         25449.8   4956.773  **********
begin       25513.2     63.402  *****
end         30422.4   4909.267  **********
begin       30487.2     64.725  *****
end         35403.4   4916.187  **********
begin       35467.9     64.533  *****
end         40436.4   4968.461  **********
begin       40499.4     63.000  *****
end         45441.6   4942.252  **********
begin       45505.2     63.580  *****
end         50433.9   4928.684  **********

1828981027 923528554

The end lines report the running time of the protocol in milliseconds. So it takes ~4.9 seconds to run on my laptop.

The last numbers are the total communication of the receiver and sender in bytes for the 10 trials.
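
If it helps, the per-trial times can be pulled out of that printout mechanically rather than by eye. A small sketch, assuming the output above has been saved to a file named perf.log (a made-up name):

```python
# Extract the per-trial durations (the diff column of the "end" lines, in ms)
# from the -perf -psi printout and summarize them.
import re

times = []
with open("perf.log") as f:
    for line in f:
        m = re.match(r"\s*end\s+[\d.]+\s+([\d.]+)", line)
        if m:
            times.append(float(m.group(1)))

if times:
    print(f"{len(times)} trials: avg {sum(times)/len(times):.0f} ms, "
          f"min {min(times):.0f} ms, max {max(times):.0f} ms")
```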

ladnir commented 1 year ago

You can also do -perf -psi -nn 23 -v 2 -useSilver -t 1 to get a detailed printout of one execution of the protocol.

stone-wlg commented 1 year ago

Sorry, that wasn't quite right.

-perf -psi -nn 23 -v -useSilver -t 10

This will run both the sender and receiver 10 times with set sizes 2^23=8,388,608 on a single thread. Both parties share a single thread so, in reality, the protocol could run faster if both parties had their own thread. Maybe I should change this... Regardless, this will print something like:

nt 1 fakeBase 0 n 8388608
Label     Time (ms)  diff (ms)
__________________________________
end          5344.6   5344.570  **********
begin        5410.5     65.886  *****
end         10354.4   4943.956  **********
begin       10416.9     62.452  *****
end         15439.6   5022.710  **********
begin       15500.7     61.090  *****
end         20429.4   4928.777  **********
begin       20493.0     63.555  *****
end         25449.8   4956.773  **********
begin       25513.2     63.402  *****
end         30422.4   4909.267  **********
begin       30487.2     64.725  *****
end         35403.4   4916.187  **********
begin       35467.9     64.533  *****
end         40436.4   4968.461  **********
begin       40499.4     63.000  *****
end         45441.6   4942.252  **********
begin       45505.2     63.580  *****
end         50433.9   4928.684  **********

1828981027 923528554

The end lines report the running time of the protocol in milliseconds. So it takes ~4.9 seconds to run on my laptop.

The last numbers are the total communication of the receiver and sender in bytes for the 10 trials.

I cannot run it; I think it is my environment problem.

```
root@87187a107c19:/opt/vole-psi# ./frontend -perf -psi -nn 23 -v -useSilver -t 10
nt 1 fakeBase 0 n 8388608
terminate called after throwing an instance of 'std::system_error'
  what():  coproto::code::remoteClosed: The connection has been closed by the remote party.
Aborted (core dumped)
```

stone-wlg commented 1 year ago

Sorry, that wasn't quite right.

-perf -psi -nn 23 -v -useSilver -t 10

This will run both the sender and receiver 10 times with set sizes 2^23=8,388,608 on a single thread. Both parties share a single thread so, in reality, the protocol could run faster if both parties had their own thread. Maybe I should change this... Regardless, this will print something like:

nt 1 fakeBase 0 n 8388608
Label     Time (ms)  diff (ms)
__________________________________
end          5344.6   5344.570  **********
begin        5410.5     65.886  *****
end         10354.4   4943.956  **********
begin       10416.9     62.452  *****
end         15439.6   5022.710  **********
begin       15500.7     61.090  *****
end         20429.4   4928.777  **********
begin       20493.0     63.555  *****
end         25449.8   4956.773  **********
begin       25513.2     63.402  *****
end         30422.4   4909.267  **********
begin       30487.2     64.725  *****
end         35403.4   4916.187  **********
begin       35467.9     64.533  *****
end         40436.4   4968.461  **********
begin       40499.4     63.000  *****
end         45441.6   4942.252  **********
begin       45505.2     63.580  *****
end         50433.9   4928.684  **********

1828981027 923528554

The end lines report the running time of the protocol in milliseconds. So it takes ~4.9 seconds to run on my laptop.

The last numbers are the total communication of the receiver and sender in bytes for the 10 trials.

Could you try it on your laptop?

```
$ ./frontend -in ./sender.csv -r 0 -server 1 -ip localhost:1212
$ ./frontend -in ./receiver.csv -r 1 -server 0 -ip localhost:1212 -out ./out.csv
```

And you can run this to make the dummy data:

```python
#!/usr/bin/env python3

filename = 'data.csv'
record_count = 10000000
start_index = 1000000000
end_index = start_index + record_count

with open(filename, 'w+') as file:
    for i in range(start_index, end_index):
        line = f'{str(i).rjust(32, "x")}'
        file.write(f'{line}\n')
```
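
In case it is useful for checking correctness as well as speed, here is a variant of that script which writes a sender file and a receiver file with a known overlap, so the number of lines in out.csv can be compared against the expected intersection size (the counts and file names below are illustrative, not from this thread):

```python
#!/usr/bin/env python3
# Write sender.csv and receiver.csv with a known overlap so the PSI output
# can be sanity-checked. Counts and file names here are made up for illustration.
count = 1_000_000
start = 1_000_000_000
overlap = count // 2  # the receiver shares the first half of the sender's range

def write_range(path, lo, hi, mode="w"):
    with open(path, mode) as f:
        for i in range(lo, hi):
            f.write(str(i).rjust(32, "x") + "\n")

write_range("sender.csv", start, start + count)
write_range("receiver.csv", start, start + overlap)              # shared items
write_range("receiver.csv", start + count,                       # receiver-only items
            start + count + (count - overlap), mode="a")

print(f"expected intersection size: {overlap}")
```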

stone-wlg commented 1 year ago

Sorry, that wasn't quite right.

-perf -psi -nn 23 -v -useSilver -t 10

This will run both the sender and receiver 10 times with set sizes 2^23=8,388,608 on a single thread. Both parties share a single thread so, in reality, the protocol could run faster if both parties had their own thread. Maybe I should change this... Regardless, this will print something like:

nt 1 fakeBase 0 n 8388608
Label     Time (ms)  diff (ms)
__________________________________
end          5344.6   5344.570  **********
begin        5410.5     65.886  *****
end         10354.4   4943.956  **********
begin       10416.9     62.452  *****
end         15439.6   5022.710  **********
begin       15500.7     61.090  *****
end         20429.4   4928.777  **********
begin       20493.0     63.555  *****
end         25449.8   4956.773  **********
begin       25513.2     63.402  *****
end         30422.4   4909.267  **********
begin       30487.2     64.725  *****
end         35403.4   4916.187  **********
begin       35467.9     64.533  *****
end         40436.4   4968.461  **********
begin       40499.4     63.000  *****
end         45441.6   4942.252  **********
begin       45505.2     63.580  *****
end         50433.9   4928.684  **********

1828981027 923528554

The end lines report the running time of the protocol in milliseconds. So it takes ~4.9 seconds to run on my laptop.

The last numbers are the total communication of the receiver and sender in bytes for the 10 trials.

  1. It is the "pure" computing cost, since there is only 1 thread.
  2. Actually the cost should contain several parts, e.g. system calls, network/TCP ...
  3. Could you benchmark with 2 processes, not only at the thread level? (See the sketch below.)
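
A two-process measurement along those lines can be scripted around the file-based commands quoted earlier in this thread. A minimal sketch, assuming ./frontend, ./sender.csv and ./receiver.csv exist as in those commands (the one-second startup delay is a crude assumption, not something the tool requires):

```python
# Run sender and receiver as two separate OS processes and time the whole run.
import subprocess
import time

sender_cmd = ["./frontend", "-in", "./sender.csv", "-r", "0",
              "-server", "1", "-ip", "localhost:1212"]
receiver_cmd = ["./frontend", "-in", "./receiver.csv", "-r", "1",
                "-server", "0", "-ip", "localhost:1212", "-out", "./out.csv"]

t0 = time.time()
sender = subprocess.Popen(sender_cmd)      # server side listens first
time.sleep(1)                              # give it a moment before connecting
receiver = subprocess.Popen(receiver_cmd)  # client side
sender.wait()
receiver.wait()
print(f"two-process wall-clock time: {time.time() - t0:.1f}s")
```
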
stone-wlg commented 1 year ago

Oh, I can only run it with -t 1.

```
root@87187a107c19:/opt/vole-psi# ./frontend -perf -psi -nn 23 -useSilver -t 1 -v
nt 1 fakeBase 0 n 8388608
Label     Time (ms)  diff (ms)
__________________________________
end          9790.4   9790.381  **

182906789 92363805
```