data61 / MP-SPDZ

Versatile framework for multi-party computation

Getting started with low-level interface? #441

Closed: Isweet closed this issue 2 years ago

Isweet commented 2 years ago

I'm interested in wrapping the low-level C++ interface in C so that it can be used via FFI. The protocols I care about are Shamir, malicious Shamir, semi2k, and SPDZ2k (or maybe Semi and MASCOT, I'm not sure yet), so nothing that requires FHE or garbled circuits.

Any pointers for getting started?

In particular, I'm wondering about...

  1. Dependencies. Which do I need for each protocol (in addition to `libSPDZ`)? I notice from the README that `tldr.sh` will automatically install dependencies using the local system package manager. Is there a script which only installs dependencies, or which can install dependencies based on protocols of interest?
  2. Additional examples of using the C++ API. Is there any other code you'd recommend I look at in addition to `Utils/paper-example.cpp`? I understand all the pieces in `paper-example.cpp`, but I'm not quite sure which pieces don't need to be included for semi-honest protocols (for example, I imagine anything having to do with MACs goes away), and I'm not sure how to use a boolean protocol (mod 2^k, rather than over a prime or GF).
  3. Building. If I don't care about the compiler, VM, FHE, garbled circuits, ... what are the things I need to include in my build? Just link against `libSPDZ` and a hardware-optimized version of OT?

Thanks in advance for your time. MP-SPDZ is a fantastic piece of software and a great accomplishment, I'm looking forward to using it.
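Since the stated goal is a C wrapper usable via FFI, here is a minimal sketch of the usual opaque-handle pattern. Everything in it is hypothetical: `DotProductEngine` stands in for whatever MP-SPDZ objects (`Names`, `PlainPlayer`, preprocessing, protocol) a real wrapper would own, the `mpc_engine_*` functions are illustrative names rather than part of MP-SPDZ, and the "dot product" is computed in the clear.

```cpp
// Opaque-handle pattern for exposing C++ objects to C / FFI callers.
// DotProductEngine and mpc_engine_* are hypothetical, not MP-SPDZ API.
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace demo {
// Stand-in for the C++ state a real wrapper would own (player, prep, ...).
class DotProductEngine {
public:
    explicit DotProductEngine(int my_number) : my_number_(my_number) {}
    int my_number() const { return my_number_; }
    // Plaintext placeholder; wraps mod 2^64 like a 64-bit ring value.
    uint64_t dot(const uint64_t* a, const uint64_t* b, size_t n) const {
        uint64_t acc = 0;
        for (size_t i = 0; i < n; i++)
            acc += a[i] * b[i];
        return acc;
    }
private:
    int my_number_;
};
}  // namespace demo

// --- What a C header would contain ---
extern "C" {
typedef struct mpc_engine mpc_engine;  // incomplete type for C callers
mpc_engine* mpc_engine_new(int my_number);
int mpc_engine_my_number(const mpc_engine* e);
uint64_t mpc_engine_dot(const mpc_engine* e, const uint64_t* a,
                        const uint64_t* b, size_t n);
void mpc_engine_free(mpc_engine* e);
}

// --- C++ implementation behind the C interface ---
struct mpc_engine {
    demo::DotProductEngine impl;
};

extern "C" mpc_engine* mpc_engine_new(int my_number) {
    try {
        return new mpc_engine{demo::DotProductEngine(my_number)};
    } catch (...) {
        return nullptr;  // exceptions must not propagate into C code
    }
}

extern "C" int mpc_engine_my_number(const mpc_engine* e) {
    return e->impl.my_number();
}

extern "C" uint64_t mpc_engine_dot(const mpc_engine* e, const uint64_t* a,
                                   const uint64_t* b, size_t n) {
    return e->impl.dot(a, b, n);
}

extern "C" void mpc_engine_free(mpc_engine* e) {
    delete e;
}
```

The C side only ever sees a pointer to an incomplete `struct mpc_engine`, so no C++ types leak into the header, and catching exceptions at the boundary keeps them from unwinding through foreign frames.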

mkskeller commented 2 years ago

> I'm interested in wrapping the low-level C++ interface in C so that it can be used via FFI. I'm interested in shamir, malicious shamir, semi2k, spdz2k (or maybe semi and mascot, not sure yet). So, nothing requiring FHE or garbled circuits.
>
> Any pointers for getting started?

The following section in the documentation explains all arithmetic share types and the protocol interfaces: https://mp-spdz.readthedocs.io/en/latest/low-level.html

The Names class used for the network setup is documented here: https://mp-spdz.readthedocs.io/en/latest/networking.html#_CPPv45Names

> In particular, I'm wondering about...

> 1. Dependencies. Which do I need for each protocol (in addition to `libSPDZ`)? I notice from the README that `tldr.sh` will automatically install dependencies using the local system package manager. Is there a script which only installs dependencies, or which can install dependencies based on protocols of interest?

tldr.sh only installs dependencies on macOS. On Linux, tldr.sh uses statically linked binaries that don't require any dependencies. There is no difference in dependencies between protocols because the OT code is linked even if it isn't used.

> 2. Additional examples of using the C++ API. Is there any other code you'd recommend I look at in addition to `Utils/paper-example.cpp`? I understand all the pieces in `paper-example.cpp` but I'm not quite sure which pieces don't need to be included for semihonest (for example, I imagine anything having to do with MACs goes away), and I'm not sure how to use a boolean protocol (mod 2^k, rather than over prime or GF).

There aren't any additional examples of the C++ API within MP-SPDZ because its main use is within the VM, and I'm not aware of any other open-source code using it. You're correct that MAC generation isn't needed for protocols that don't use a MAC, and that fields don't need to be initialized for computation mod 2^k. However, calling this functionality anyway simply results in empty functions, which is how the VM works with a code base shared between protocols.
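To make the "empty functions" point concrete, here is a small sketch of how generic code can call a MAC check unconditionally while the semi-honest share type plugs in a check that does nothing. The names (`NoMacCheck`, `ToyMacCheck`, `SemiShare`, `MacShare`, `open_and_check`) are hypothetical and only loosely mirror the `Share::MAC_Check` member type used in paper-example.cpp; the toy MAC uses a public 64-bit key and is insecure, it only shows the control flow.

```cpp
// One generic "open and check" routine shared by two share types.
// All names are hypothetical; this is not MP-SPDZ's class layout.
#include <cassert>
#include <cstdint>
#include <vector>

struct NoMacCheck {
    // Semi-honest: there is no MAC, so the check is an empty function.
    template <class Share>
    static bool check(const std::vector<Share>&, uint64_t) { return true; }
};

struct ToyMacCheck {
    // Toy MAC key, public here for brevity (SPDZ keeps it secret-shared).
    static uint64_t key() { return 0x9e3779b97f4a7c15ULL; }
    // Accept only if the MAC shares sum to key * opened (mod 2^64).
    template <class Share>
    static bool check(const std::vector<Share>& shares, uint64_t opened) {
        uint64_t mac_sum = 0;
        for (const auto& s : shares)
            mac_sum += s.mac;
        return mac_sum == key() * opened;
    }
};

struct SemiShare {
    using MAC_Check = NoMacCheck;
    uint64_t value;
};

struct MacShare {
    using MAC_Check = ToyMacCheck;
    uint64_t value;
    uint64_t mac;  // additive share of key * value
};

// Protocol-agnostic code: it always calls the check, which may be a no-op.
template <class T>
bool open_and_check(const std::vector<T>& shares, uint64_t& opened) {
    opened = 0;
    for (const auto& s : shares)
        opened += s.value;  // additive reconstruction mod 2^64
    return T::MAC_Check::check(shares, opened);
}
```

For `SemiShare` the call compiles down to reconstruction only, so the same `open_and_check` serves both protocols without any `if (has_mac)` branches.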

> 3. Building. If I don't care about the compiler, VM, FHE, garbled circuits, ... what are the things I need to include in my build? Just link against `libSPDZ` and a hardware optimized version of OT?

Your best guide is the Makefile. For example, semi-party.x requires GC/SemiPrep.o and GC/SemiSecret.o in addition to OT/*.o and the common code: https://github.com/data61/MP-SPDZ/blob/cdb0c0f898f0c79b70d0b101872baeb80bd70ba2/Makefile#L208

Isweet commented 2 years ago

Maybe you can suggest where I'm going wrong. Here's my attempt to modify paper-example.cpp to use semi-honest 2^k instead.

/*
 * basic.cpp
 *
 * Example of using low-level interface for dot product over semihonest 2^k
 *
 */

#define NO_MIXED_CIRCUITS

#include "Protocols/Semi2kShare.h"
#include "Protocols/SemiPrep2k.h"

int main(int argc, char** argv)
{
    // need player number and number of players
    if (argc < 3)
    {
        cerr << "Usage: " << argv[0] << "<my number: 0/1/...> <total number of players>" << endl;
        exit(1);
    }

    // set up networking on localhost
    int my_number = atoi(argv[1]);
    int n_parties = atoi(argv[2]);
    int port_base = 9999;
    Names N(my_number, n_parties, "localhost", port_base);
    PlainPlayer P(N);

    // must initialize MAC key for security of some protocols
    Semi2kShare<64>::mac_key_type mac_key;
    Semi2kShare<64>::read_or_generate_mac_key("", P, mac_key);

    // global OT setup
    BaseMachine machine;
    if (Semi2kShare<64>::needs_ot)
        machine.ot_setups.push_back({P});

    // keeps tracks of preprocessing usage (triples etc)
    DataPositions usage;
    usage.set_num_players(P.num_players());

    // output protocol
    Semi2kShare<64>::MAC_Check output(mac_key);

    // various preprocessing
    Semi2kShare<64>::LivePrep preprocessing(0, usage);
    SubProcessor<Semi2kShare<64>> processor(output, preprocessing, P);

    // input protocol
    Semi2kShare<64>::Input input(processor, output);

    // multiplication protocol
    Semi2kShare<64>::Protocol protocol(P);

    int n = 1000;
    vector<Semi2kShare<64>> a(n), b(n);
    Semi2kShare<64> c;
    Semi2kShare<64>::clear result;

    input.reset_all(P);
    for (int i = 0; i < n; i++)
        input.add_from_all(i);
    input.exchange();
    for (int i = 0; i < n; i++)
    {
        a[i] = input.finalize(0);
        b[i] = input.finalize(1);
    }

    protocol.init_dotprod(&processor);
    for (int i = 0; i < n; i++)
        protocol.prepare_dotprod(a[i], b[i]);
    protocol.next_dotprod();
    protocol.exchange();
    c = protocol.finalize_dotprod(n);
    output.init_open(P);
    output.prepare_open(c);
    output.exchange(P);
    result = output.finalize_open();

    cout << "result: " << result << endl;
    output.Check(P);

    Semi2kShare<64>::LivePrep::teardown();

}

and I'm building it with make basic.x, with the Makefile modified as follows:

basic.x: $(OT) GC/SemiSecret.o GC/SemiPrep.o GC/square64.o

I'm getting an error with linking:

Undefined symbols for architecture x86_64:
  "RepRingOnlyEdabitPrep<Semi2kShare<64> >::buffer_edabits(int, ThreadQueues*)", referenced from:
      SemiPrep2k<Semi2kShare<64> >::buffer_edabits(int, ThreadQueues*) in basic.o
      non-virtual thunk to SemiPrep2k<Semi2kShare<64> >::buffer_edabits(int, ThreadQueues*) in basic.o
      virtual thunk to SemiPrep2k<Semi2kShare<64> >::buffer_edabits(int, ThreadQueues*) in basic.o
      construction vtable for RepRingOnlyEdabitPrep<Semi2kShare<64> >-in-SemiPrep2k<Semi2kShare<64> > in basic.o
  "SemiPrep<Semi2kShare<64> >::buffer_triples()", referenced from:
      vtable for SemiPrep2k<Semi2kShare<64> > in basic.o
      construction vtable for SemiPrep<Semi2kShare<64> >-in-SemiPrep2k<Semi2kShare<64> > in basic.o
  "SemiPrep<Semi2kShare<64> >::SemiPrep(SubProcessor<Semi2kShare<64> >*, DataPositions&)", referenced from:
      SemiPrep2k<Semi2kShare<64> >::SemiPrep2k(SubProcessor<Semi2kShare<64> >*, DataPositions&) in basic.o
  "virtual thunk to SemiPrep<Semi2kShare<64> >::buffer_triples()", referenced from:
      vtable for SemiPrep2k<Semi2kShare<64> > in basic.o
      construction vtable for SemiPrep<Semi2kShare<64> >-in-SemiPrep2k<Semi2kShare<64> > in basic.o
  "virtual thunk to RepRingOnlyEdabitPrep<Semi2kShare<64> >::buffer_edabits(int, ThreadQueues*)", referenced from:
      construction vtable for RepRingOnlyEdabitPrep<Semi2kShare<64> >-in-SemiPrep2k<Semi2kShare<64> > in basic.o
  "_receiver_keygen", referenced from:
      BaseOT::exec_base(bool) in BaseOT.o
  "_receiver_maketable", referenced from:
      BaseOT::exec_base(bool) in BaseOT.o
  "_receiver_procS", referenced from:
      BaseOT::exec_base(bool) in BaseOT.o
  "_receiver_rsgen", referenced from:
      BaseOT::exec_base(bool) in BaseOT.o
  "_sender_genS", referenced from:
      BaseOT::exec_base(bool) in BaseOT.o
  "_sender_keygen", referenced from:
      BaseOT::exec_base(bool) in BaseOT.o

I believe it has something to do with the SubProcessor creation, which requires additional dependencies related to "replicated rings", but I'm not sure what to add to the Makefile dependencies so that these symbols are linked properly.

Isweet commented 2 years ago

Perhaps if I specialize more to the case of semi-honest 2^k, I won't need to create a SubProcessor object and therefore won't need to define these symbols? This seems plausible, since the documentation says the SubProcessor object is necessary for linking the Beaver triples to the output object (presumably this is only necessary for malicious protocols that need to do a MAC check before opening shares).
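For readers following the dot-product program above, the arithmetic underneath can be illustrated without MP-SPDZ. This is a single-process simulation of textbook two-party semi-honest multiplication mod 2^64 with Beaver triples; it is not MP-SPDZ code. The trusted dealer in `deal_triple` is an assumption standing in for the OT-based preprocessing, `std::mt19937_64` is not a cryptographic RNG, and a real protocol would open all the masked differences for the whole vector in one communication round.

```cpp
// Two-party semi-honest multiplication over Z_{2^64} with Beaver triples,
// simulated in one process. Unsigned 64-bit overflow is exactly reduction
// mod 2^64, so no modulus or field needs to be set up.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

using Ring = uint64_t;

struct Shared { Ring s0, s1; };  // party 0's share, party 1's share

Shared share(Ring x, std::mt19937_64& rng) {
    Ring r = rng();
    return {r, x - r};  // s0 + s1 == x (mod 2^64)
}

Ring reveal(const Shared& x) { return x.s0 + x.s1; }

// A Beaver triple ([a], [b], [c]) with c = a * b; a trusted dealer is
// assumed here in place of the OT-based preprocessing.
struct Triple { Shared a, b, c; };

Triple deal_triple(std::mt19937_64& rng) {
    Ring a = rng(), b = rng();
    return {share(a, rng), share(b, rng), share(a * b, rng)};
}

// Open d = x - a and e = y - b; then [xy] = [c] + d*[b] + e*[a] + d*e,
// where the public term d*e is added by party 0 only.
Shared beaver_mul(const Shared& x, const Shared& y, const Triple& t) {
    Ring d = reveal({x.s0 - t.a.s0, x.s1 - t.a.s1});
    Ring e = reveal({y.s0 - t.b.s0, y.s1 - t.b.s1});
    return {t.c.s0 + d * t.b.s0 + e * t.a.s0 + d * e,
            t.c.s1 + d * t.b.s1 + e * t.a.s1};
}

// Dot product: one triple per element, then local additions.
Shared dot_product(const std::vector<Shared>& x, const std::vector<Shared>& y,
                   std::mt19937_64& rng) {
    Shared acc{0, 0};
    for (std::size_t i = 0; i < x.size(); i++) {
        Shared p = beaver_mul(x[i], y[i], deal_triple(rng));
        acc.s0 += p.s0;  // addition of shares needs no communication
        acc.s1 += p.s1;
    }
    return acc;
}
```

With the same inputs as the program above (both parties input i for i < 1000), the revealed result is the sum of i² for i < 1000, i.e. 332833500.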

EDIT: On second thought, the missing symbols seem to be related to the public key operations required for the base OTs... but I'm not sure why those are missing. I think all the OT related symbols should be linked by requiring $(OT) in the Makefile.

FWIW I'm able to build and run the semi2k-party.x binary just fine.

mkskeller commented 2 years ago

You need to add the following includes (as in semi2k-party.cpp):

#include "Machines/Semi.hpp"
#include "Protocols/RepRingOnlyEdabitPrep.hpp"

You also need to add SimpleOT/libsimpleot.a to your dependencies in the makefile.

mkskeller commented 2 years ago

> That worked, thanks. Just so I understand, I guess the base OTs of semi2k-party target use software public key crypto, whereas the *-ecdsa-party targets use hardware instructions and that's where the difference in dependencies comes from?

No. Base OTs always use SimpleOT unless AVX_OT = 0 is set in the configuration, in which case elliptic curves from OpenSSL are used instead. The *-ecdsa-party.x programs have nothing to do with which crypto is used how; they implement distributed ECDSA on top of the respective protocol. Hardware instructions are used according to the -march=... argument passed to the compiler.
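To illustrate the last point with generic GCC/Clang behavior (not MP-SPDZ-specific code): instruction-set extensions are fixed when a file is compiled, and the compiler advertises them through predefined macros such as `__AVX2__`, `__AES__` and `__PCLMUL__`. This sketch only reports which of those were enabled for its own translation unit; it does not show how MP-SPDZ selects its code paths.

```cpp
// Reports which x86 extensions this translation unit was compiled for.
// GCC/Clang define these macros from -march=... (or -mavx2, -maes, ...),
// so the choice of instructions is made at build time, not at run time.
#include <cassert>
#include <string>

std::string compiled_features() {
    std::string f;
#ifdef __AVX2__
    f += "avx2 ";
#endif
#ifdef __AES__
    f += "aes ";
#endif
#ifdef __PCLMUL__
    f += "pclmul ";
#endif
    return f.empty() ? "baseline" : f;
}
```

Compiling it once with -march=native on a recent x86 machine and once with -march=x86-64 should typically give different answers.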

> Trying to understand what stuff is contained in just $(OT) and why the semi2k-party doesn't need additional symbols.

$(OT) contains all object files from the OT directory, which are largely about OT extension. What do you mean by additional symbols?

Isweet commented 2 years ago

Thanks for the clarification, I realized I was mistaken after I left the comment. I think I understand where my confusion was coming from. Is it correct that certain targets (such as semi2k-party) don't link $(LIBSIMPLEOT) because during execution they will use the VM which communicates with a separate process running ot.x?

If so, is the following summary correct? I need $(LIBSIMPLEOT) for the base OTs and then $(OT) links in the MP-SPDZ implementation of OT extension.

Where does the garbled circuit stuff come in? I notice I also need GC/SemiPrep.o GC/SemiSecret.o.

mkskeller commented 2 years ago

> Thanks for the clarification, I realized I was mistaken after I left the comment. I think I understand where my confusion was coming from. Is it correct that certain targets (such as semi2k-party) don't link $(LIBSIMPLEOT) because during execution they will use the VM which communicates with a separate process running ot.x?

No, $(LIBSIMPLEOT) is always linked via $(VM)/$(MINI_OT).

> If so, is the following summary correct? I need $(LIBSIMPLEOT) for the base OTs and then $(OT) links in the MP-SPDZ implementation of OT extension.

Yes, that's correct.

> Where does the garbled circuit stuff come in? I notice I also need GC/SemiPrep.o GC/SemiSecret.o.

The GC directory name is a misnomer: it contains all binary-specific code, not only garbled circuits. The objects you mention contain the code for semi-honest binary computation. The dependency is required because NO_MIXED_CIRCUITS isn't implemented for this protocol.
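Binary computation works mod 2 (XOR sharing), as opposed to the arithmetic mod 2^k used by semi2k. As a hedged, self-contained sketch of the textbook semi-honest technique (not the code in GC/SemiSecret.o): 64 independent bits are packed into one word, XOR is local, and AND uses a Beaver triple over GF(2). The trusted dealer is an assumption standing in for OT-based preprocessing.

```cpp
// Semi-honest two-party binary computation, 64 bits per word.
// XOR sharing: x = s0 ^ s1. Illustration only (non-cryptographic RNG).
#include <cassert>
#include <cstdint>
#include <random>

struct BitShares { uint64_t s0, s1; };

BitShares share_bits(uint64_t x, std::mt19937_64& rng) {
    uint64_t r = rng();
    return {r, x ^ r};
}

uint64_t reveal_bits(const BitShares& x) { return x.s0 ^ x.s1; }

// XOR gates are local: each party XORs its own shares.
BitShares xor_shares(const BitShares& x, const BitShares& y) {
    return {x.s0 ^ y.s0, x.s1 ^ y.s1};
}

// 64 parallel Beaver triples over GF(2): c = a & b.
struct BitTriple { BitShares a, b, c; };

BitTriple deal_bit_triple(std::mt19937_64& rng) {
    uint64_t a = rng(), b = rng();
    return {share_bits(a, rng), share_bits(b, rng), share_bits(a & b, rng)};
}

// AND gate: open d = x ^ a and e = y ^ b, then
// [x & y] = [c] ^ (d & [b]) ^ (e & [a]) ^ (d & e), public term on party 0.
BitShares and_shares(const BitShares& x, const BitShares& y, const BitTriple& t) {
    uint64_t d = reveal_bits(xor_shares(x, t.a));
    uint64_t e = reveal_bits(xor_shares(y, t.b));
    return {t.c.s0 ^ (d & t.b.s0) ^ (e & t.a.s0) ^ (d & e),
            t.c.s1 ^ (d & t.b.s1) ^ (e & t.a.s1)};
}
```

This mirrors the Beaver multiplication mod 2^64, with addition replaced by XOR and multiplication by AND.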

Isweet commented 2 years ago

Okay, that's helpful, thanks! I'm still having trouble seeing how the Makefile links in $(LIBSIMPLEOT) for the semi2k-party.x target but it isn't that important.

Thanks for all your quick responses, I'll continue fiddling around on my own and follow-up if I have other places where I get stuck.

Isweet commented 2 years ago

How should I go about creating a shared library for (say) MASCOT? I'd like something along the lines of #103, but as a shared library and with a lot of the templates instantiated (e.g. choose some bit widths (32, 64, 128) and provide Semi32Share, Semi64Share, Semi128Share).
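The "templates instantiated" part maps onto C++ explicit instantiation. A minimal sketch, assuming a hypothetical `RingShare<K>` template standing in for a share type such as `Semi2kShare<K>` (it stores values in a `uint64_t`, so unlike the wish list above it only supports K ≤ 64): `extern template` in the public header suppresses instantiation in client code, and one library source file emits the code for the chosen widths, which a shared library built with default symbol visibility then exports.

```cpp
// Explicit template instantiation for a fixed set of widths.
// RingShare is hypothetical, not an MP-SPDZ class.
#include <cassert>
#include <cstdint>

template <int K>
struct RingShare {
    static_assert(K > 0 && K <= 64, "sketch stores values in uint64_t");
    uint64_t value;

    static uint64_t mask() {
        // K % 64 avoids an out-of-range shift when K == 64.
        return K == 64 ? ~uint64_t(0) : (uint64_t(1) << (K % 64)) - 1;
    }
    RingShare operator+(const RingShare& other) const {
        return {(value + other.value) & mask()};  // addition mod 2^K
    }
};

// In the public header: client code must not instantiate these itself.
extern template struct RingShare<32>;
extern template struct RingShare<64>;

// In exactly one library source file: emit the code for these widths.
template struct RingShare<32>;
template struct RingShare<64>;

// Named widths that a C wrapper could expose one by one.
using Ring32Share = RingShare<32>;
using Ring64Share = RingShare<64>;
```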

I'm having a lot of difficulty figuring out how to achieve this using the current Makefile. I tried to add the following target as a first step, modifying the librelease.a target to instead produce a shared library.

$(LIBRELEASESO): Protocols/MalRepRingOptions.o $(LIBSIMPLEOT) $(PROCESSOR) $(COMMONOBJS) $(OT) $(GC)
    $(CXX) $(CFLAGS) -shared -o $@ $^ $(LDLIBS)

But for some reason I'm getting a missing symbol error associated with the Shamir options singleton.

Undefined symbols for architecture x86_64:
  "ShamirOptions::s()", referenced from:
      GC::TinyMC<GC::AtlasSecret>::TinyMC(gf2n_<unsigned char>) in AtlasSecret.o
      IndirectShamirMC<GC::AtlasShare>::exchange(Player const&) in AtlasSecret.o

This function is defined in ShamirMachine.hpp (I think), so I'm not sure why the symbol would be missing unless ShamirMachine.hpp isn't included somewhere it should be.

Any help you could provide would be much appreciated!

Isweet commented 2 years ago

On a closer look, I'm actually also confused about why there's Atlas stuff appearing in the build at all. My understanding is that #103 intends to package MASCOT as a library. Is there a dependency on Atlas / Shamir sharing for some reason?

I'm really just looking to package any protocol to start with, and then I'll use that as a template to package additional protocols as I need them.

mkskeller commented 2 years ago

librelease.a shouldn't include AtlasSecret.o, so I'm not sure why the error appears. In any case, you should be able to fix it by adding #include "Machines/ShamirMachine.hpp" to GC/AtlasSecret.cpp.

mkskeller commented 2 years ago

Instead of adapting librelease.a, you might want to extend libSPDZ.so/$(SHAREDLIB). It contains the core stuff, so you just need to add objects specific to a protocol. Also note that I had trouble adding SimpleOT to the shared library, so you might need to use the AVX_OT = 0 option.

Isweet commented 2 years ago

Adding #include "Machines/ShamirMachine.hpp" to GC/AtlasSecret.cpp worked for building librelease.so, but unfortunately there are now duplicate symbols when building atlas-party.x. Not a huge deal, since I don't need the Atlas stuff, but there's definitely something fishy going on; perhaps there's a file that needs preprocessor directives to avoid multiple includes.
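Duplicate-symbol errors of this kind (for ShamirOptions::s() and its constructor) would be consistent with those members being defined, without `inline`, in a header that both atlas-party.cpp and AtlasSecret.cpp include. If so, include guards would not help: they only stop a header from being included twice within one translation unit, while the linker sees one definition per object file. The standard C++ remedies are to mark in-header definitions `inline` or to move them into a single .cpp file. A sketch with hypothetical names (`Options`, `options.hpp`):

```cpp
// Contents of a hypothetical header "options.hpp".
#ifndef OPTIONS_HPP
#define OPTIONS_HPP  // guards against double inclusion within ONE .cpp only

#include <cassert>

class Options {
public:
    int threshold = 1;
    static Options& s();  // singleton accessor
};

// Defined in the header, so it must be `inline`: every .cpp that includes
// this file gets a copy, and `inline` tells the linker the copies are one
// entity. Without `inline`, two object files would each export
// Options::s() and the link would fail with a duplicate-symbol error.
inline Options& Options::s() {
    static Options instance;  // same object in every translation unit
    return instance;
}

#endif  // OPTIONS_HPP
```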

I'll keep the AVX_OT = 0 in mind. I didn't have any trouble adding $(LIBSIMPLEOT) to the release shared library, but if I end up adding to libSPDZ.so and run into trouble I'll try it.

Isweet commented 2 years ago

g++ -o atlas-party.x ...
duplicate symbol 'ShamirOptions::s()' in:
    Machines/atlas-party.o
    GC/AtlasSecret.o
duplicate symbol 'ShamirOptions::ShamirOptions(int, int)' in:
    Machines/atlas-party.o
    GC/AtlasSecret.o
...

Isweet commented 2 years ago

Can you explain the difference between:

Based on the comment, I assume that protocol.check() is performing the necessary MAC checks to ensure integrity before revealing the result. What's output.Check(P) doing?

By the way, I like the changes you made in v0.2.9: ProtocolSetup and ProtocolSet make using the low-level interface much easier.

mkskeller commented 2 years ago

In SPDZ and SPDZ2k it's the same, but some protocols, such as Rep4, post-sacrifice, and SPDZ-wise, have different checks for outputs and for multiplications.
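As a hedged illustration of what a MAC check verifies in SPDZ-style protocols, here is a simplified sketch, not MP-SPDZ's implementation. Each party holds additive shares of x, of alpha·x and of the key alpha; after x is opened, each party publishes sigma_i = [alpha·x]_i − [alpha]_i·x, and the sigmas must sum to zero, since their sum is alpha·(x − x_opened). The real check also commits to the sigmas before revealing them and batches many openings with random coefficients; both steps are omitted. The prime p = 2^31 − 1 is an arbitrary choice that keeps products within 64 bits.

```cpp
// Toy SPDZ-style MAC check over F_p, p = 2^31 - 1 (no commitments,
// no batching): illustration only.
#include <cassert>
#include <cstdint>
#include <random>
#include <vector>

constexpr uint64_t kPrime = 2147483647ULL;

uint64_t addp(uint64_t a, uint64_t b) { return (a + b) % kPrime; }
uint64_t subp(uint64_t a, uint64_t b) { return (a + kPrime - b) % kPrime; }
uint64_t mulp(uint64_t a, uint64_t b) { return (a * b) % kPrime; }  // < 2^62

struct PartyShare { uint64_t x, mac, alpha; };  // [x]_i, [alpha*x]_i, [alpha]_i

// Deal n additive shares of x, alpha*x and alpha (x, alpha < kPrime).
std::vector<PartyShare> deal(uint64_t x, uint64_t alpha, int n, std::mt19937_64& rng) {
    std::vector<PartyShare> out(n);
    uint64_t sx = 0, sm = 0, sa = 0;
    for (int i = 0; i < n - 1; i++) {
        out[i] = {rng() % kPrime, rng() % kPrime, rng() % kPrime};
        sx = addp(sx, out[i].x);
        sm = addp(sm, out[i].mac);
        sa = addp(sa, out[i].alpha);
    }
    out[n - 1] = {subp(x, sx), subp(mulp(alpha, x), sm), subp(alpha, sa)};
    return out;
}

uint64_t open_value(const std::vector<PartyShare>& shares) {
    uint64_t x = 0;
    for (const auto& s : shares) x = addp(x, s.x);
    return x;
}

// Accept the opened value only if sum_i ([alpha*x]_i - [alpha]_i * opened) == 0.
bool mac_check(const std::vector<PartyShare>& shares, uint64_t opened) {
    uint64_t sum = 0;
    for (const auto& s : shares)
        sum = addp(sum, subp(s.mac, mulp(s.alpha, opened)));
    return sum == 0;
}
```

A party that lies about its share of x changes the opened value, and the sigmas then sum to a nonzero multiple of alpha, which the honest parties detect.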