Huelse / SEAL-Python

Microsoft SEAL 4.X For Python
MIT License

Cannot deep copy certain seal objects #28

Closed DreamingRaven closed 4 years ago

DreamingRaven commented 4 years ago

Hey there Huelse, I am having some difficulty deep copying certain seal-python objects. copy.deepcopy gives me:

Traceback (most recent call last):
  File "/python-fhe/fhe/fhe.py", line 123, in _merge_dictionary
    copy.deepcopy(d)
  File "/usr/lib/python3.8/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python3.8/copy.py", line 230, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib/python3.8/copy.py", line 161, in deepcopy
    rv = reductor(4)
TypeError: cannot pickle 'seal.SEALContext' object

or

Traceback (most recent call last):
  File "/python-fhe/fhe/fhe.py", line 124, in _merge_dictionary
    dicts = copy.deepcopy(dicts)
  File "/usr/lib/python3.8/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python3.8/copy.py", line 210, in _deepcopy_tuple
    y = [deepcopy(a, memo) for a in x]
  File "/usr/lib/python3.8/copy.py", line 210, in <listcomp>
    y = [deepcopy(a, memo) for a in x]
  File "/usr/lib/python3.8/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/usr/lib/python3.8/copy.py", line 230, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/usr/lib/python3.8/copy.py", line 161, in deepcopy
    rv = reductor(4)
TypeError: cannot pickle 'seal.PublicKey' object

It's interesting that deep copying falls back to pickling these objects.
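A minimal sketch of why deepcopy ends up in pickle territory: when a type defines no `__deepcopy__` hook, `copy.deepcopy` falls back to `__reduce_ex__`, which is the same protocol pickle uses. The classes `Opaque` and `Copyable` below are hypothetical stand-ins written in pure Python, not SEAL types; the first only imitates an extension type that refuses to be reduced.

```python
import copy


class Opaque:
    """Hypothetical stand-in for an extension type without pickle support."""

    def __init__(self, value):
        self.value = value

    def __reduce_ex__(self, protocol):
        # Imitate the error an unpicklable extension type raises.
        raise TypeError(f"cannot pickle '{type(self).__name__}' object")


class Copyable(Opaque):
    """Same stand-in, but it tells deepcopy how to copy itself."""

    def __deepcopy__(self, memo):
        # deepcopy looks for this hook before it tries __reduce_ex__.
        return type(self)(copy.deepcopy(self.value, memo))


def try_deepcopy(obj):
    """Return (True, copy) on success or (False, error message) on failure."""
    try:
        return True, copy.deepcopy(obj)
    except TypeError as exc:
        return False, str(exc)
```

Calling `try_deepcopy({"ctx": Opaque(1)})` reports the pickle error, while the same dictionary holding a `Copyable` is copied normally. Whether a `__deepcopy__` hook can be added to the real bindings is a separate question.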

Not sure if you know of a workaround, or anything I can help with in particular to make it possible to deep copy.

Thanks as usual.

DreamingRaven commented 4 years ago

Is this at all related to https://github.com/Huelse/SEAL-Python/pull/22, do you think?

Huelse commented 4 years ago

Sorry, I've been busy recently. It didn't work as well as we thought it would, so I haven't merged it. We talked about it in #19. Since SEAL 3.4, the load function always checks the context. Pickle's restore step only receives one argument, and the context's parameters would have to be saved through EncryptionParameters.

DreamingRaven commented 4 years ago

Ah damn, after reading #19 I see what's going on. Hmm, I'm not sure what I can do to fix it with my limited understanding of pybind. I may need to delve into it a bit more to see whether I can contribute a fix, as it would be preferable to be able to copy these objects, even if it's a hack for now, until a better solution is conceived.

DreamingRaven commented 4 years ago

Is it possible to reconstruct the context from the parameters? It looks like you are saving them in https://github.com/Huelse/SEAL-Python/blob/fb15ad3192be0cfb5a9e2be952852c7d1d5981c4/tests/0_data_type.py#L20-L33. However, I don't see any implementation for saving the keys. I may have to test whether it is possible, since without a good way to save the keys it is difficult to move them between machines and systems.

When you say pickle can only accept one argument, is it not possible to pickle multiple things at once (https://stackoverflow.com/a/20725705/11164973), either by creating a meta object like a dictionary or by pickling multiple times, even into the same file?
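Both options mentioned here can be sketched with the standard library alone. The byte strings are placeholders standing in for serialized parameters and ciphertext, not real SEAL output.

```python
import io
import pickle

params_blob = b"serialized-parameters"  # placeholder, not real SEAL output
cipher_blob = b"serialized-ciphertext"  # placeholder, not real SEAL output

# Option 1: one meta object (here a dictionary) holding both parts.
bundled = pickle.dumps({"params": params_blob, "cipher": cipher_blob})
restored = pickle.loads(bundled)

# Option 2: several dump() calls into the same stream, read back in order.
stream = io.BytesIO()
pickle.dump(params_blob, stream)
pickle.dump(cipher_blob, stream)
stream.seek(0)
first = pickle.load(stream)
second = pickle.load(stream)
```

Either way the Python side has no trouble carrying two pieces of state; the open question in this thread is how the C++ restore function gets hold of both.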

Also thanks for replying quickly as always, even when you are busy ;)

Huelse commented 4 years ago

Each key, ciphertext, and plaintext has its own native load and save implementation; you can find them in wrapper.cpp. Thanks for your advice, I will try it later. :)

DreamingRaven commented 4 years ago

Hey Huelse, I was just taking a look to see if I could write something to fix this issue. The pybind11 documentation https://pybind11.readthedocs.io/en/stable/advanced/classes.html#pickling-support shows pickling using tuples to hold the class state and any extras, in their case py::make_tuple(p.value(), p.extra()) in __getstate__, and then restores it in __setstate__ by reading the different parts of the tuple and reconstructing the original object p. Would this be possible in the current pybind11 wrapper.cpp? Or is there a different quirk that I should be aware of before I attempt something?

It also states in the MS SEAL repo that save and load functions exist for every meaningfully serialisable object in SEAL. So in theory our __setstate__ and __getstate__ methods could just call those underneath. https://github.com/microsoft/SEAL/blob/f7d748c97ed841376c4a1cdec9e7c978f5e64a95/native/examples/6_serialization.cpp#L146-L149

I dunno, does this seem viable to you? If it does, I will see about submitting a patch, although again my pybind know-how is limited, so forgive me if it is a bit rough at first.

Huelse commented 4 years ago

Thank you, first of all. Actually, I already tried it. The problem is that the load(context, stream) function needs two args. We thought that saving with pickle.dump((parms, obj), file_obj) would be fine, but the load function in C++ can only receive one arg, parms or obj, only one of them. It bothered me for a long time. Or I could choose to give up the safety check. Maybe there is a way I don't know :( In SEAL 3.3.2, pickle works well.
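One possible way around the single-argument limit, sketched purely at the Python level: a wrapper object whose `__reduce__` hands pickle a rebuild function together with both byte strings, so the context can be recreated from the parameters before the second object is loaded. Everything below (`FakeParams`, `FakeContext`, `FakePlaintext`, `Bundle`, `_rebuild`) is a hypothetical stand-in, not the SEAL-Python API, and this is an assumption about what could work rather than a tested fix for the bindings.

```python
import copy


class FakeParams:
    """Hypothetical stand-in for EncryptionParameters."""

    def __init__(self, degree=0):
        self.degree = degree

    def to_bytes(self):
        return str(self.degree).encode()

    @classmethod
    def from_bytes(cls, data):
        return cls(int(data.decode()))


class FakeContext:
    """Hypothetical stand-in for a context built from parameters."""

    def __init__(self, params):
        self.params = params


class FakePlaintext:
    """Hypothetical stand-in for a type whose load() needs a context."""

    def __init__(self, text=""):
        self.text = text

    def to_bytes(self):
        return self.text.encode()

    def load(self, context, data):
        if not isinstance(context, FakeContext):
            raise TypeError("a context is required to load")
        self.text = data.decode()


def _rebuild(params_bytes, plain_bytes):
    # Receives both parts: rebuild the context first, then load with it.
    params = FakeParams.from_bytes(params_bytes)
    context = FakeContext(params)
    plain = FakePlaintext()
    plain.load(context, plain_bytes)
    return Bundle(params, plain)


class Bundle:
    """Keeps the parameters next to the object that depends on them."""

    def __init__(self, params, plain):
        self.params = params
        self.plain = plain

    def __reduce__(self):
        # pickle and deepcopy call _rebuild(*args) to recreate the bundle.
        return _rebuild, (self.params.to_bytes(), self.plain.to_bytes())


original = Bundle(FakeParams(4096), FakePlaintext("6"))
clone = copy.deepcopy(original)
```

The same `__reduce__` is what `pickle.dump` would use when the classes live in an importable module. The cost is that every object has to be wrapped together with its parameters.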

Huelse commented 4 years ago

I think I could use Python's API in C++ and read the parms in getstate.

DreamingRaven commented 4 years ago

Yeah, you are correct: https://github.com/microsoft/SEAL/blob/f7d748c97ed841376c4a1cdec9e7c978f5e64a95/native/examples/6_serialization.cpp#L316 However, I thought that if you pass in a higher-level object like a tuple and tell wrapper.cpp how to handle it, it would be possible to unpack and repack the object in __setstate__ and __getstate__, which is what the pybind11 documentation on pickling support seems to show.

If not, they do have a second, non-stream invocation that they show in their 6_serialization.cpp example: https://github.com/microsoft/SEAL/blob/f7d748c97ed841376c4a1cdec9e7c978f5e64a95/native/examples/6_serialization.cpp#L137 It uses only byte_buffer.size() and byte_buffer.data(). It is admittedly less efficient, but it is just one single object, and the two parts needed to save and load are properties of that one object.

Good to know that it works in 3.3.2, so if I really need to I can just use the older version if I can't create a patch/pull request for this.

DreamingRaven commented 4 years ago

> I think I could use Python's API in C++ and read the parms in getstate.

Oh, is it the case that you can't access the params in the wrapper at all, as in you can't pass them in even as a tuple? Maybe I misunderstood what you meant in your earlier answers.

Huelse commented 4 years ago

Yes, it is. Only one of them. Here is a failed example:

.def(py::pickle(
    [](const py::tuple &t){ // const Plaintext &plain
        // t.size() = 2
        auto params = t[0].cast<EncryptionParameters>();
        auto plain = t[1].cast<Plaintext>();
        std::stringstream out1(std::ios::binary | std::ios::out);
        params.save(out1);
        std::string str1 = out1.str();
        std::string encoded1 = base64_encode(reinterpret_cast<const unsigned char *>(str1.c_str()), (unsigned int)str1.length());
        std::stringstream out2(std::ios::binary | std::ios::out);
        plain.save(out2);
        std::string str2 = out2.str();
        std::string encoded2 = base64_encode(reinterpret_cast<const unsigned char *>(str2.c_str()), (unsigned int)str2.length());
        return py::make_tuple(encoded1, encoded2);
    },
    [](py::tuple t){
        // t.size() = 2
        EncryptionParameters params = EncryptionParameters();
        std::string str1 = t[0].cast<std::string>();
        std::string decoded1 = base64_decode(str1);
        std::stringstream input1(std::ios::binary | std::ios::in);
        input1.str(decoded1);
        params.load(input1);
        auto context = SEALContext::Create(params);
        Plaintext plain = Plaintext();
        std::string str2 = t[1].cast<std::string>();
        std::string decoded2 = base64_decode(str2);
        std::stringstream input2(std::ios::binary | std::ios::in);
        input2.str(decoded2);
        plain.load(context, input2);
        return plain;
    }
))

and run

import pickle
from seal import *

parms = EncryptionParameters(scheme_type.BFV)
poly_modulus_degree = 4096
parms.set_poly_modulus_degree(poly_modulus_degree)
parms.set_coeff_modulus(CoeffModulus.BFVDefault(poly_modulus_degree))
parms.set_plain_modulus(256)

plain = Plaintext("6")
with open('plain', 'wb') as f:
    pickle.dump((parms, plain), f)

TypeError: __getstate__(): incompatible function arguments. The following argument types are supported:
    1. (self: tuple) -> tuple  # this should be (self: Plaintext, arg0: tuple)

DreamingRaven commented 4 years ago

Oh I see, I will play about and see if I can get it working. Also, on the line

        auto params = t[0].cast<EncryptionParameters>();

should that 0 not be a 1 instead, as the encryption parameters are in the second position of the tuple? pickle.dump((plain, parms), f)

I will take a closer look and see if I can make it work. If not, maybe going the byte-buffer way, with a fixed size as opposed to a stream, would be better: https://github.com/microsoft/SEAL/blob/f7d748c97ed841376c4a1cdec9e7c978f5e64a95/native/examples/6_serialization.cpp#L125-L126

vector<SEAL_BYTE> byte_buffer(static_cast<size_t>(parms.save_size()));
parms.save(reinterpret_cast<SEAL_BYTE *>(byte_buffer.data()), byte_buffer.size());

https://github.com/microsoft/SEAL/blob/f7d748c97ed841376c4a1cdec9e7c978f5e64a95/native/examples/6_serialization.cpp#L136-L137

EncryptionParameters parms2;
parms2.load(reinterpret_cast<const SEAL_BYTE *>(byte_buffer.data()), byte_buffer.size());

Confusingly, their tests show the invocation as follows, which could be the cause: https://github.com/pybind/pybind11/blob/a54eab92d265337996b8e4b4149d9176c2d428a6/tests/test_pickling.cpp#L112

.def(py::pickle(
    [](py::object self) {

as opposed to:

.def(py::pickle(
    [](const py::tuple &t){ // const Plaintext &plain

DreamingRaven commented 4 years ago

Just looking through, they have a ton of possible ways to facilitate pickle, either directly or by defining __getstate__ and __setstate__ functions: https://github.com/pybind/pybind11/blob/a54eab92d265337996b8e4b4149d9176c2d428a6/tests/test_pickling.cpp#L90-L110. I started outlining one, although now I need to figure out what in particular we need to save and load:

    py::class_<EncryptionParameters>(m, "EncryptionParameters")
        .def(py::init<scheme_type>())
.
.
.
        .def("__getstate__", [](py::object self){
            return py::make_tuple(
                // ... whatever goes here probably seal self.save ...
            );
        })
        .def("__setstate__", [](py::object self, py::tuple t){
            return ;// ... whatever goes here probably seal self.load from tuple saved
        })

I think I see what you mean: it requires the SEAL context on load, but we are only given self in __getstate__, so we need a way to access the context from there to save it with the params.

Huelse commented 4 years ago

Yes, serializing EncryptionParameters on its own works well. There are two ways: one is to use unsafe_load, giving up the validity check and ignoring the params; the other is to pickle two objects in one file and read them in C++ through the Python-C++ interface. Neither is perfect.

DreamingRaven commented 4 years ago

Ok, so I think I'm on to something, but I'm unsure how to convert a std::vector<SEAL_BYTE> object into some Python equivalent, maybe list[bytes]. pybind does the first bit and already converts it to List[std::byte], but I need the std::byte inside that to become Python bytes. Do you know how?

For reference the code in question is here and below for convenience:

        .def("__getstate__", [](const EncryptionParameters &self) {
            vector<SEAL_BYTE> byte_buffer(static_cast<size_t>(self.save_size(compr_mode_type::none)));
            self.save(reinterpret_cast<SEAL_BYTE *>(byte_buffer.data()), byte_buffer.size());
            return byte_buffer;
        })

or any version that stops pickle from complaining:

Traceback (most recent call last):
  File "/seal-python/tests/0_data_type.py", line 76, in <module>
    example_serialize()
  File "/seal-python/tests/0_data_type.py", line 30, in example_serialize
    copy.deepcopy(parms)
  File "/usr/lib/python3.8/copy.py", line 161, in deepcopy
    rv = reductor(4)
TypeError: Unable to convert function return value to a Python type! The signature was
    (self: seal.EncryptionParameters) -> List[std::byte]

Then I will pass that to __setstate__ in whatever format __getstate__ produces, maybe with a conversion of some sort back to the original vector type:

        .def("__setstate__", [](EncryptionParameters &self, vector<SEAL_BYTE> byte_buffer) {
            new (&self) EncryptionParameters();
            self.load(reinterpret_cast<const SEAL_BYTE *>(byte_buffer.data()), byte_buffer.size());
        });

Once this is done I can copy and paste 90% of it, apart from the class name, to the other types we want to be able to pickle/deep copy that also support save and load, and in theory that should give us both deep copying and standard serialisation... maybe.

DreamingRaven commented 4 years ago

The only place I can find information about this is https://pybind11.readthedocs.io/en/stable/advanced/cast/strings.html#strings-bytes-and-unicode-conversions but I was wondering if you had any input.

Huelse commented 4 years ago

You need stl_bind; there are a lot of examples in pybind11's tests: https://github.com/pybind/pybind11/blob/master/tests/test_stl_binders.cpp But I think it's unnecessary, as there is a simple way to pickle the EncryptionParameters. The point is the keys and ciphertext. Here it is: https://github.com/Huelse/SEAL-Python/commit/70fc82b8b996076e6ea9a3511fff18ee8cf76af6

DreamingRaven commented 4 years ago

Yeah, I know EncryptionParameters wasn't a problem to do, but I wanted to start with an example they had in source before moving on to the others or making a generic template for serialization, like it appears you did. Purely because I'm not as confident in this, I wanted to start small.

But nice work on the commit, I will try it out with deep copy to see if it works there too for EncryptionParameters.

DreamingRaven commented 4 years ago

Just to let you know, there was a build error with the new code pushed: undefined cipherstr_encoded at line 39 of wrapper.cpp (https://github.com/Huelse/SEAL-Python/commit/70fc82b8b996076e6ea9a3511fff18ee8cf76af6#r39504816) when building the Docker image.

Huelse commented 4 years ago

Eh, sorry about that. Just rename it to encoded_str.

DreamingRaven commented 4 years ago

Yeah, already done on my end, I just wanted to let you know. It was clear what happened ;) I checked it through, and EncryptionParameters is deep copyable too. Cheers, I will put in a pull request if I make any progress with the rest, plus maybe a few unittest framework tests for better automation.

Edit: unit tests implemented in https://github.com/Huelse/SEAL-Python/pull/30

DreamingRaven commented 4 years ago

> In SEAL 3.3.2, pickle works well.

I just tried your 3.3.2 branch with the new unit tests so that I could work with that temporarily, but it fails to pickle as well:

~/g/seal-python (3.3.2)> sudo docker build -t archer/fhe . -f Dockerfile && sudo docker run --gpus all -it archer/fhe python3 /app/tests/unittests.py:

======================================================================
ERROR: test_deepcopy_encryptionparams_bfv (__main__.seal_tests)
Testing ability to serialise via copy.deepcopy on params object.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/app/tests/unittests.py", line 29, in test_deepcopy_encryptionparams_bfv
    parms_copy = copy.deepcopy(parms)
  File "/usr/lib/python3.7/copy.py", line 169, in deepcopy
    rv = reductor(4)
TypeError: can't pickle seal.EncryptionParameters objects

Sorry to keep bothering you with this, but how did you get pickle to work with 3.3.2?

Huelse commented 4 years ago

Try Ciphertext, SecretKey, etc. There is no binding for EncryptionParameters in SEAL 3.3.2, or you can add it.

DreamingRaven commented 4 years ago

Hey Hue, sorry for keeping silent; I was working on a workaround using the current version.

It is extremely hacky, as I effectively write to temp files and read them back in to allow serialization via pickling and deep copying. I create a meta object with all the seal objects tied together, generating any objects that are required on the fly. I also use a caching system to keep the seal objects that have no save and load methods, so they can still be used quickly without having to regenerate them every time unless the program is restarted. Here is my current implementation, although it is still very rough and in the works; I have not worked out all the Python setup.py stuff, but the core library is there. https://github.com/DreamingRaven/python-reseal/blob/master/fhe/reseal.py
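The temp-file idea can be sketched roughly as follows. This is not the code in the linked repository: `FakeSealObject` is a hypothetical stand-in that only assumes path-based save and load methods, `Picklable` is an illustrative wrapper name, and the context handling that the real objects need on load is left out.

```python
import copy
import os
import tempfile


class FakeSealObject:
    """Hypothetical stand-in exposing path-based save() and load()."""

    def __init__(self, payload=b""):
        self.payload = payload

    def save(self, path):
        with open(path, "wb") as f:
            f.write(self.payload)

    def load(self, path):
        with open(path, "rb") as f:
            self.payload = f.read()


class Picklable:
    """Wrapper that round-trips the wrapped object through a temp file."""

    def __init__(self, obj):
        self.obj = obj

    def __getstate__(self):
        # Save to a throwaway file, then keep only the raw bytes as state.
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "obj.bin")
            self.obj.save(path)
            with open(path, "rb") as f:
                return {"blob": f.read()}

    def __setstate__(self, state):
        # Write the bytes back out so the path-based load() can read them.
        with tempfile.TemporaryDirectory() as tmp:
            path = os.path.join(tmp, "obj.bin")
            with open(path, "wb") as f:
                f.write(state["blob"])
            self.obj = FakeSealObject()
            self.obj.load(path)


original = Picklable(FakeSealObject(b"\x01\x02"))
clone = copy.deepcopy(original)
```

Because the state is plain bytes, both pickle and deepcopy can handle the wrapper without any support from the wrapped type, at the price of a disk round trip on every copy.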

However it is a solution that allows people to work properly with seal in python, with a bit of magic. I even handle some of the complexity of permutations of (addition, multiplication)(ciphertext, plaintext).

One thing I did find, however, is that MS SEAL has no bootstrapping, and they have no intention of implementing it for anything other than CKKS + deep learning.

Anyway thanks for all the help in the past, and don't think poorly of me for the hacky temporary solution!