dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Support encrypted models #2910

Open gilnahmias opened 5 years ago

gilnahmias commented 5 years ago

We want the ability to publish trained models that people can use to make predictions on their own devices, but not to reverse engineer the algorithms used to train them. The goal is to protect IP while still allowing predictions over sensitive data to run on customer machines. Can you support that?

TomFinley commented 5 years ago

Hi @gilnahmias, no. The closest thing that we have worked on in the past is using homomorphic encryption, but that is not currently part of the ML.NET "story," nor is it likely to be in the immediate term.

Nonetheless, I do find your question quite interesting and would like to understand more; I have some slight interest in this topic. I'd like to emphasize, however, that I am not an expert.

I am having trouble with "predict on their own devices." My experience in this area is limited to what I read out of Kristin Lauter's group, the SEAL library, CryptoNets, and suchlike. So this may be an incomplete picture, but the experience there, AFAIK, is that the user sends encrypted data to the provider of the model, they (the model provider) evaluate the model themselves, then send back the encrypted prediction, which the user is then free to decrypt. (Crucially, the person that wrote the model does not share the model with the user, but at the same time has absolutely no clue what the prediction is, since all operations happen over terms that involve the user's key, which only the user knows.)
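To make that flow concrete, here's a toy sketch of my own (not SEAL, and wildly insecure; unpadded textbook RSA just happens to be multiplicatively homomorphic, so it can stand in for a real HE scheme, and all the numbers and names below are illustrative only):

```csharp
using System;
using System.Numerics;

// Toy, INSECURE sketch of the usual homomorphic-encryption flow: the USER
// holds the key, sends encrypted data to the model provider, the provider
// computes on ciphertexts without ever seeing the data, and the user
// decrypts the returned prediction. Unpadded textbook RSA (which is
// multiplicatively homomorphic) stands in for a real HE scheme.
class UserHoldsTheKey
{
    // User's toy RSA key pair (p = 61, q = 53, the classic textbook example).
    static readonly BigInteger N = 3233;   // public modulus (shared with the provider)
    static readonly BigInteger E = 17;     // public exponent (shared with the provider)
    static readonly BigInteger D = 2753;   // private exponent (only the user has this)

    static BigInteger Encrypt(BigInteger m) => BigInteger.ModPow(m, E, N);
    static BigInteger Decrypt(BigInteger c) => BigInteger.ModPow(c, D, N);

    static void Main()
    {
        // 1. The user encrypts a sensitive input and sends only the ciphertext.
        BigInteger encryptedInput = Encrypt(6);

        // 2. The provider applies its secret "model" (a single weight, w = 7)
        //    directly on ciphertexts: Enc(x) * Enc(w) mod N == Enc(x * w).
        //    It never sees the input or the prediction in the clear.
        BigInteger encryptedPrediction = encryptedInput * Encrypt(7) % N;

        // 3. The user decrypts the returned prediction with their private key.
        Console.WriteLine(Decrypt(encryptedPrediction));   // prints 42 (= 6 * 7)
    }
}
```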

You seem, with your "predict on their own devices," to be suggesting an inversion of that relationship, and it's unclear to me how that could work, just from a theoretic perspective, at least in any sort of "pure" sense. If you have an "encrypted" operand and perform an operation on it using any homomorphic encryption, the result is effectively "encrypted" using the same key, which ex hypothesi the user does not know. So the user could not decrypt it. (I could imagine a scheme where the model is encrypted and evaluated on the user's device, with the final step being an "off-site" decryption of the result, but even this would be deeply problematic and dangerous, and would not fulfill your requirements.)
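Flipping the same toy example around shows the dead end: if the weight is encrypted under the provider's key, the user can still evaluate locally, but the result is a ciphertext that only the provider can open (again, purely illustrative):

```csharp
using System;
using System.Numerics;

// Same toy numbers as above, relationship inverted: now the PROVIDER holds
// the key and ships an encrypted weight to the user's device.
class ProviderHoldsTheKey
{
    static readonly BigInteger N = 3233, E = 17;   // provider's public key
    static readonly BigInteger D = 2753;           // provider's PRIVATE key; never leaves the provider

    static void Main()
    {
        // The provider encrypts its secret weight under its own key and ships it.
        BigInteger encryptedWeight = BigInteger.ModPow(7, E, N);

        // The user can still evaluate "locally" by folding their input into
        // the ciphertext: Enc(w) * Enc(x) mod N == Enc(w * x).
        BigInteger encryptedResult = encryptedWeight * BigInteger.ModPow(6, E, N) % N;

        // Dead end: the result is encrypted under the PROVIDER's key, so only
        // the provider can run the line below. The user cannot read their own
        // prediction without sending it back (and thereby revealing it).
        Console.WriteLine(BigInteger.ModPow(encryptedResult, D, N));   // 42, but only off-site
    }
}
```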

That said, I would like to emphasize I am not an expert in this area. My last knowledge of the state of the art is about four years out of date, and I have not kept track of recent developments in the field, or even of untrusted multi-party computation in the most general sense. So if you have more information you can share, I'd happily absorb it.

gilnahmias commented 5 years ago

Thanks @TomFinley, that's actually a really fascinating topic!

First off, you are right to capture my intent: both the operator and the operand reside with Bob, even though the operand is encrypted beforehand by Alice. So Bob can invoke it but not reverse engineer it.

I guess I'm looking for "PELock for ML models." In the software engineering world, there's no one-way encryption; everything can be reverse engineered, so the common practice is to make reverse engineering as annoying as possible. Moving the discussion to the ML world, we have both the model and the runtime; they might have different answers to that question.

I don't know what a solution might look like, but there are options I can't rule out.

I'm just thinking out loud.
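For instance, one cheap "raise the bar" option would be to ship the model .zip encrypted and only decrypt it into memory at load time, along the lines of the sketch below (key management hand-waved, names are just illustrative; the key still has to end up on the customer's machine, so this is obfuscation, not real protection):

```csharp
using System.IO;
using System.Security.Cryptography;
using Microsoft.ML;

// Ship the trained model .zip AES-encrypted, decrypt it into memory at load
// time, and never write the plaintext model to disk. The key still has to be
// available on the customer's machine, so this only raises the bar for
// reverse engineering; it is obfuscation, not cryptographic protection.
static class EncryptedModelLoader
{
    public static ITransformer Load(MLContext mlContext, string encryptedModelPath,
                                    byte[] key, byte[] iv, out DataViewSchema inputSchema)
    {
        using var aes = Aes.Create();
        aes.Key = key;
        aes.IV = iv;

        using var encryptedFile = File.OpenRead(encryptedModelPath);
        using var decryptingStream = new CryptoStream(encryptedFile, aes.CreateDecryptor(), CryptoStreamMode.Read);

        // ML.NET wants a seekable stream, so buffer the decrypted model in memory.
        using var buffer = new MemoryStream();
        decryptingStream.CopyTo(buffer);
        buffer.Position = 0;

        return mlContext.Model.Load(buffer, out inputSchema);
    }
}
```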

Ultimately, the problem space is still valid: how does one protect their IP after deploying models on-device (for latency reasons, say)?