facebookresearch / CrypTen

A framework for Privacy Preserving Machine Learning
MIT License

Supporting plaintext model, encrypted data use-case. #469

Closed kwmaeng91 closed 1 year ago

kwmaeng91 commented 1 year ago

Dear experts:

I want to know whether CrypTen will support a plaintext model with encrypted data in the future, or whether it is already possible.

I think this should be fundamentally possible because, e.g., a·[x] + a·[y] = a·[x + y] holds for plaintext a and encrypted [x], [y]. From searching past issues, the discussion in #415 suggests this was possible at some point (with the caveat that, when training this way, the gradients will leak information about the input). However, when I try, I get `RuntimeError: Cannot input CrypTensors into unencrypted model. Encrypt the model before feeding it CrypTensors.` The error message sounds very explicit, so I am not sure whether this is possible at all with the current codebase.
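The linearity identity above can be checked with plain additive secret sharing (a standalone sketch, not CrypTen's implementation; the modulus and helper names are illustrative):

```python
import random

Q = 2**61 - 1  # illustrative modulus for additive sharing

def share(x, n=2):
    """Split integer x into n additive shares modulo Q."""
    shares = [random.randrange(Q) for _ in range(n - 1)]
    shares.append((x - sum(shares)) % Q)
    return shares

def reconstruct(shares):
    return sum(shares) % Q

a = 7            # public (plaintext) scalar
x, y = 11, 23    # private inputs
sx, sy = share(x), share(y)

# Each party scales its shares by the public a and adds them locally,
# with no communication between parties:
sz = [(a * si + a * ti) % Q for si, ti in zip(sx, sy)]

assert reconstruct(sz) == a * (x + y)  # a·[x] + a·[y] reconstructs to a·(x + y)
```

Because the operation is linear and the coefficient a is public, each party can compute its output share independently, which is what makes the plaintext-model case cheap.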

The plaintext-model, encrypted-data use case can be useful when a user (without enough compute power to participate as a party in MPC) generates secret shares of the input and sends them to P non-colluding parties to run inference. A plausible privacy definition would assume that the P parties must not learn the user's input but may share the same model weights (e.g., they might be running an open-source model like GPT-3, or they might have a binding contract). Inference will be much faster if we do not hide the model.
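That deployment can be sketched end to end for a single linear layer (a toy model over integers, assuming the same additive sharing as above; all names are illustrative, not CrypTen API):

```python
import random

Q = 2**61 - 1  # illustrative modulus

def share(x, p):
    """Split integer x into p additive shares modulo Q."""
    shares = [random.randrange(Q) for _ in range(p - 1)]
    shares.append((x - sum(shares)) % Q)
    return shares

P = 3
x = [5, 9, 2]                 # user's private input vector
W = [[1, 2, 3], [4, 5, 6]]    # public model weights, known to every party

# The user shares each input element and sends one share vector per party.
share_vectors = list(zip(*(share(xi, P) for xi in x)))

# Each party applies the plaintext weights to its share vector locally.
outputs = [
    [sum(w * s for w, s in zip(row, sv)) % Q for row in W]
    for sv in share_vectors
]

# Summing the parties' output shares reconstructs the plaintext W @ x.
result = [sum(col) % Q for col in zip(*outputs)]
assert result == [sum(w * xi for w, xi in zip(row, x)) for row in W]
```

Since the matrix-vector product is linear in x, no party-to-party interaction is needed for this layer; only nonlinearities (comparisons, activations) would require an MPC protocol.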

I wonder (1) whether plaintext model + encrypted data will be supported in the future, or (2) whether there is a workaround/hack to do this. Please let me know if I am mistaken in any way. Thank you for the help and the wonderful work!

lvdmaaten commented 1 year ago

I don't exactly remember why we made this choice but it was, indeed, a very deliberate decision.

My guess is that we made this choice because we struggled to get code patterns like `torch_tensor.function(cryptensor)` working: the `torch.function` call somehow needs to be converted into a `crypten.function` call. This works for simple code patterns via implementations of functions like `__rmul__`, but in more general cases it is complicated. PyTorch's support for this kind of thing has improved in recent years, but much of that functionality did not exist when CrypTen was developed.
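The dispatch problem can be illustrated without CrypTen. Python's reflected-operator protocol rescues `plain * encrypted`, but an ordinary method call on the plain object has no such fallback hook (toy classes below, not CrypTen's types):

```python
class Enc:
    """Toy stand-in for an encrypted tensor wrapping a value."""
    def __init__(self, v):
        self.v = v
    def __mul__(self, other):
        other = other.v if isinstance(other, (Enc, Plain)) else other
        return Enc(self.v * other)
    __rmul__ = __mul__  # handles `plain * Enc(...)` via the reflected operator

class Plain:
    """Toy stand-in for a plaintext (torch-like) tensor."""
    def __init__(self, v):
        self.v = v
    def __mul__(self, other):
        if isinstance(other, (int, float)):
            return Plain(self.v * other)
        return NotImplemented  # lets Python fall back to other.__rmul__
    def dot(self, other):
        return Plain(self.v * other.v)  # only knows about Plain operands

p, e = Plain(3), Enc(4)
assert isinstance(p * e, Enc)    # works: Plain returns NotImplemented, Enc.__rmul__ runs
assert isinstance(p.dot(e), Plain)  # no reflected hook: result silently loses "encryption"
```

The operator case degrades gracefully; the method-call case silently produces the wrong (unencrypted) type, which is exactly the kind of bug mechanisms like `__torch_function__` were later added to address.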

You could comment out lines 526 to 531 here and see what happens.
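The effect of that guard can be reproduced in miniature (a hypothetical sketch of the behavior only, not CrypTen's actual source; the flag and attribute names are illustrative):

```python
class Module:
    """Minimal stand-in for a crypten.nn-style module with an encrypted flag."""
    def __init__(self):
        self.encrypted = False  # model weights are plaintext by default

    def forward(self, x):
        # This mirrors the guard the error message comes from; commenting it
        # out is what would let encrypted inputs reach a plaintext model.
        if getattr(x, "is_encrypted", False) and not self.encrypted:
            raise RuntimeError(
                "Cannot input CrypTensors into unencrypted model. "
                "Encrypt the model before feeding it CrypTensors."
            )
        return x

class FakeCrypTensor:
    """Toy object that merely claims to be encrypted."""
    is_encrypted = True

m = Module()
try:
    m.forward(FakeCrypTensor())
except RuntimeError as err:
    print(err)
```

With the check removed, the forward pass proceeds, which matches the later observation in this thread that inference can work while training still hits other guards.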

kwmaeng91 commented 1 year ago

Thanks. I will play around with it. It sounds like you are saying that, apart from a few functions, a small hack should make this work in most cases?

kwmaeng91 commented 1 year ago

I found that inference works well (so far) for unencrypted models, but training does not. A few places need to be fixed if you want to train a plaintext model, e.g., here. However, it makes sense not to support encrypted data with a plaintext model anyway, so I will close this thread.