Split this into sub-tasks for each type?
Maybe, though the best way to build it is probably to build the "Arbitrary" one first and then just instantiate the rest of them using specific configs of the arbitrary one?
How do you guys generally see this implemented? I was imagining each encrypted tensor having its own self.precision. Then, when a new Tensor is created from an array, the numbers in the array are multiplied by 10**self.precision, and after decryption they are divided by 10**self.precision. For example, the encrypt method in paillier/keys.py would get a new line like x *= 10 ** precision:
    def encrypt(self, x, same_type=False, precision=4):
        """Encrypts x. X can be either an encrypted int or a numpy
        vector/matrix/tensor."""
        x *= 10 ** precision
        ...
        return self.pk.encrypt(x)
Then all math operations for PaillierTensor would be modified to take precision into account. Does this sound right? Or is there a better way to do this?
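To make the bookkeeping concrete, here is a plain-Python sketch (no encryption involved; the encode/decode helpers are just placeholders, not existing code) of the rescaling the ops would have to mirror:

    precision = 4
    scale = 10 ** precision

    def encode(x):
        # float -> scaled integer, as in the encrypt() line above
        return int(round(x * scale))

    def decode(n, s=scale):
        # scaled integer -> float, after decryption
        return n / s

    a, b = encode(1.5), encode(0.25)

    # Addition: both operands carry the same scale, so nothing extra is needed.
    assert decode(a + b) == 1.75

    # Multiplication: the scales multiply, so the result carries scale**2 and
    # has to be rescaled once more.
    assert decode(a * b, scale ** 2) == 0.375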
Man, great question. I spent a couple of hours yesterday working through a simple version of this, just for the multi-party computation prototype, and I learned a few things about how we might like this to work. First, I found it's important to separate two pieces of configuration.
1) Tensor Specific Config: each tensor (encrypted or not) will have its own configurable level of precision, which corresponds to its actual memory footprint for each number. However, the one thing these all seem to have in common is that the underlying representation can only be used to represent natural numbers (https://en.wikipedia.org/wiki/Natural_number). Thus, this tensor-specific precision is really about answering the question "what range of natural numbers can this tensor represent given this configuration?" Also, in the case of encryption, this configuration has a lot to do with both the speed and security of the encrypted operations.
2) Fixed Precision Abstraction: a separate issue is the upper and lower bounds, as well as the level of precision, of the Tensor that the end user wants to use. In other words, most users won't care as much about the underlying range of (1)... what they care about is the end result (Float, Double, Int, etc.). This tensor abstraction should allow one to not know anything about (1) and generally use the same interfaces for rebalancing (like you mentioned above with the multiplication).
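A rough sketch of how I picture that separation, with made-up names (none of this is in the codebase):

    class IntegerBackend:
        """(1) Tensor-specific config: only natural numbers in
        [0, max_value), where max_value depends on the scheme/key size."""
        def __init__(self, max_value):
            self.max_value = max_value

    class FixedPrecision:
        """(2) User-facing abstraction: maps signed decimals onto the
        backend's natural-number range without exposing (1)."""
        def __init__(self, backend, precision=4):
            self.backend = backend
            self.scale = 10 ** precision

        def encode(self, x):
            # negatives wrap around modulo the backend's range
            return int(round(x * self.scale)) % self.backend.max_value

        def decode(self, n):
            if n > self.backend.max_value // 2:   # undo the wrap-around
                n -= self.backend.max_value
            return n / self.scale

    fp = FixedPrecision(IntegerBackend(2 ** 1024))
    assert fp.decode(fp.encode(-3.1415)) == -3.1415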
Two useful resources: the python-paillier library has implemented its own fixed-precision logic (which we're currently using for PaillierTensor but want to move away from toward something more abstract), and the blog post has a simple implementation as well (which we're currently using for our multi-party computation prototype but also want to move away from toward something more abstract).
I think I get your points, but I'm still a bit confused.
Also, what is the purpose of the Float class in he/paillier/basic.py? Is it related to this issue?
PaillierFloatTensor - would be a tensor with Float-level precision, configured to be optimal for Paillier encryption.
BVLongTensor - would be a tensor with Long-level precision, configured to be optimal for BV encryption.
While custom would be optional... I think most folks would simply grab one of these guys.
Thoughts?
“Also what is the purpose of Float class in he/paillier/basic.py? Is it related to this issue?”
That Float class would likely become deprecated in lieu of an "Integer" class which would use our abstract fixed-point precision stuff (generic to all tensors). We are only using that Float there because python-paillier has fixed precision baked in (which is not true for our other libs).
“while custom would be optional... i think most folks would simply grab one of these guys. Thoughts?”
Oh, I don't think custom types are necessary either. PaillierFloatTensor and BVLongTensor... sounds good. It's similar to PyTorch, so even better!
Now, what I understand is that PaillierFloatTensor and PaillierLongTensor will be wrappers around PaillierTensor (which is the integer class with our “abstract fixed point precision stuff”). For example, PaillierLongTensor will inherit from PaillierTensor, but override the precision configuration and methods as necessary to behave like a proper LongTensor, correct?
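Something roughly like this is what I have in mind (class bodies, defaults, and the base-class signature are placeholders, not actual code):

    class PaillierTensor:
        """Integer tensor with the generic fixed-point machinery."""
        def __init__(self, data, total_bits, fractional_bits):
            self.data = data                    # encrypted integers under the hood
            self.total_bits = total_bits
            self.fractional_bits = fractional_bits

    class PaillierFloatTensor(PaillierTensor):
        """Float-like behaviour: only the precision config changes."""
        def __init__(self, data):
            super().__init__(data, total_bits=32, fractional_bits=16)

    class PaillierLongTensor(PaillierTensor):
        """Long-like behaviour: 64 bits, no fractional part."""
        def __init__(self, data):
            super().__init__(data, total_bits=64, fractional_bits=0)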
Couldn't have said it better myself!!!
Ok I'm gonna start working on this. (hands his beer to a bystander and starts stretching)
(If anyone is interested, please feel free to work on this issue. I'm not sure when (or if) I'll be able to offer a good solution. )
Hands @siarez 2 beers and a shot of Jack Daniels. You got this!!!
After some deep introspection, I realized my FixedPoint class (in my pull request) has issues. I found a fixed-point module that seems pretty solid: SPFPM. It lends itself very well to what we are trying to do.
However, it does not seem pip-installable, even though it is listed on PyPI.
@iamtrask what is your position on using this module?
For now, I decided not to use SPFPM. So you can ignore my previous comment.
Here are a few thoughts that I have been grappling with while working on this issue:
Why should we support non-Python data types (e.g. Byte, Half, Double, Short, ...) when the implementation of our encryption (e.g. phe) is in Python and uses Python numeric datatypes? It probably makes sense for PyTorch, because their arithmetic is implemented in C, but that is not the case for us. In other words, limiting the datatype by the number of bits in Python seems like an unnecessary constraint to impose on ourselves.
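For example, plain Python ints already grow as needed, so any bit-width ceiling would be one we impose on ourselves:

    x = 2 ** 200 + 1
    print(x.bit_length())   # 201 -- no 8/16/32/64-bit limit unless we add one
    print(type(x))          # <class 'int'>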
Also, as I understand it, phe has floating-point precision built in, not fixed-point. So I don't quite understand why we need to implement an abstract fixed-point datatype when our end goal is to support floating-point datatypes. Why not implement an abstract floating-point type from the get-go?
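For instance (assuming I'm reading phe correctly, worth double-checking), its encoding keeps the exponent as plaintext metadata on the encrypted number; only the significand is encrypted:

    from phe import paillier

    public_key, private_key = paillier.generate_paillier_keypair()

    enc = public_key.encrypt(3.141592)
    print(enc.exponent)              # the exponent is visible in the clear
    print(private_key.decrypt(enc))  # ~3.141592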
Also (this could be due to my ignorance, but) I don't understand how we can support floating-point operations with a fixed-point datatype.
@iamtrask Thoughts?
1) (a question about ndarray)
2) Is it possible to encrypt the exponent, if we are using fixed-point? It's not obvious to me how. But I did a bit of number crunching and found out: a 64-bit float can represent 2*2^53*2^10 == 2^64 == 18446744073709551616 elements, the largest being ~1.79769e+308. Our encryption, on the other hand, gives us about 2^1022 elements to work with for the fixed-point, i.e. roughly 2^1021 == 10^307 on either side of zero. (I suspect that would not be very different for other encryptions either.) That is enough for, say, a range of [-1e292, +1e292] with a precision of 1/2^30 == 9.3e-10.
Until now, I had not noticed the fact that we have 1022 bits available for our fixed-point! I think that means it is possible to have a fixed-point number that has reasonable range and precision for deep learning tasks. Therefore, I think a fixed-point number should be good enough for now.
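A quick plain-Python check of those figures:

    import sys

    assert 2 * 2**53 * 2**10 == 2**64 == 18446744073709551616
    print(sys.float_info.max)             # 1.7976931348623157e+308
    print(len(str(2**1021)))              # 308 digits, i.e. on the order of 10^307

    # Example budget: a range of +/-1e292 in steps of 1/2**30 fits in 1022 bits
    print(2 * 10**292 * 2**30 < 2**1022)  # True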
@iamtrask If my points in (2) make sense to you, I can polish up my PR so it can be merged. Also, what do you think about (1)?
Totally agree but I think the best use case of PyTorch integration is the other way around. We want PyTorch's autograd and neural networks implementations to use our tensors, so the high level interfaces should mirror them as best we can.
I think they do. That being said, it seems like there are tradeoffs to both. For floating point, in theory you could intentionally use a smaller range (which would make it faster, yes?) and let the decimal point movements do the heavy lifting. So in the long run it might be interesting to still offer both, but at this point I'm just playing devil's advocate.
I'll update my code with the recent insights.
Description: All of our Homomorphic Encryption algorithms operate on Integers under the hood. However, we want to be able to perform math on numbers of varying precision (most popularly, Floats). Thus, we need someone to put together a generic Fixed Point Precision wrapper around our integer-based encryption algorithms such that the following types are supported:
Byte (8-bit int)
Half (16-bit float)
Float (32-bit float)
Double (64-bit float)
Arbitrary (n-bit float)
Char (8-bit int)
Short (16-bit int)
Int (32-bit int)
Long (64-bit int)
Arbitrary (n-bit int)
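One way to read this list (purely illustrative; the bit splits below are assumptions, not decided) is as named configurations of a single arbitrary-precision wrapper, as suggested earlier in the thread:

    # name -> (total_bits, fractional_bits); fractional_bits == 0 means integer
    FIXED_POINT_TYPES = {
        "Half":   (16, 8),
        "Float":  (32, 16),
        "Double": (64, 32),
        "Byte":   (8, 0),
        "Char":   (8, 0),
        "Short":  (16, 0),
        "Int":    (32, 0),
        "Long":   (64, 0),
    }

    def arbitrary(total_bits, fractional_bits):
        """The 'Arbitrary' entries: any (n, f) split the user asks for."""
        return (total_bits, fractional_bits)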
Acceptance Criteria: