OpenMined / PySyft

Perform data science on data that remains in someone else's server
https://www.openmined.org/
Apache License 2.0

Add Generic Tensor as Basic Type #10

Closed iamtrask closed 6 years ago

iamtrask commented 7 years ago

Description: In the Proof-of-Concept Syft implementation, the basic type was a Float object contained within the Paillier.py class. This class contains overridden functions for various mathematical operators, so that the class can then be used inside of numpy arrays, as in our basic linear network linear.py. However, this approach, while simple, has several issues:

In this work, we want to remedy these issues by building a generic tensor type, allowing us to handle encrypted values optimally under the hood without requiring advanced knowledge from the user. The API for this tensor type will take direct inspiration from the unencrypted tensor classes in the PyTorch framework, since it is a long-term goal to integrate with PyTorch if that becomes feasible.
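As a rough illustration of the shape this could take (the class name TensorBase and the NumPy backing store are assumptions for the sketch, not a committed design), the generic tensor would wrap an n-dimensional array and expose PyTorch-style methods, with in-place variants marked by a trailing underscore and the arithmetic operators overloaded:

```python
import numpy as np

class TensorBase:
    """Minimal sketch of a generic tensor with a PyTorch-style API.

    The NumPy backing store is an illustrative assumption; the real
    implementation would dispatch to encrypted storage under the hood.
    """

    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)

    # Out-of-place op: returns a new tensor (PyTorch convention).
    def abs(self):
        return TensorBase(np.abs(self.data))

    # In-place op: mutates self, name ends with "_" (PyTorch convention).
    def abs_(self):
        self.data = np.abs(self.data)
        return self

    def dim(self):
        return self.data.ndim

    # Overloaded operators; NumPy broadcasting handles arbitrary dims.
    def __add__(self, other):
        other = other.data if isinstance(other, TensorBase) else other
        return TensorBase(self.data + other)

    def __mul__(self, other):
        other = other.data if isinstance(other, TensorBase) else other
        return TensorBase(self.data * other)

x = TensorBase([[1.0, -2.0], [3.0, -4.0]])
y = (x.abs() + x) * 2   # out-of-place ops leave x untouched
x.abs_()                # in-place op mutates x
```

The same method surface would later gain an encrypted code path, so user code written against this API would not need to change when the data is encrypted.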

Acceptance Criteria:

  1. The generic tensor type must support the following operations, in line with PyTorch:
     a. abs(), abs_(), add(value), add_(value), addbmm(), addbmm_(), addcdiv(), addcdiv_(), addcmul(), addcmul_(), addmm(), addmm_(), addmv(), addmv_(), addr(), addr_(), baddbmm(), baddbmm_(), bernoulli(), bernoulli_(), bmm(), cauchy_(), ceil(), ceil_(), char(), chunk(), clamp(), clamp_(), clone(), contiguous(), copy_(), cpu(), cuda() - returns "not yet supported", cross(), cumprod(), cumsum(), diag(), dim(), dist(), div(), div_(), dot(), double(), eq(), eq_(), equal(), exp(), exp_(), expand(), expand_as(), exponential_(), fill_(), float(), floor(), floor_(), fmod(), fmod_(), frac(), frac_(), gather(), ge(), ge_(), geometric_(), gt(), gt_(), half(), histc(), index(), index_add_(), index_copy_(), index_fill_(), index_select(), int(), inverse(), is_contiguous(), is_cuda(), is_signed(), le(), le_(), lerp(), lerp_(), log(), log1p(), log1p_(), log_(), log_normal_(), long(), lt(), lt_(), masked_scatter_(), masked_fill_(), masked_select(), matmul(), max(), mean(), median(), min(), mm(), mode(), mul(), mul_(), multinomial(), mv(), narrow(), ndimension(), ne(), ne_(), neg(), neg_(), nelement(), new(), nonzero(), norm(), normal_(), numel(), numpy(), permute(), pow(), pow_(), prod(), random_(), reciprocal(), reciprocal_(), remainder(), remainder_(), renorm(), renorm_(), repeat(), resize_(), resize_as_(), round(), round_(), rsqrt(), rsqrt_(), scatter_(), select(), set_(), short(), sigmoid(), sigmoid_(), sign(), sign_(), size(), split(), sqrt(), sqrt_(), squeeze(), squeeze_(), stride(), sub(), sub_(), sum(), t(), t_(), tolist(), topk(), trace(), transpose(), transpose_(), type(), unfold(), uniform_(), unsqueeze(), unsqueeze_(), view(), view_as(), zero_()
     b. For all functions above that pull from a distribution, optionally use a lookup table (abstract this?)
     c. For all functions above that use a function, optionally use an interpolation (abstract this?)

  2. The generic tensor type must be able to support an arbitrary number of dimensions for all operations.

  3. The generic tensor type must overload +, -, *, and /, applying the appropriate operation based on the dimensions of the tensors involved.

  4. The generic tensor type must prefer vector storage under the hood when encrypted, to maximize the usefulness of packing in homomorphic encryption.

  5. Any operations that are unsupported in the encrypted domain should raise clear error messages.

  6. __str__ and __repr__ should follow the conventions set in PyTorch.

  7. Additionally, there should be "encrypt(public_key)" and "decrypt(private_key)" functions which accept any key of the abstract public-key or private-key type.

  8. All functionality within this base class should be of abstract type regardless of encryption type (specialized functionality for any encryption scheme should live within that scheme).

  9. Before this ticket is closed, this functionality should be shown to work with at least one homomorphic encryption strategy.
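To make criteria 5, 7, and 9 concrete, here is a hedged sketch of what the encrypted path could look like, using textbook Paillier with toy parameters (insecure, for demonstration only). The PublicKey/PrivateKey/EncryptedTensor names and method signatures are assumptions for the sketch, not a committed API:

```python
import math
import random

class PublicKey:
    """Toy Paillier public key (textbook scheme, insecure parameters)."""
    def __init__(self, n):
        self.n, self.n_sq, self.g = n, n * n, n + 1

    def encrypt(self, m):
        # Random r coprime to n blinds the ciphertext.
        while True:
            r = random.randrange(2, self.n)
            if math.gcd(r, self.n) == 1:
                break
        return pow(self.g, m, self.n_sq) * pow(r, self.n, self.n_sq) % self.n_sq

class PrivateKey:
    """Toy Paillier private key; lam = lcm(p - 1, q - 1)."""
    def __init__(self, n, lam):
        self.n, self.n_sq, self.lam = n, n * n, lam
        self.mu = pow(lam, -1, n)  # valid because g = n + 1

    def decrypt(self, c):
        l = (pow(c, self.lam, self.n_sq) - 1) // self.n
        return l * self.mu % self.n

class EncryptedTensor:
    """Sketch: a flat vector of ciphertexts behind the generic tensor API."""
    def __init__(self, ciphertexts, pub):
        self.ciphertexts, self.pub = ciphertexts, pub

    def add(self, other):
        # Paillier is additively homomorphic: E(a) * E(b) = E(a + b).
        return EncryptedTensor(
            [a * b % self.pub.n_sq
             for a, b in zip(self.ciphertexts, other.ciphertexts)],
            self.pub)

    def mul(self, other):
        # Ciphertext-by-ciphertext multiply is unsupported (criterion 5).
        raise NotImplementedError(
            "mul() between two encrypted tensors is not supported by Paillier")

    def decrypt(self, pri):
        return [pri.decrypt(c) for c in self.ciphertexts]

p, q = 17, 19                        # toy primes; never use in practice
pub = PublicKey(p * q)
pri = PrivateKey(p * q, math.lcm(p - 1, q - 1))
a = EncryptedTensor([pub.encrypt(m) for m in [1, 2]], pub)
b = EncryptedTensor([pub.encrypt(m) for m in [40, 5]], pub)
result = a.add(b).decrypt(pri)       # [41, 7]
```

The flat list of ciphertexts also hints at criterion 4: a real implementation would pack multiple plaintext slots per ciphertext where the scheme allows it, rather than encrypting one value per element as this sketch does.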

Ticket Creator: @iamtrask

iamtrask commented 7 years ago

@samsontmr this one could really be split into subtasks. Once the Tensor class is implemented with the first basic operations (2-9) which are Intermediate level, the zillion functions in (1) are all "Beginner" tasks which would be great entry-level starts for folks, you know?