OpenMined / PySyft

Perform data science on data that remains in someone else's server
https://www.openmined.org/
Apache License 2.0
9.53k stars 1.99k forks source link

Hook NumPy as a new supported framework #2771

Closed alejandrosame closed 4 years ago

alejandrosame commented 4 years ago

Hi!

Supporting NumPy as a framework would allow it to be used for Machine Learning using Federated Learning.

As reported by @iamtrask, NumPy is used normally used in ML toolchains, so PySyft users will greatly benefit from having it as a supported framework.

fdroessler commented 4 years ago

@alejandrosame are you working on this already?

alejandrosame commented 4 years ago

Hi @fdroessler !

Yes, I already started working on it. Hopefully, I'll soon send a PR with the first couple of hooked functions (I'm giving priority to the ones used on tutorials right now).

alejandrosame commented 4 years ago

As somewhat expected, I was a bit overconfident with my initial approach to the task, so I'm leaving here an explanation of the main problem to solve in order to integrate NumPy as a supported framework inside Syft.

From the user point of view, we expect the hooking process to look like the PyTorch hook:

import numpy as np
import syft as sy

hook = sy.NumpyHook(np)

The codebase already provides with an abstract class FrameworkHook. However, if we try to create directly a NumpyHook(FrameworkHook) following the approach of TorchHook, we find that the previous code returns a TypeError: can't set attributes of built-in/extension type 'numpy.ndarray' when trying to extend numpy.ndarray with something like self._hook_native_tensor(numpy.ndarray, NumpyTensor).

In this case, the first hurdle to solve is how to get around the fact that NumPy is not implemented in pure Python but Cython. Trying to monkey patch NumPy directly is too unstable, the implementation started to fail without properly raising errors and required duplication of code to deal with builtin classes.

Therefore, I'm now trying to provide from inside Syft a module that wraps around the necessary NumPy classes, which would behave like a pure Python module for the hooking process. This approach has the advantages of being more robust and able to efficiently reuse the scaffolding code provided by Syft, once properly implemented. In this case, the user would need to hook NumPy like:

import syft.<path_to_be_determined>.numpy as np
import syft as sy

hook = sy.NumpyHook(np)

If the user tries to hook the numpy directly, Syft should return an error to redirect the user to use the wrapped NumPy (e.g. ValueError: Please hook NumPy by importing syft.<path_to_be_determined>.numpy with example code).

I'm still going slow with this task while I'm testing and learning the overall codebase for framework hooks so I'll soon push a WIP PR so anyone can also check the approach and provide feedback!

alejandrosame commented 4 years ago

It seems that NumPy has this use case covered and I overlooked the documentation. In the subclassing doc it is very well documented how to override the behaviour of ndarray and the defined subclass would allow to extend attributes dynamically.

As explained in the previous comment, trying to extend numpy.ndarray directly will fail:

> import numpy
> numpy.ndarray.n = "nice"

TypeError: can't set attributes of built-in/extension type 'numpy.ndarray'

However, subclassing numpy.ndarray will allow extending attributes dynamically. The follwing code doesn't give the TypeError:

import numpy

class B(numpy.ndarray):
  pass

B.n = "nice"

Once the proper functions are overriden, the user should be able to use NumPy directly as the input of NumpyHook instead of the previous proposal of providing a wrapped NumPy library. So, the usage will be identical with the only difference that after hooking numpy, the arrays created by the library would be of type B instead of ndarray.

iamtrask commented 4 years ago

Definitely use Numpy’s support for subclassing. Eventually the PyTorch side will do this too (changing FrameworkHook considerably) as PyTorch very recently received support for subclassing

Sent from my iPhone

On 30 Dec 2019, at 08:22, Alejandro Sánchez Medina notifications@github.com wrote:

 Reopened #2771.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

Samveed commented 4 years ago

Does this now allow the Federated Learning with Encrypted Gradient Aggregation tutorial to work? Or is it that we need to add some additional piece of code along with the already existing modules? Sorry, I was not able to follow the above discussion clearly.

iamtrask commented 4 years ago

I don't think the work was completed - I'm currently working on it as a part of Syft 0.4 but that won't be out for a bit.

On Fri, Mar 6, 2020 at 7:02 AM Samveed notifications@github.com wrote:

Does this now allow the Federated Learning with Encrypted Gradient Aggregation tutorial to work? Or is it that we need to add some additional piece of code along with the already existing modules? Sorry, I was not able to follow the above discussion clearly.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/OpenMined/PySyft/issues/2771?email_source=notifications&email_token=ABBAZEVAK5UP2JL3S5A4D33RGCNYRA5CNFSM4JVEK7JKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEOAJQOQ#issuecomment-595630138, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABBAZEX4LRAF32C2SRRD6JDRGCNYRANCNFSM4JVEK7JA .

alejandrosame commented 4 years ago

Hi!

Indeed, this task is not completed yet. The main block I found out has been the use of NumPy in the codebase itself, which has lead me to weird bugs while trying to subclass ndarray.

I'll keep on studying the code and the proper use of ndarray subclassing for Syft, but I am currently on a break from this task.

github-actions[bot] commented 4 years ago

This issue has been marked stale because it has been open 30 days with no activity. Leave a comment or remove the stale label to unmark it. Otherwise, this will be closed in 7 days.

nitincic commented 4 years ago

Has this been fixed now?