@alejandrosame are you working on this already?
Hi @fdroessler !
Yes, I already started working on it. Hopefully, I'll soon send a PR with the first couple of hooked functions (I'm giving priority to the ones used in the tutorials right now).
As somewhat expected, I was a bit overconfident with my initial approach to the task, so I'm leaving here an explanation of the main problem to solve in order to integrate NumPy as a supported framework inside Syft.
From the user's point of view, we expect the hooking process to look like the PyTorch hook:
import numpy as np
import syft as sy
hook = sy.NumpyHook(np)
The codebase already provides an abstract class FrameworkHook. However, if we try to directly create a NumpyHook(FrameworkHook) following the approach of TorchHook, the code above raises TypeError: can't set attributes of built-in/extension type 'numpy.ndarray' when trying to extend numpy.ndarray with something like self._hook_native_tensor(numpy.ndarray, NumpyTensor).
In this case, the first hurdle is getting around the fact that NumPy is not implemented in pure Python but in C/Cython. Monkey patching NumPy directly turned out to be too unstable: the implementation started to fail without properly raising errors and required duplicating code to deal with built-in classes.
Therefore, I'm now trying to provide, from inside Syft, a module that wraps the necessary NumPy classes and behaves like a pure Python module for the hooking process. Once properly implemented, this approach has the advantages of being more robust and of reusing the scaffolding code Syft already provides. In this case, the user would need to hook NumPy like:
import syft.<path_to_be_determined>.numpy as np
import syft as sy
hook = sy.NumpyHook(np)
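For illustration only, such a wrapped module could contain something roughly like the sketch below (the layout and names are an assumption on my part, and the final path stays as the <path_to_be_determined> above):
import numpy as _np
# Re-export the public NumPy namespace so the module can be used as a drop-in.
from numpy import *  # noqa: F401,F403

class ndarray(_np.ndarray):
    """Pure-Python subclass that Syft can extend with hooked attributes."""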
If the user tries to hook NumPy directly, Syft should raise an error that redirects them to the wrapped NumPy (e.g. ValueError: Please hook NumPy by importing syft.<path_to_be_determined>.numpy, together with example code).
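A minimal sketch of that guard (illustrative only; the real NumpyHook would subclass FrameworkHook and do much more than this):
import numpy

class NumpyHook:
    def __init__(self, framework_module):
        # Reject the stock NumPy module and point the user at the wrapped one.
        if framework_module is numpy:
            raise ValueError(
                "Please hook NumPy by importing syft.<path_to_be_determined>.numpy"
            )
        self.framework = framework_module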
I'm still moving slowly on this task while I test and learn the overall codebase for framework hooks, so I'll soon push a WIP PR so anyone can check the approach and provide feedback!
It seems that NumPy has this use case covered and I had overlooked the documentation. The subclassing doc explains very well how to override the behaviour of ndarray, and a subclass defined that way allows attributes to be extended dynamically.
As explained in the previous comment, trying to extend numpy.ndarray directly will fail:
>>> import numpy
>>> numpy.ndarray.n = "nice"
TypeError: can't set attributes of built-in/extension type 'numpy.ndarray'
However, subclassing numpy.ndarray allows extending attributes dynamically. The following code doesn't raise the TypeError:
import numpy

class B(numpy.ndarray):
    pass

B.n = "nice"
Once the proper functions are overridden, the user should be able to pass NumPy directly as the input of NumpyHook instead of the previously proposed wrapped NumPy library. Usage would then be identical, with the only difference being that, after hooking NumPy, the arrays created by the library would be of the subclass type (B above) instead of ndarray.
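To make that concrete, here is a small sketch based on the NumPy subclassing guide (class and attribute names are only illustrative, not the final Syft API):
import numpy as np

class HookedArray(np.ndarray):
    def __new__(cls, input_array, owner=None):
        # View-cast the input into the subclass and attach an extra attribute.
        obj = np.asarray(input_array).view(cls)
        obj.owner = owner
        return obj

    def __array_finalize__(self, obj):
        # Called on view casting and new-from-template creation, so the extra
        # attribute survives slicing and arithmetic on the array.
        if obj is None:
            return
        self.owner = getattr(obj, "owner", None)

a = HookedArray(np.arange(4), owner="alice")
b = a[:2] + 1
print(type(b).__name__, b.owner)  # HookedArray alice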
Definitely use NumPy's support for subclassing. Eventually the PyTorch side will do this too (changing FrameworkHook considerably), as PyTorch very recently received support for subclassing.
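For comparison, a quick sketch of the same idea on the PyTorch side (hedged: this assumes a PyTorch version with Tensor.as_subclass and is not how TorchHook is implemented today):
import torch

class HookedTensor(torch.Tensor):
    pass

# Unlike numpy.ndarray, setting attributes on the subclass works fine.
HookedTensor.n = "nice"
t = torch.zeros(3).as_subclass(HookedTensor)
print(type(t).__name__, t.n)  # HookedTensor nice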
Does this now allow the Federated Learning with Encrypted Gradient Aggregation tutorial to work? Or do we need to add some additional code on top of the already existing modules? Sorry, I was not able to follow the above discussion clearly.
I don't think the work was completed - I'm currently working on it as a part of Syft 0.4 but that won't be out for a bit.
Hi!
Indeed, this task is not completed yet. The main blocker I've found has been the use of NumPy in the codebase itself, which has led me to weird bugs while trying to subclass ndarray.
I'll keep on studying the code and the proper use of ndarray subclassing for Syft, but I am currently on a break from this task.
This issue has been marked stale because it has been open 30 days with no activity. Leave a comment or remove the stale label to unmark it. Otherwise, this will be closed in 7 days.
Has this been fixed now?
Hi!
Supporting NumPy as a framework would allow it to be used for machine learning with federated learning.
As reported by @iamtrask, NumPy is commonly used in ML toolchains, so PySyft users would greatly benefit from having it as a supported framework.