dmlc / minpy

NumPy interface with mixed backend execution
https://minpy.readthedocs.io/en/latest/

Can I create a variable shared by forward and back propagation in customop('numpy')? #173

Open rguo12 opened 7 years ago

rguo12 commented 7 years ago

I do not quite understand the mechanism behind @customop('numpy'). In my operator there is an intermediate variable Q that is expensive to compute and appears in both the forward output and the gradients. Can it be computed once and shared between forward and back propagation? I also wonder whether I can define gradient functions for multiple parameters (e.g. w1 and w2 in the code below):

from minpy.core import customop

# f, g1, g2, g3 stand in for the actual (expensive) computations.

@customop('numpy')
def my_operator(X, w1, w2):
    Q = f(X, w1, w2)         # expensive intermediate, needed again in the gradients
    H = g1(Q)
    return H

def my_operator_grad1(ans, X, w1, w2):
    def grad1(g):            # g is the upstream gradient
        Q = f(X, w1, w2)     # recomputed here
        R = g2(Q)
        return R
    return grad1

def my_operator_grad2(ans, X, w1, w2):
    def grad2(g):
        Q = f(X, w1, w2)     # recomputed again
        R = g3(Q)
        return R
    return grad2

my_operator.def_grad(my_operator_grad1, argnum=1)  # gradient w.r.t. w1
my_operator.def_grad(my_operator_grad2, argnum=2)  # gradient w.r.t. w2

Thanks!

Taco-W commented 7 years ago

@swanderingf One of the primary reasons for the customop wrapper is that some operations are not defined on the GPU.

Say I have a function that uses operations defined only on the CPU. Without the customop hint, the input data could be placed on the GPU before the invocation; at execution time the intermediate data then has to be copied from the GPU to the CPU to run the CPU-only op, which hurts performance and defeats the point of having put the input data on the GPU in the first place.

The customop decorator lets the user tell the system where to keep the input data, so some of the slow data copies between devices can be avoided.
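As a rough sketch of the idea (the routine below is illustrative, not from this thread, and assumes the backend has no GPU implementation of np.linalg.solve):

import numpy as np
from minpy.core import customop

# Wrapping a CPU-only routine: the 'numpy' hint tells MinPy to hand this
# function plain NumPy arrays, so its inputs stay on the CPU instead of
# being copied from the GPU on every call.
@customop('numpy')
def cpu_solve(A, b):
    return np.linalg.solve(A, b)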

Sharing the common computation between gradients is supported in minpy by def_multiple_grad. You can rewrite the code as:

@customop('numpy')
def my_operator(X, w1, w2):
    Q = f(X, w1, w2)
    H = g1(Q)
    return H

def my_operator_grads(ans, X, w1, w2):
    def grad(g):
        Q = f(X, w1, w2)         # computed once, shared by both gradients
        return (g2(Q), g3(Q))    # gradients w.r.t. w1 and w2, respectively
    return grad

# argnums (1, 2) select w1 and w2 (X is argument 0), matching the
# (g2(Q), g3(Q)) tuple returned above.
my_operator.def_multiple_grad(my_operator_grads, (1, 2))
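With this, Q is computed once per backward pass and shared between the two parameter gradients, though it is still recomputed relative to the forward pass. A hedged usage sketch, assuming concrete f/g1/g2/g3 and that minpy.core.grad follows the autograd-style argnum convention:

import minpy.numpy as np
from minpy.core import grad

def loss(X, w1, w2):
    # Any scalar loss built on top of the custom operator.
    return np.sum(my_operator(X, w1, w2))

d_w1 = grad(loss, argnum=1)   # callable returning dloss/dw1
d_w2 = grad(loss, argnum=2)   # callable returning dloss/dw2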