NervanaSystems / neon

Intel® Nervana™ reference deep learning framework committed to best performance on all hardware
http://neon.nervanasys.com/docs/latest
Apache License 2.0

Missing some layers #381

Open guoxuesong opened 7 years ago

guoxuesong commented 7 years ago

I'm porting my model from Lasagne/Theano.

I found that some layers I need do not exist in neon.

The first question I want to ask is: are there any tutorials about creating custom layers?

I looked through the old issues; Create custom layer #100 was closed, but I still think there should be some kind of tutorial, at least to explain what kinds of requests are doable and which are not.

Some layers look simple but are missing, for example something like SliceLayer to slice the input along a particular axis, or DimshuffleLayer to transpose the input, which was described as simple in past issues.
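Just to make it concrete, this is how I use those two layers in Lasagne today (the shapes here are made up, only for illustration):

from lasagne.layers import InputLayer, SliceLayer, DimshuffleLayer

# made-up (batch, channels, height, width) input
l_in = InputLayer((None, 16, 32, 32))
# keep only the first 8 channels: slice along axis 1
l_slice = SliceLayer(l_in, indices=slice(0, 8), axis=1)
# transpose to (batch, height, width, channels)
l_shuf = DimshuffleLayer(l_slice, (0, 2, 3, 1))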

If these are really simple and NervanaSystems does not want to support them officially, maybe you can teach us how to do it ourselves.
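Even a bare skeleton would help. Below is my guess at the minimal set of hooks a custom layer has to provide, written as a do-nothing pass-through layer (assuming neon's Layer base class and the configure/fprop/bprop methods I see in the built-in layers; the class name is made up):

from neon.layers.layer import Layer

class PassthroughLayer(Layer):
    # hypothetical do-nothing layer, only to show where slicing or
    # dimshuffling logic would go

    def __init__(self, name=None):
        super(PassthroughLayer, self).__init__(name)

    def configure(self, in_obj):
        # declare the output shape given the input shape
        super(PassthroughLayer, self).configure(in_obj)
        self.out_shape = self.in_shape
        return self

    def fprop(self, inputs, inference=False):
        # forward pass: a real layer would transform inputs here
        self.outputs = inputs
        return self.outputs

    def bprop(self, error):
        # backward pass: propagate the gradient to the previous layer
        return error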

But the real challenge for me is implementing Goroshin's argmax, as referenced in STACKED WHAT-WHERE AUTO-ENCODERS. I implemented it in Theano like this:

import numpy as np
import theano
import theano.tensor as T

floatX = theano.config.floatX

def floatXconst(x):
    # small helper (inlined here so the snippet is self-contained):
    # cast a Python scalar to a floatX constant
    return np.asarray(x, dtype=floatX)

def goroshin_argmax(z, shape, axis=(1,), beta=3, epsilon=0.0001):
    # normalize z so that exp(beta * z) does not overflow
    z = z / (abs(T.max(z)) + floatXconst(epsilon))
    # slices covering the axes the soft-argmax is taken over
    a = tuple(slice(0, shape[t]) for t in axis)
    # broadcastable shape: keep the argmax axes, set the others to 1
    xyshape = [shape[i] if i in axis else 1 for i in range(len(shape))]
    # coordinate grids along the chosen axes
    xy = T.mgrid[a]
    # softmax over the chosen axes
    b = T.exp(beta * z) / T.exp(beta * z).sum(axis, keepdims=True)
    # expected coordinate under the softmax = differentiable argmax
    res = []
    for i in range(len(axis)):
        x = (xy[i].astype(floatX).reshape(xyshape) * b).sum(axis=axis)
        res += [x]
    return T.stack(res, axis=1)
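For reference, this is my understanding of what the function computes, written as a plain NumPy sketch (only for sanity checking, not the code I actually run): the softmax of beta*z over the chosen axes weights the coordinate grids, which gives a differentiable argmax.

import numpy as np

def goroshin_argmax_np(z, shape, axis=(1,), beta=3, epsilon=1e-4):
    z = z / (np.abs(z.max()) + epsilon)
    # coordinate grids over the chosen axes
    grids = np.indices([shape[t] for t in axis]).astype(z.dtype)
    # broadcastable shape: keep the argmax axes, set the others to 1
    xyshape = [shape[i] if i in axis else 1 for i in range(len(shape))]
    # softmax over the chosen axes
    e = np.exp(beta * z)
    b = e / e.sum(axis=axis, keepdims=True)
    # expected coordinate under the softmax, one result per axis
    res = [(g.reshape(xyshape) * b).sum(axis=axis) for g in grids]
    return np.stack(res, axis=1)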

It seems that neon has something called Autodiff. Can I use it to calculate the gradient, or do I need to do the math by hand?

I know you have another project named ngraph. Can I use ngraph to write custom layers for neon? I don't want to use ngraph for the whole job.

guoxuesong commented 7 years ago

I just made my project public: deepstacks

deepstacks is a build_network() for Lasagne and neon. You define your network model in a datasheet with stack-machine mechanisms. It supports reusing part of a model as a function and sharing parameters.

Please have a look at deepstacks/deepstacks/neon/implement.py

To complete the implementation for neon, I need (in Lasagne's terms): ElemwiseMergeLayer, SliceLayer, Upscale[123]DLayer, LocallyConnected[123]DLayer, DimshuffleLayer, GaussianNoiseLayer, ExpressionLayer. Leaving them unimplemented is OK, but I want to complete it if possible.

I hope my project can help more people take advantage of neon. I am new to neon myself, though, so if any part of my code is wrong, just let me know.

chengchingwen commented 7 years ago

There is a neon tutorial; maybe you can take a look at this one.

guoxuesong commented 7 years ago

@chengchingwen would you please explain these lines for me? In bprop:

    if self.deltas:
        self.be.compound_dot(A=self.W.T, B=error, C=self.deltas, alpha=alpha, beta=beta)
    self.be.compound_dot(A=error, B=self.inputs.T, C=self.dW)
guoxuesong commented 7 years ago

I tried to implement a GaussianNoiseLayer; my code is below. I'm not sure whether this is correct. I don't really understand the alpha and beta arguments; I just copied them from SkipNode:


import numpy as np
from neon.layers.layer import Layer


class GaussianNoiseLayer(Layer):
    # adds zero-mean Gaussian noise with standard deviation sigma to its input

    def __init__(self, sigma=0.1, name=None):
        super(GaussianNoiseLayer, self).__init__(name)
        self.sigma = sigma
        self.owns_delta = True
        self.is_mklop = True

    def configure(self, in_obj):
        # output shape is the same as the input shape
        super(GaussianNoiseLayer, self).configure(in_obj)
        self.out_shape = self.in_shape
        # buffer that holds a fresh noise sample on every fprop
        self.noisebuf = self.be.iobuf(self.in_shape, dtype=np.float32)
        return self

    def fprop(self, inputs=None, inference=False, beta=0):
        # copy inputs to outputs, as SkipNode does
        self.be.fprop_skipnode(inputs, self.outputs, beta)
        if not inference:
            # only inject noise during training (following Lasagne's behaviour)
            self.be.fill_normal(self.noisebuf, stdv=self.sigma)
            self.outputs[:] = self.outputs + self.noisebuf
        return self.outputs

    def bprop(self, error, alpha=1.0, beta=0.0):
        # the noise is additive, so the gradient passes through unchanged;
        # the alpha/beta handling is copied from SkipNode:
        # for better performance, mkl does nothing here,
        # otherwise convert back and deal with beta and alpha
        self.be.bprop_skipnode(error, self.deltas, alpha, beta)
        return self.deltas
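If this is right, I expect to be able to drop it into a model like any other layer, roughly like this (untested; the surrounding layers are just placeholders):

from neon.initializers import Gaussian
from neon.layers import Affine
from neon.models import Model
from neon.transforms import Rectlin

# add noise to the input, then a small affine layer as a placeholder
layers = [GaussianNoiseLayer(sigma=0.1),
          Affine(nout=100, init=Gaussian(scale=0.01), activation=Rectlin())]
model = Model(layers=layers)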
chengchingwen commented 7 years ago

@guoxuesong in bprop, self.deltas is the error that needs to be back-propagated to the previous layer, and alpha & beta are just parameters of self.be.compound_dot; you may want to take a look at the doc. I guess it just says: take the dot product of self.W.T and error and assign it to self.deltas (when self.deltas is allocated), and compute dW every time bprop is called.
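If I read the doc the same way, compound_dot is a GEMM-style call, C = alpha * A.dot(B) + beta * C, so those two lines amount to roughly this in NumPy terms (my paraphrase, not neon code):

import numpy as np

def linear_bprop(W, inputs, error, deltas=None, alpha=1.0, beta=0.0):
    # gradient w.r.t. the layer input, passed back to the previous layer
    if deltas is not None:
        deltas[:] = alpha * W.T.dot(error) + beta * deltas
    # gradient w.r.t. the weights, consumed by the optimizer
    dW = error.dot(inputs.T)
    return deltas, dW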