layer_norm for 2D and more? #1066

duyvuleo opened this issue 6 years ago (status: Open)

duyvuleo commented 6 years ago

Hi all,

Would it be useful to have a layer_norm_2d (where the input is a matrix) in addition to layer_norm? I tried a naive version, but it may be slow:

Expression layer_norm_2d(const Expression& x, const Expression& g, const Expression& b){
    // Apply layer_norm to each column of x independently, then reassemble.
    std::vector<Expression> vCols(x.dim().d[1]);
    for (unsigned i = 0; i < x.dim().d[1]; i++){
        Expression c_x = select_cols(x, {i});  // pick out column i
        vCols[i] = layer_norm(c_x, g, b);      // normalise it with gain g and bias b
    }
    return concatenate_cols(vCols);
}
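
For context, a usage sketch (all names here are hypothetical, it assumes "using namespace dynet", and g and b are learned vectors with the same dimension as one column of x):

const unsigned HIDDEN_DIM = 512, SEQ_LEN = 10;  // hypothetical sizes

ParameterCollection model;
Parameter p_g = model.add_parameters({HIDDEN_DIM});  // per-row gain
Parameter p_b = model.add_parameters({HIDDEN_DIM});  // per-row bias

ComputationGraph cg;
Expression g = parameter(cg, p_g);
Expression b = parameter(cg, p_b);
Expression x = random_normal(cg, Dim({HIDDEN_DIM, SEQ_LEN}));  // stand-in for a sequence of states
Expression y = layer_norm_2d(x, g, b);  // each column normalised independently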

Any suggestions? I would be happy to work on this!

Thanks!

duyvuleo commented 6 years ago

My bad. layer_norm can work on matrix input. Sorry for the misunderstanding!

duyvuleo commented 6 years ago

It seems that I confused the behaviour of layer_norm again. The current implementation of layer_norm does not perform position-wise layer normalisation: given a matrix input, I want each column to be normalised independently (column-wise layer normalisation).
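
To be concrete, the behaviour I have in mind is the following (sketching the formula, with epsilon a small constant for numerical stability):

for each column j of x:
    mu_j    = mean(x[:, j])
    sigma_j = std(x[:, j])
    y[:, j] = cmult(g, (x[:, j] - mu_j) / (sigma_j + epsilon)) + b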

neubig commented 6 years ago

Yes, this would be nice to have but is not implemented yet. Actually, layer norm is currently implemented as a combination of basic DyNet operations (i.e., it's not its own specially optimized operation), so you could probably look at the implementation in expr.cc and create your own dimension-wise version. If you have trouble doing so with the operations currently implemented in DyNet, we can add the ones that are necessary.

duyvuleo commented 6 years ago

Currently I do it using the existing nodes DyNet has:

dynet::Expression layer_norm_colwise(const dynet::Expression& x, const dynet::Expression& g,
                                     const dynet::Expression& b, float epsilon=1e-8){
    // Compute the per-column mean and std over dimension 0, then replicate
    // them x.dim()[0] times along the rows so they match the shape of x.
    dynet::Expression mu = dynet::concatenate(
        std::vector<dynet::Expression>(x.dim()[0], dynet::transpose(dynet::mean_dim(x, {0}))));
    dynet::Expression sigma = dynet::concatenate(
        std::vector<dynet::Expression>(x.dim()[0], dynet::transpose(dynet::std_dim(x, {0}))));
    dynet::Expression x_centered = x - mu;  // centre each column
    return dynet::cmult(g, dynet::cdiv(x_centered, sigma + epsilon)) + b;  // rescale by g, shift by b
}

I don't think it is elegant, and it actually seems to be slow.
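
One slightly cleaner variant I can think of (just a sketch; it assumes dynet::ones is available) replicates the column statistics with an outer product against a ones vector instead of concatenating copies:

// Sketch only: broadcast the per-column statistics via an outer product
// with a ones vector, rather than concatenating x.dim()[0] copies.
dynet::Expression layer_norm_colwise_v2(const dynet::Expression& x, const dynet::Expression& g,
                                        const dynet::Expression& b, float epsilon=1e-8){
    unsigned d0 = x.dim()[0];
    dynet::Expression o = dynet::ones(*x.pg, dynet::Dim({d0}));              // {d0} vector of ones
    dynet::Expression mu = o * dynet::transpose(dynet::mean_dim(x, {0}));    // {d0, d1} replicated column means
    dynet::Expression sigma = o * dynet::transpose(dynet::std_dim(x, {0}));  // {d0, d1} replicated column stds
    return dynet::cmult(g, dynet::cdiv(x - mu, sigma + epsilon)) + b;
}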

I am looking at this native implementation: https://github.com/marian-nmt/marian-dev/blob/7432024c7de7c2b928b1654d62afb7b9834ed934/src/kernels/tensor_operators.cu.

Do you think we should have the same?