denizyuret / Knet.jl

Koç University deep learning framework.
https://denizyuret.github.io/Knet.jl/latest

Feature request: LayerNormLSTM kernel #492

Open ngphuoc opened 5 years ago

ngphuoc commented 5 years ago

I implemented a naive layer normalization with a "manual" LSTM and it runs extremely slowly, about 10 times slower than using the cudnn RNN interface (which has no layer normalization feature).

Could we implement a LayerNorm kernel for RNNs similar to PyTorch's? I did a simple search of the PyTorch source code for references; hopefully these give some clues:

pytorch/caffe2/python/rnn_cell.py
pytorch/aten/src/ATen/native/cpu/layer_norm_kernel.cpp
pytorch/aten/src/ATen/native/layer_norm.cpp
denizyuret commented 5 years ago

I have implemented LayerNorm like this before:

using Knet: param            # trainable parameter constructor
using Statistics: mean, std

struct LayerNorm; a; b; ϵ; end

function LayerNorm(dmodel; eps=1e-6)
    a = param(dmodel; init=ones)    # learned gain, initialized to 1
    b = param(dmodel; init=zeros)   # learned bias, initialized to 0
    LayerNorm(a, b, eps)
end

function (l::LayerNorm)(x, o...)
    μ = mean(x, dims=1)
    σ = std(x, mean=μ, dims=1)
    l.a .* (x .- μ) ./ (σ .+ l.ϵ) .+ l.b
end
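For reference, hypothetical usage on a (features, batch) matrix, assuming the struct above (the sizes here are made up):

ln = LayerNorm(512)
x = randn(Float32, 512, 32)   # 512 features for a batch of 32
y = ln(x)                     # normalize along dims=1, then scale by a, shift by b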

There may be a few tricks to make it faster, e.g. computing (x .- μ) only once, but ultimately we need a GPU kernel.
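For illustration, a sketch of that trick (an assumed rewrite, not benchmarked): center the input once and reuse the buffer for both the scale and the output. Note it uses the population (1/n) standard deviation instead of std's default corrected (1/(n-1)) estimate, a negligible difference at typical feature sizes.

function (l::LayerNorm)(x, o...)
    μ = mean(x, dims=1)
    d = x .- μ                          # centered input, computed once
    σ = sqrt.(mean(abs2.(d), dims=1))   # std from the same centered buffer
    l.a .* d ./ (σ .+ l.ϵ) .+ l.b
end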

I do not understand why you have to do a manual LSTM and cannot use the cudnn interface. Is it a multilayer RNN with the LayerNorm in between the RNN layers? In that case a GPU kernel is not going to help; we need to wait for cudnn to catch up, or you need to separate your layers and stick LayerNorms between them.

ngphuoc commented 5 years ago

Yes, I have a LayerNorm struct similar to yours. For LSTM we need to apply it to the LSTMCell internals as follows (copied from the Layer Normalization paper):

[image: the layer-normalized LSTM equations from the Layer Normalization paper (Ba et al., 2016)]
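For concreteness, here is a rough Julia sketch of the paper's cell update, reusing the LayerNorm struct above. Every name here (LNLSTMCell etc.) is hypothetical, not Knet API; it just transcribes the equations, with three independent LayerNorms and a single fused matmul per stream:

struct LNLSTMCell; Wx; Wh; b; LNx; LNh; LNc; hidden::Int; end

function LNLSTMCell(input::Int, hidden::Int)
    LNLSTMCell(param(4hidden, input), param(4hidden, hidden), param0(4hidden),
               LayerNorm(4hidden), LayerNorm(4hidden), LayerNorm(hidden), hidden)
end

function (c::LNLSTMCell)(x, h, cell)
    n = c.hidden
    a = c.LNx(c.Wx * x) .+ c.LNh(c.Wh * h) .+ c.b   # normalize each stream before the gates
    i = sigm.(a[1:n, :]);      f = sigm.(a[n+1:2n, :])
    o = sigm.(a[2n+1:3n, :]);  g = tanh.(a[3n+1:4n, :])
    cell = f .* cell .+ i .* g
    h = o .* tanh.(c.LNc(cell))                      # the cell state is normalized too
    return h, cell
end

This normalization inside the cell is exactly the part cudnn's fused RNN does not expose, which is why a naive per-timestep implementation falls back to many small elementwise kernels.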

denizyuret commented 5 years ago

Until CuDNN supports this in its API, one workaround I can think of is to create N separate RNNs for N layers and insert LayerNorm layers between them manually. One could write a struct that hides most of the dirty details from the user.
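A rough sketch of what that struct might look like (assumed name, built from Knet's RNN and the LayerNorm above; input is the usual (features, batch, time) array):

struct LNStackedLSTM; layers; end

function LNStackedLSTM(input::Int, hidden::Int, nlayers::Int)
    layers = []
    for i in 1:nlayers
        push!(layers, RNN(i == 1 ? input : hidden, hidden; rnnType=:lstm, numLayers=1))
        i < nlayers && push!(layers, LayerNorm(hidden))  # LayerNorm between RNN layers only
    end
    LNStackedLSTM(layers)
end

(m::LNStackedLSTM)(x) = (for l in m.layers; x = l(x); end; x)

Each single-layer RNN still runs the fast cudnn kernel over the whole sequence; only the LayerNorms in between run as plain broadcasts. Note this normalizes between layers, not inside the cell as in the paper's equations.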

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/denizyuret/Knet.jl/issues/492?email_source=notifications&email_token=AAN43JSAGTILRW2CX37AGQLQNPLULA5CNFSM4IZC3DG2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEASJEII#issuecomment-539267617, or mute the thread https://github.com/notifications/unsubscribe-auth/AAN43JUXJVHW6GADIRZ4INTQNPLULANCNFSM4IZC3DGQ .