denizyuret / Knet.jl

Koç University deep learning framework.
https://denizyuret.github.io/Knet.jl/latest

Test more cudnn functions (batchnorm, lstm, etc.) and use if faster. #177

Open denizyuret opened 7 years ago

denizyuret commented 7 years ago

At a high level, this is an interface that would be nice to have for RNNs:

(weights, state) = initrnn(input)
(output, state) = rnn(weights, input, state)
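As a usage sketch, processing a sequence one step at a time with this proposed interface might look as follows (initrnn and rnn do not exist yet; the sequence argument and the surrounding function are hypothetical):

# Hedged usage sketch for the proposed interface.
function predict(sequence)
    (weights, state) = initrnn(first(sequence))
    outputs = []
    for input in sequence
        (output, state) = rnn(weights, input, state)  # state is threaded across time steps
        push!(outputs, output)
    end
    return outputs
end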

state can be used to encapsulate various things, such as:

input dimensionality can be:

Question:

denizyuret commented 6 years ago

To implement RNNs from CUDNN we can follow the pattern in conv.jl (see rnn.jl for the evolving code, and RNN_example.cu and http://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide from NVIDIA as references). This means we don't keep the cudnn descriptors around; we just create them on the fly whenever they are needed, so it is important to keep them lightweight (a sketch of such a descriptor wrapper follows below). Although we could also keep them around in the state variable. Questions:
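Here is a minimal sketch of a lightweight tensor descriptor wrapper created on the fly and cleaned up by a finalizer; the TD name and Cptr alias echo the conv.jl style, but the code is written against the public cudnn C API rather than copied from the source:

const Cptr = Ptr{Cvoid}                      # opaque cudnn descriptor pointer

# Lightweight wrapper: create the descriptor when needed, free it via a finalizer.
mutable struct TD
    ptr::Cptr
    function TD(dims::Vector{Cint}, strides::Vector{Cint}; dtype=0)  # 0 = CUDNN_DATA_FLOAT
        d = Cptr[0]
        ccall((:cudnnCreateTensorDescriptor, "libcudnn"), Cint, (Ptr{Cptr},), d)
        ccall((:cudnnSetTensorNdDescriptor, "libcudnn"), Cint,
              (Cptr, Cint, Cint, Ptr{Cint}, Ptr{Cint}),
              d[1], dtype, length(dims), dims, strides)
        td = new(d[1])
        finalizer(t -> ccall((:cudnnDestroyTensorDescriptor, "libcudnn"), Cint, (Cptr,), t.ptr), td)
        return td
    end
end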

denizyuret commented 6 years ago

Current design for the primitive rnn operation:

rnn(w,x,hx,cx,s; training=false) => (y,hy,cy,rs)
# hx,cx can be nothing
# s keeps mostly read-only info like numLayers
# rs is reserveSpace for the back functions, nothing for inference
# training determines whether inference or training is called

rnn_r=recorder(rnn)
rnn(w::Rec, x, hx, cx, s)=rnn_r(w,x,hx,cx,s; training=true)
# we assume w::Rec means we are training and call the recorder version

rnn(::Type{Grad{1}},dr,r,w,x,hx,cx,s) =
    ((y,hy,cy,rs)=r; (dy,dhy,dcy,drs)=dr; backData(); backWeights(); #= set s.dx, s.dhx, s.dcx =# return dw)
rnn(::Type{Grad{2}},dr,r,w,x,hx,cx,s)=s.dx
rnn(::Type{Grad{3}},dr,r,w,x,hx,cx,s)=s.dhx
rnn(::Type{Grad{4}},dr,r,w,x,hx,cx,s)=s.dcx
# we always need to call backData before backWeights. Here we do both in Grad{1} and record the results in s to be later retrieved by other Grad calls.
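For context, here is a minimal sketch of how this primitive would be exercised through AutoGrad during training; the loss function and variable names are hypothetical, and only rnn, grad, and the Rec-based dispatch above come from the design:

using Knet, AutoGrad

# Hypothetical loss: run the rnn primitive and sum its output y.
rnnloss(w, x, hx, cx, s) = sum(rnn(w, x, hx, cx, s)[1])

# grad returns a function computing dloss/dw.  When w is boxed as a Rec,
# the recorder version of rnn runs with training=true and the Grad{N}
# methods above supply dw, s.dx, s.dhx, s.dcx on the backward pass.
rnngrad = grad(rnnloss)
# dw = rnngrad(w, x, hx, cx, s)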

TODO:

denizyuret commented 6 years ago

Returning hy and cy (and in our case sometimes y) is optional. By not returning these we can save some time and memory. I propose defining versions of rnn which do not return them (and send C_NULL to the cuda calls). How about:

rnn3(...) => (y,hy,rs)
rnn2(...) => (y,rs)
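A minimal sketch of what these variants could look like; here they simply call the full primitive and drop the unwanted outputs, whereas the real versions would pass C_NULL for hy/cy in the cudnn calls so they are never computed at all:

rnn3(w, x, hx, cx, s; o...) = ((y, hy, cy, rs) = rnn(w, x, hx, cx, s; o...); (y, hy, rs))
rnn2(w, x, hx, cx, s; o...) = ((y, hy, cy, rs) = rnn(w, x, hx, cx, s; o...); (y, rs))
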
cangumeli commented 6 years ago

There are some cudnn datatypes (Filter and Tensor Descriptors) used in both rnn and cnn implementations. Should we refactor them into a new cudnn.jl file?

denizyuret commented 6 years ago

Sure. Sounds good.

denizyuret commented 6 years ago

There is now a working implementation under src/rnn.jl. I managed to replicate what the test script does in Julia under test/rnn.jl. Remaining tasks:

denizyuret commented 6 years ago

OK, rnn and batchnorm are done. Softmax and dropout are next to benchmark and integrate if they are worth it.

denizyuret commented 6 years ago

I integrated softmax, using it for logp. prof/softmax.jl shows about double the speed.
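For reference, a timing comparison in the spirit of prof/softmax.jl can be as simple as the sketch below; the array size, iteration count, and the logp(x,1) call over dimension 1 are assumptions, not the actual profiling script:

using Knet

x = KnetArray(randn(Float32, 1000, 100))   # 1000 classes x 100 instances (arbitrary)
logp(x, 1)                                  # warm up (first call compiles)
@time for i in 1:1000                       # rough wall-clock comparison; a careful
    logp(x, 1)                              # benchmark would also synchronize the GPU
end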

denizyuret commented 6 years ago

I think dropout and bias-add are the next likely candidates to improve speed. Could also test activation functions.
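For comparison, the hand-written baseline that a cudnn dropout call would be measured against looks roughly like the generic sketch below (not Knet's actual implementation):

using Random

# Inverted dropout: zero each element with probability p and rescale the
# survivors by 1/(1-p) so the expected activation is unchanged.
function dropout_baseline(x, p)
    p == 0 && return x
    mask = (rand!(similar(x)) .> p) ./ (1 - p)
    return x .* mask
end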