denizyuret opened this issue 7 years ago
To implement RNNs from CUDNN we can follow the pattern in conv.jl (see rnn.jl for the evolving code, and RNN_example.cu and http://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide from NVIDIA as references). This means we don't keep the cudnn descriptors around; we create them on the fly whenever they are needed, so it is important to keep them lightweight. (Although we can keep them around in the state variable.) Questions:
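A minimal CPU sketch of the "create descriptors on the fly" pattern, assuming a stand-in `TD` type and a hypothetical `with_td` helper (the real code wraps cudnnTensorDescriptor and frees it with cudnnDestroyTensorDescriptor):

```julia
# TD is a lightweight stand-in for a cudnn tensor descriptor: it holds only
# the shape info, so it is cheap to create per call.
mutable struct TD
    dims::Dims
end

# Create a descriptor for x, run f with it, and release it when done.
function with_td(f, x)
    d = TD(size(x))   # stands in for cudnnCreateTensorDescriptor + cudnnSetTensorNdDescriptor
    try
        return f(d)
    finally
        # cudnnDestroyTensorDescriptor(d) would go here; TD is simply GC'd
    end
end
```

Because nothing heavyweight outlives the call, no descriptor bookkeeping needs to survive between calls.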
Current design for the primitive rnn operation:
```julia
rnn(w,x,hx,cx,s; training=false) => (y,hy,cy,rs)
# hx,cx can be nothing
# s keeps mostly read-only info like numLayers
# rs is reserveSpace for the back functions, nothing for inference
# training determines whether inference or training is called

rnn_r = recorder(rnn)
rnn(w::Rec,x,hx,cx,s) = rnn_r(w,x,hx,cx,s; training=true)
# we assume w::Rec means we are training and call the recorder version

rnn(::Type{Grad{1}},dr,r,w,x,hx,cx,s) = ((y,hy,cy,rs)=r; (dy,dhy,dcy,drs)=dr; backData(); backWeights(); set s.dx, s.dhx, s.dcx; return dw)
rnn(::Type{Grad{2}},dr,r,w,x,hx,cx,s) = s.dx
rnn(::Type{Grad{3}},dr,r,w,x,hx,cx,s) = s.dhx
rnn(::Type{Grad{4}},dr,r,w,x,hx,cx,s) = s.dcx
# backData must always be called before backWeights. Here we do both in Grad{1}
# and record the results in s, to be retrieved later by the other Grad calls.
```
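A runnable CPU sketch of the caching pattern described above, with a hypothetical `RNNState` struct and stand-in gradient math (the real `Grad{1}` method would call cudnnRNNBackwardData and cudnnRNNBackwardWeights in that order):

```julia
# RNNState plays the role of `s`: mostly read-only config, plus slots for the
# gradients that the first back call caches for the later Grad calls.
mutable struct RNNState
    numLayers::Int          # read-only info
    dx; dhx; dcx            # gradients cached by the Grad{1}-style call
    RNNState(n) = new(n, nothing, nothing, nothing)
end

# Stand-in for the Grad{1} method: "backData" computes dx/dhx/dcx and caches
# them in s; "backWeights" computes and returns dw. The elementwise products
# are placeholders, not real RNN gradients.
function rnn_grad1(dy, w, x, hx, cx, s::RNNState)
    s.dx  = dy .* w                               # backData stand-in
    s.dhx = hx === nothing ? nothing : dy .* hx
    s.dcx = cx === nothing ? nothing : dy .* cx
    return dy .* x                                # backWeights stand-in: dw
end

# The later Grad calls do no work; they just read the cached results.
rnn_grad2(s::RNNState) = s.dx
rnn_grad3(s::RNNState) = s.dhx
rnn_grad4(s::RNNState) = s.dcx
```

This keeps the backData-before-backWeights ordering constraint inside a single function, so the other gradient methods cannot violate it.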
TODO:
Returning hy and cy (and in our case sometimes y) is optional. By not returning these we can save some time and memory. I propose defining versions of rnn that do not return them (and send C_NULL to the cuda calls), e.g.:
```julia
rnn3(...) => (y,hy,rs)
rnn2(...) => (y,rs)
```
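A sketch of what the reduced-output variants could look like. Here `full_rnn` is a stand-in that computes everything; the real versions would instead pass C_NULL to CUDNN for the skipped outputs so they are never computed or allocated:

```julia
# Stand-in for the full primitive returning all four outputs (the arithmetic
# is placeholder, not a real RNN).
full_rnn(w, x) = (y = w .* x, hy = sum(w .* x), cy = prod(w .* x), rs = similar(x))

# rnn3 skips cy; rnn2 skips both hy and cy. In the CUDNN-backed versions the
# skipped output pointers would be C_NULL, saving time and memory.
function rnn3(w, x)
    r = full_rnn(w, x)
    return (r.y, r.hy, r.rs)
end

function rnn2(w, x)
    r = full_rnn(w, x)
    return (r.y, r.rs)
end
```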
There are some cudnn datatypes (Filter and Tensor Descriptors) used in both rnn and cnn implementations. Should we refactor them into a new cudnn.jl file?
Sure. Sounds good.
On Mon, Nov 6, 2017, 01:53 cangumeli notifications@github.com wrote:
There is now a working implementation under src/rnn.jl. I managed to replicate in Julia what the test script does, under test/rnn.jl. Remaining tasks:
OK, rnn and batchnorm are done. Softmax and dropout are next: benchmark them and integrate them if they are worth it.
I integrated softmax, using it for logp. prof/softmax.jl shows about double the speed.
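A rough sketch of the micro-benchmark style used for these decisions. Everything here is CPU-only and the function names are illustrative, not Knet's API; the real comparison in prof/softmax.jl times the CUDNN kernel against the Julia implementation:

```julia
# Average seconds per call for f(args...), after a warm-up call to force
# compilation so we don't time the JIT.
function bench(f, args...; trials=100)
    f(args...)
    t0 = time_ns()
    for _ in 1:trials
        f(args...)
    end
    return (time_ns() - t0) / trials / 1e9
end

# Two logp (log-softmax) candidates to compare, columnwise over dims=1:
logp_naive(x) = log.(exp.(x) ./ sum(exp.(x), dims=1))
function logp_stable(x)            # subtract the max for numerical stability
    xm = x .- maximum(x, dims=1)
    return xm .- log.(sum(exp.(xm), dims=1))
end

x = rand(Float32, 1000, 100)
t_naive, t_stable = bench(logp_naive, x), bench(logp_stable, x)
```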
I think dropout and bias-add are the next likely candidates to improve speed. Could also test activation functions.
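For reference, a minimal CPU sketch of inverted dropout, the operation a cudnnDropout integration would accelerate (names are illustrative, not Knet's API):

```julia
# Inverted dropout: each unit is kept with probability 1-p and scaled by
# 1/(1-p), so expected activations match inference, where this is a no-op.
function dropout_cpu(x, p; training=true)
    training || return x
    mask = (rand(size(x)...) .> p) ./ (1 - p)   # 0 for dropped, 1/(1-p) for kept
    return x .* mask
end
```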
On a high level this is an interface that would be nice to have for rnns:
state can be used to encapsulate various things, such as:
input dimensionality can be:
Question: