Open Henry-Chinner opened 8 years ago
For the record, it is possible to build GRU/LSTM with vanilla NumPy/CUDArray.
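To illustrate, here is a minimal sketch of a single GRU step written with plain array operations only. It is not code from CUDArray or from anyone in this thread; the function names (`sigmoid`, `gru_step`), the parameter layout, and the gate convention (update gate `z` blending the old state with the candidate) are assumptions made for the example. It uses NumPy; whether the same code runs unchanged on CUDArray depends on which array operations CUDArray covers, which I have not verified here.

```python
import numpy as np


def sigmoid(x):
    # Logistic function, applied element-wise.
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x, h_prev, params):
    """One GRU time step built from dot products and element-wise ops.

    x:      (batch, n_in) input at the current time step
    h_prev: (batch, n_hidden) previous hidden state
    params: tuple of nine arrays (hypothetical layout for this sketch):
            W_* are (n_in, n_hidden), U_* are (n_hidden, n_hidden),
            b_* are (n_hidden,)
    """
    W_z, U_z, b_z, W_r, U_r, b_r, W_h, U_h, b_h = params

    # Update gate: how much of the candidate state to let in.
    z = sigmoid(np.dot(x, W_z) + np.dot(h_prev, U_z) + b_z)
    # Reset gate: how much of the previous state feeds the candidate.
    r = sigmoid(np.dot(x, W_r) + np.dot(h_prev, U_r) + b_r)
    # Candidate hidden state.
    h_tilde = np.tanh(np.dot(x, W_h) + np.dot(r * h_prev, U_h) + b_h)
    # Blend previous state and candidate.
    return (1.0 - z) * h_prev + z * h_tilde
```

Running a sequence is then just a Python loop over time steps that feeds each returned state back in as `h_prev`. An LSTM step follows the same pattern with one more gate and a separate cell state.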
You are right, it would be nice to have optimized versions of these functions. Though, I think it is equally important to implement parallel array operations as you discuss in issue #43. This should allow us to take advantage of model parallelism in recurrent networks.
Oh yes, I have implemented GRUs/LSTMs with CUDArray and it works quite satisfactorily.
I agree with you, parallelization of array ops will be a massive help for LSTMs/GRUs, especially when they are stacked.
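As a rough illustration of why there is parallelism to exploit: the input projections of the gates do not depend on each other, so they can be computed as several small matrix products or as one larger one. The sketch below shows the equivalence in NumPy. The function names and the three-gate layout are assumptions for the example, not an existing CUDArray API, and it makes no claim about actual speedups.

```python
import numpy as np


def gates_separate(x, W_z, W_r, W_h):
    # Three independent matrix products, one per gate.
    return np.dot(x, W_z), np.dot(x, W_r), np.dot(x, W_h)


def gates_fused(x, W_z, W_r, W_h):
    # Stack the weight matrices side by side and do one larger product.
    # In real code the concatenation would be done once, not on every call.
    W = np.concatenate([W_z, W_r, W_h], axis=1)
    a = np.dot(x, W)
    n = W_z.shape[1]
    # Slice the result back into the per-gate pre-activations.
    return a[:, :n], a[:, n:2 * n], a[:, 2 * n:]
```

Both functions return the same values. The fused version hands the backend a single bigger operation, while the separate version leaves three independent operations that a parallel scheduler could run concurrently.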
Maybe optimized implementations of LSTM/GRU blocks are too high a level of abstraction for CUDArray.
There is always plenty of room in the cudarray.nnet submodule. :) But I get your point. Ideally, CUDArray should be easier to extend with external modules. One could then imagine a cuDNN module that exposes the operations with CUDArray compatibility.
cuDNN 5 now supports GRU and LSTM cells natively, with speedups of up to 6 times compared to a cuBLAS implementation of GRUs and LSTMs. It would be great if CUDArray could support this.
https://devblogs.nvidia.com/parallelforall/optimizing-recurrent-neural-networks-cudnn-5/