feiwang3311 / Lantern

BSD 3-Clause "New" or "Revised" License

[WIP] Implement cuDNN RNN ops and RNN module. #44

Closed dan-zheng closed 5 years ago

dan-zheng commented 5 years ago

This PR is a WIP, created to show progress and receive feedback. Please don't merge yet.


dan-zheng commented 5 years ago

Reference PyTorch RNN program used to cross-check Lantern's cuDNN RNN implementation:

import torch
import torch.nn as nn
import torch.nn.init as init

input_size = 10
hidden_size = 40
num_layers = 2
seq_length = 5
batch_size = 3
bidirectional = False
num_directions = 2 if bidirectional else 1

rnn = nn.RNN(input_size, hidden_size, num_layers, nonlinearity='relu', bias=True, dropout=0, bidirectional=bidirectional)
for p in rnn.parameters():
    init.constant_(p, 0.01)
input = torch.ones(seq_length, batch_size, input_size)
input.requires_grad = True
h0 = torch.ones(num_layers * num_directions, batch_size, hidden_size)
output, hn = rnn(input, h0)
# Lantern produces the same output value.

output.backward(torch.ones_like(output))
print(input.grad)
# Lantern produces the same input gradient value.
for p in rnn.parameters():
    print(p.grad)
    # Lantern doesn't produce the same parameter gradient values yet.
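Since the parameter gradients are the part that still disagrees, a finite-difference check against PyTorch's analytic gradients can help localize the discrepancy. The sketch below is a hypothetical debugging aid (not part of this PR): it runs the same constant-initialized RNN in double precision and compares the analytic gradient of a single weight entry against a central difference.

```python
import torch
import torch.nn as nn
import torch.nn.init as init

# Same configuration as the reference program, but in double precision
# so the finite-difference estimate is numerically trustworthy.
rnn = nn.RNN(10, 40, 2, nonlinearity='relu', bias=True).double()
for p in rnn.parameters():
    init.constant_(p, 0.01)

inp = torch.ones(5, 3, 10, dtype=torch.double)   # (seq, batch, input)
h0 = torch.ones(2, 3, 40, dtype=torch.double)    # (layers, batch, hidden)

def loss():
    out, _ = rnn(inp, h0)
    return out.sum()  # equivalent to backward(ones_like(output))

loss().backward()
w = next(rnn.parameters())         # weight_ih_l0
analytic = w.grad[0, 0].item()

# Central difference on the same weight entry.
eps = 1e-6
with torch.no_grad():
    w[0, 0] += eps
    plus = loss().item()
    w[0, 0] -= 2 * eps
    minus = loss().item()
    w[0, 0] += eps                 # restore original value
numeric = (plus - minus) / (2 * eps)
```

Repeating this per parameter tensor (e.g. `weight_hh_l0`, the biases, and the layer-1 weights) would show whether the mismatch is in a specific weight block, which is a common symptom of mis-mapping cuDNN's flattened parameter buffer back to individual tensors.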