Closed by asydorchuk 5 years ago
You have to retrain with standard Tensorflow rnn cells and with dtype=tf.float32
Hi Boris, thanks for the quick reply!
From the tensorflow docs here (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/cudnn_rnn/python/layers/cudnn_rnn.py#L73):
Cudnn RNNs have two major differences from other platform-independent RNNs tf
provides:
* Cudnn LSTM and GRU are mathematically different from their tf counterparts
  (e.g. `tf.contrib.rnn.LSTMBlockCell` and `tf.nn.rnn_cell.GRUCell`).
* Cudnn-trained checkpoints are not directly compatible with tf RNNs:
* They use a single opaque parameter buffer for the entire (possibly)
multi-layer multi-directional RNN; Whereas tf RNN weights are per-cell and
layer.
* The size and layout of the parameter buffers may change between
CUDA/CuDNN/GPU generations. Because of that, the opaque parameter variable
does not have a static shape and is not partitionable. Instead of using
partitioning to alleviate the PS's traffic load, try building a
multi-tower model and do gradient aggregation locally within the host
before updating the PS. See https://www.tensorflow.org/performance/performance_models#parameter_server_variables
for a detailed performance guide.
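The first bullet above (CuDNN GRU being mathematically different from `tf.nn.rnn_cell.GRUCell`) comes down to where the reset gate enters the candidate activation. A minimal numpy sketch of the two candidate formulations (toy dimensions and random weights for illustration; this is not the actual CuDNN kernel):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy dimensions and random parameters (illustration only).
rng = np.random.default_rng(0)
n = 4
x, h = rng.standard_normal(n), rng.standard_normal(n)
Wc, Uc = rng.standard_normal((n, n)), rng.standard_normal((n, n))
r = sigmoid(rng.standard_normal(n))  # pretend reset gate, same in both

# Standard GRUCell candidate: reset gate applied to h BEFORE the recurrent matmul.
c_standard = np.tanh(Wc @ x + Uc @ (r * h))

# CuDNN-style candidate: reset gate applied AFTER the recurrent matmul.
c_cudnn = np.tanh(Wc @ x + r * (Uc @ h))

# The two formulations generally disagree, which is why a checkpoint trained
# with one cell cannot simply be loaded into the other.
print(np.allclose(c_standard, c_cudnn))
```

This is the reason the `CudnnCompatible*` cells exist: they reproduce the CuDNN ordering on CPU.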
Consequently, if one plans to use Cudnn trained models on both GPU and CPU
for inference and training, one needs to:
* Create a CudnnOpaqueParamsSaveable subclass object to save RNN params in
canonical format. (This is done for you automatically during layer building
process.)
* When not using a Cudnn RNN class, use CudnnCompatibleRNN classes to load the
checkpoints. These classes are platform-independent and perform the same
computation as Cudnn for training and inference.
Similarly, CudnnCompatibleRNN-trained checkpoints can be loaded by CudnnRNN
classes seamlessly.
Below is a typical workflow (using LSTM as an example):
# Use Cudnn-trained checkpoints with CudnnCompatibleRNNs
```python
with tf.Graph().as_default():
    lstm = CudnnLSTM(num_layers, num_units, direction, ...)
    outputs, output_states = lstm(inputs, initial_states, training=True)
    # If user plans to delay calling the cell with inputs, one can do
    # lstm.build(input_shape)
    saver = Saver()
    # training subgraph
    ...
    # Once in a while save the model.
    saver.save(save_path)

# Inference subgraph for unidirectional RNN on, e.g., CPU or mobile.
with tf.Graph().as_default():
    single_cell = lambda: tf.contrib.cudnn_rnn.CudnnCompatibleLSTMCell(num_units)
    # NOTE: Even if there's only one layer, the cell needs to be wrapped in
    # MultiRNNCell.
    cell = tf.nn.rnn_cell.MultiRNNCell(
        [single_cell() for _ in range(num_layers)])
    # Leave the scope arg unset.
    outputs, final_state = tf.nn.dynamic_rnn(cell, inputs, initial_state, ...)
    saver = Saver()
    # Create session
    sess = ...
    # Restores
    saver.restore(sess, save_path)

# Inference subgraph for bidirectional RNN
with tf.Graph().as_default():
    single_cell = lambda: tf.contrib.cudnn_rnn.CudnnCompatibleLSTMCell(num_units)
    cells_fw = [single_cell() for _ in range(num_layers)]
    cells_bw = [single_cell() for _ in range(num_layers)]
    # Leave the scope arg unset.
    (outputs, output_state_fw,
     output_state_bw) = tf.contrib.rnn.stack_bidirectional_dynamic_rnn(
         cells_fw, cells_bw, inputs, ...)
    saver = Saver()
    # Create session
    sess = ...
    # Restores
    saver.restore(sess, save_path)
```
`CudnnCompatibleGRUCell` is used in ds2_encoder.py when `"use_cudnn_rnn": False`. Based on the link above it seems possible to somehow transform weights from a checkpoint trained with `"use_cudnn_rnn": True`. But I am not sure that all the components used are compatible.
Do you confirm retraining as the only viable solution?
Maybe there are other ways, but I am not aware of them :).
Understood! Thanks for looking into it. Kudos for the library!
As a follow-up, I could load `CudnnGRU`-trained weights on CPU by:
1) changing `use_cudnn_rnn` to `False` in the config;
2) replacing `bidirectional_dynamic_rnn` and `MultiRNNCell` with `stack_bidirectional_dynamic_rnn` inside ds2_encoder.py;
3) adding `tf.variable_scope("cudnn_gru")` before constructing the RNN layers in ds2_encoder.py.
The steps above generate a tensorflow graph that is compatible with `CudnnGRU`.
Relevant gist that was helpful: https://gist.github.com/melgor/41e7d9367410b71dfddc33db34cba85f
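A toy illustration of why step 3 matters (the variable names below are hypothetical, not actual OpenSeq2Seq output): `CudnnGRU` writes its canonical weights under a `cudnn_gru` name scope, so the CPU graph has to create its `CudnnCompatibleGRUCell` variables under the same scope, otherwise `saver.restore` cannot match checkpoint keys to graph variables:

```python
# Hypothetical checkpoint key written by a CudnnGRU layer (illustration only).
checkpoint_keys = {
    "ForwardPass/ds2_encoder/cudnn_gru/stack_bidirectional_rnn/cell_0/"
    "bidirectional_rnn/fw/cudnn_compatible_gru_cell/gates/kernel",
}

def graph_variable_names(use_scope):
    """Names the CPU graph would create, with/without the extra scope."""
    prefix = "ForwardPass/ds2_encoder/"
    if use_scope:
        prefix += "cudnn_gru/"  # step 3: tf.variable_scope("cudnn_gru")
    return {
        prefix + "stack_bidirectional_rnn/cell_0/"
        "bidirectional_rnn/fw/cudnn_compatible_gru_cell/gates/kernel",
    }

# Without the scope, graph and checkpoint names diverge and restore
# fails with NotFoundError; with it, the names line up.
print(graph_variable_names(False) == checkpoint_keys)  # False
print(graph_variable_names(True) == checkpoint_keys)   # True
```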
@asydorchuk: Thank you for sharing your workaround. Wondering if you can share a bit more information on your step #3?
> adding `tf.variable_scope("cudnn_gru")` before constructing RNN layers in ds2_encoder.py
Where exactly in ds2_encoder.py do you add `tf.variable_scope("cudnn_gru")`? Would you mind sharing your working code?
To be more specific, I got the following error after following your 3 steps:
```
NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
Key ForwardPass/ds2_encoder/bidirectional_rnn/bw/multi_rnn_cell/cell_0/cudnn_compatible_gru_cell/candidate/hidden_projection/bias not found in checkpoint
  [[node save/RestoreV2 (defined at <ipython-input-5-4444675cc8c1>:15) ]]
```
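A generic way to debug this kind of `NotFoundError` is to diff the variable names the graph expects against the keys actually stored in the checkpoint (in TF 1.x the real lists would come from `tf.train.list_variables(save_path)` and `tf.global_variables()`; the key names below are hypothetical placeholders):

```python
# Hypothetical key sets; in practice collect them from the checkpoint and graph.
checkpoint_keys = {
    "ds2_encoder/cudnn_gru/.../gates/kernel",
    "ds2_encoder/cudnn_gru/.../candidate/kernel",
}
graph_keys = {
    "ds2_encoder/bidirectional_rnn/bw/multi_rnn_cell/cell_0/"
    "cudnn_compatible_gru_cell/candidate/hidden_projection/bias",
    "ds2_encoder/cudnn_gru/.../gates/kernel",
}

# Keys in one set but not the other pinpoint the naming/structure mismatch.
missing_from_checkpoint = sorted(graph_keys - checkpoint_keys)
unused_in_graph = sorted(checkpoint_keys - graph_keys)
for name in missing_from_checkpoint:
    print("graph expects, checkpoint lacks:", name)
for name in unused_in_graph:
    print("checkpoint has, graph never asks for:", name)
```

Seeing which side each mismatched key lands on usually reveals whether the fix is a variable scope, a different cell wrapper, or a genuinely incompatible checkpoint.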
I am doing transfer learning on top of the speech2text model defined in ds2_large_8gpus_mp.py. The trained model works perfectly on a machine with a GPU, however it doesn't run on a CPU-only machine since CudnnGRU/LSTM support only GPU. Is there any way to parse pretrained `CudnnGRU` layer weights into `MultiRNNCell + CudnnCompatibleGRUCell`? I tried setting the flag `use_cudnn_rnn: False` during inference time (with the idea that the layers are compatible) and following the Interactive_Infer_example.ipynb example, however I get the key error regarding rnn-related layers in the checkpoint: