Looking closer at this. Seems like a TensorFlow issue with SVD gradients.
Yes, it probably is TensorFlow related. But may I ask why you need SVDs in the context of RNNs? It sounds a bit unusual (which is not a bad thing, BTW).
I'm extending the following model: https://github.com/ischlag/TPR-RNN to optimize for memory with TT-decomp. It's tougher than I initially anticipated.
I'm trying to decompose the TPR which is the hidden layer of the RNN.
I went so far as to stop_gradient the SVD, and I'm successfully creating the graph. However, there's a mismatch between the TT cores (which have become the hidden state) and the batch_size / sequence_length of the dynamic_rnn.
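Roughly what I mean (a minimal sketch against the stock t3f API; my fork differs in the details, and the shapes here are made up):

```python
import tensorflow as tf
import t3f

# Dense hidden state as produced inside the cell (illustrative shape).
h = tf.random_normal((4, 4, 4))

# to_tt_tensor runs SVD under the hood; wrapping every resulting core
# in tf.stop_gradient keeps backprop from ever touching the SVD op.
h_tt = t3f.to_tt_tensor(h, max_tt_rank=4)
h_tt_frozen = t3f.TensorTrain([tf.stop_gradient(c) for c in h_tt.tt_cores])
```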
But why do you even need SVD? Why can't you just generate a random third-order tensor in the TT format and then train it?
As for the shape mismatch, I don't think the TT cores should depend on the batch size; if you see a mismatch, you are probably multiplying something the wrong way. Can you share a code example?
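For example, something along these lines (a sketch; names and shapes are just for illustration):

```python
import tensorflow as tf
import t3f

# Draw a random order-3 tensor directly in the TT format...
init = t3f.random_tensor((4, 4, 4), tt_rank=4)
# ...and make its cores trainable variables, so no SVD is ever needed.
h_tt = t3f.get_variable('hidden_tt', initializer=init)

# Densify only where the surrounding code needs a plain tf.Tensor.
h = t3f.full(h_tt)
```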
SVD is performed in the to_tt_tensor method. Back-propagating through that won't work. This might break at graph creation; I'm running it on an experimental forked version of t3f locally.
https://github.com/mabounassif/TPR-RNN/blob/share/tpr_rnn_graph.py
That's what I thought: you probably don't need SVD (aka to_tt_tensor). In the code, you create a tensor of zeros and then convert it to the TT format, but it's going to be much faster / more differentiable to create this tensor directly in the TT format (i.e. directly define TT-cores which yield a tensor of zeros): t3f.tensor_zeros(). tensor_zeros doesn't support batches of TT-tensors, but that's just because no one has needed it yet; it's super easy to implement.
Yeah, but in the RNN cell there are a bunch of einsum operations on the dense tensor. I'm not sure how they translate to the TT format, unless I convert them to matmul operations.
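E.g., one of the contractions, rewritten as a plain matmul (a sketch with made-up shapes, just to show the kind of conversion I mean):

```python
import tensorflow as tf

a = tf.random_normal((32, 5, 6))  # [batch, i, j]
w = tf.random_normal((6, 7))      # [j, k]

# 'bij,jk->bik' is the same contraction as a matmul over a flattened batch.
via_einsum = tf.einsum('bij,jk->bik', a, w)
via_matmul = tf.reshape(tf.matmul(tf.reshape(a, (-1, 6)), w), (32, 5, 7))
```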
Maybe instead of

```python
h = tf.zeros((4, 4, 4))
rnn(h)
```

use something like

```python
h_tt = t3f.tensor_zeros((4, 4, 4), tt_rank=4)
h = t3f.full(h_tt)
rnn(h)
```

?
If I understood correctly what you want.
I'm trying to backprop through a decomposed order-4 tensor whose first dimension is defined as a tf.placeholder (changing batch size). I'm playing with a modified dynamic_rnn, and the backprop fails with the following error:

What's the best way to deal with a similar issue?
THANKS!