Looking closer at this. Seems like a TensorFlow issue with SVD gradients.
Yes, it probably is TensorFlow related. But may I ask why you need SVDs in the context of RNNs? It sounds a bit unusual (which is not a bad thing, BTW).
I'm extending the following model: https://github.com/ischlag/TPR-RNN to optimize for memory with TT-decomp. It's tougher than I initially anticipated.
I'm trying to decompose the TPR which is the hidden layer of the RNN.
I went so far as to stop_gradient the SVD, and I'm successfully creating the graph. However, there's a mismatch between the TT cores (which have become the hidden state) and the batch_size / sequence_length of the dynamic_rnn.
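Roughly what I mean (a minimal sketch against the stock t3f API; my fork differs in the details, and the shapes here are made up):

```python
import tensorflow as tf
import t3f

# Dense hidden state as produced inside the cell (illustrative shape).
h = tf.random_normal((4, 4, 4))

# to_tt_tensor runs SVD under the hood; wrapping every resulting core
# in tf.stop_gradient keeps backprop from ever touching the SVD op.
h_tt = t3f.to_tt_tensor(h, max_tt_rank=4)
h_tt_frozen = t3f.TensorTrain([tf.stop_gradient(c) for c in h_tt.tt_cores])
```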
But why do you even need SVD? Why can't you just generate a random third-order tensor in the TT format and then train it?
As for the shape mismatch, I don't think the TT cores should depend on the batch size; if you see a mismatch, you are probably multiplying something the wrong way. Can you share a code example?
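For example, something along these lines (a sketch; names and shapes are just for illustration):

```python
import tensorflow as tf
import t3f

# Draw a random order-3 tensor directly in the TT format...
init = t3f.random_tensor((4, 4, 4), tt_rank=4)
# ...and make its cores trainable variables, so no SVD is ever needed.
h_tt = t3f.get_variable('hidden_tt', initializer=init)

# Densify only where the surrounding code needs a plain tf.Tensor.
h = t3f.full(h_tt)
```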
SVD is performed in the to_tt_tensor method. Back-propagating through that won't work. This might break at graph creation; I'm running it on an experimental forked version of t3f locally.
https://github.com/mabounassif/TPR-RNN/blob/share/tpr_rnn_graph.py
That's what I thought: you probably don't need SVD (aka to_tt_tensor). In the code, you create a tensor of zeros and then convert it to the TT format, but it's going to be much faster / more differentiable to create this tensor directly in the TT format (i.e. directly define TT-cores which yield a tensor of zeros): t3f.tensor_zeros(). tensor_zeros doesn't support batches of TT-tensors, but that's just because no one has needed it yet; it's super easy to implement.
Yeah, but in the RNN cell there are a bunch of einsum operations on the dense tensor. I'm not sure how they translate to the TT format, unless I convert them to matmul operations.
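E.g., one of the contractions, rewritten as a plain matmul (a sketch with made-up shapes, just to show the kind of conversion I mean):

```python
import tensorflow as tf

a = tf.random_normal((32, 5, 6))  # [batch, i, j]
w = tf.random_normal((6, 7))      # [j, k]

# 'bij,jk->bik' is the same contraction as a matmul over a flattened batch.
via_einsum = tf.einsum('bij,jk->bik', a, w)
via_matmul = tf.reshape(tf.matmul(tf.reshape(a, (-1, 6)), w), (32, 5, 7))
```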
Maybe instead of

```python
h = tf.zeros((4, 4, 4))
rnn(h)
```

use something like

```python
h_tt = t3f.tensor_zeros((4, 4, 4), tt_rank=4)
h = t3f.full(h_tt)
rnn(h)
```

?
If I understood correctly what you want.
I'm trying to backprop through a decomposed order-4 tensor whose first dimension is defined as a tf.placeholder (changing batch size). I'm playing with a modified dynamic_rnn, and the backprop fails with the following error:

What's the best way to deal with a similar issue?
THANKS!