Bihaqo / t3f

Tensor Train decomposition on TensorFlow
https://t3f.readthedocs.io/en/latest/index.html
MIT License
219 stars 55 forks

Mix core shapes to avoid having too many indices #115

Open AustenLamacraft opened 6 years ago

AustenLamacraft commented 6 years ago

I would like to contract all but a few indices in a TT. At the moment, I am doing this by defining TT matrices of shape ((1, 1, ..., 1, n_ext), (m_1, ..., m_N)), and then contracting the m indices (with t3f.matmul()) with some other TT matrix.

The problem is that I'm interested in large N, so that I end up getting:

2018-03-08 14:55:41.571939: F tensorflow/core/framework/tensor_shape.cc:243] Check failed: ndims_byte() < MaxDimensions() (unsigned char value 254 vs. 254)Too many dimensions in tensor

It would be nice to avoid this by allowing a TT to have both 3d and 4d cores. t3f.full() could still give a matrix. t3f.matmul() would require the 4d cores to be in the same locations.

For similar reasons, it would be good to have genuine matrix-vector multiplication. At the moment I do this with a TT matrix of shape ((m_1, ..., m_N), (1, 1, ..., 1)), which also contributes to the "too many dimensions" issue.
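In code, the matrix-vector part looks roughly like this (sizes made up here; in my application N is large):

```python
import t3f

N, m = 4, 3  # made-up sizes; in practice N is large

A = t3f.random_matrix([[m] * N, [m] * N])
# A "vector" has to be faked as a TT matrix with unit column dims:
v = t3f.random_matrix([[m] * N, [1] * N], tt_rank=1)

Av = t3f.matmul(A, v)  # TT matrix of shape ((m, ..., m), (1, ..., 1))
# Every core of Av is still 4d, so t3f.full(Av) builds a tensor with
# 2N axes -- this is what hits TensorFlow's 254-dimension limit.
```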

AustenLamacraft commented 6 years ago

A couple more thoughts.

  1. I see that the problem with mixing cores is that we don't know whether the middle index of a 3d core is a row or a column index of the matrix.

  2. If the second issue (multiplying a matrix and a vector) can be overcome, I guess one solution to the first issue would be to do t3f.full(a[0, 0, ..., :]). This won't work if a is a matrix, as __getitem__ isn't implemented for matrices. Incidentally, a[0, 0, ..., 0] (setting all indices) doesn't seem to work -- one of them has to be a slice.
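For example (here a is a plain TT tensor; the matrix case is what's missing):

```python
import t3f

a = t3f.random_tensor([2] * 5)

b = a[0, 0, :, :, :]  # works: at least one index is a slice
t3f.full(b)           # dense 2 x 2 x 2 tensor

# a[0, 0, 0, 0, 0]    # setting every index fails at the moment
```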

Bihaqo commented 6 years ago

Can you please provide a small reproducible example? I'm not sure I understood you correctly.

By "contacting all indexes but a few" you mean something like Y[i_3, i5] = sum{i_1,i_2,i_4} X[i_1,...,i_5] ?

I can implement it as an op if you need.

If you still need to multiply a tensor by a matrix, I can provide something like t3f.to_tt_vector to convert between representations, but I'm still not sure about the use case.

As for indexing TT matrices, I haven't implemented it because I wasn't sure about the API. Should it be something like M[i_1, ..., i_d, j_1, ..., j_d], or M[I, J] where I and J encode the row and column indices of the underlying matrix? And what should happen in the batch case?

If you have ideas about the right API here, it is very easy to implement.

AustenLamacraft commented 6 years ago

I'm implementing a TT-based classifier similar to Stoudenmire and Schwab's paper, and also yours. At the moment I am implementing f^l(x) = W^l Phi(x) in the following way:

  1. W is a TT matrix of shape ((1, 1, ..., 1, L), (F, F, ..., F)), where L is the number of labels and F is the number of features.

  2. Phi is a TT matrix of shape ((F, F, ..., F), (1, 1, ..., 1)) with rank-one cores.

  3. I use t3f.matmul to contract them together, yielding a TT matrix of shape ((1, 1, ..., 1, L), (1, 1, ..., 1)).

This is where the "too many dimensions" problem rears its head. In an ideal world, W could have shape ((None, None, ..., L), (F, F, ..., F)), Phi would have shape (F, F, ..., F), and I would just contract the feature indices, yielding a single index.
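Roughly, in code (N, F, L made up; in reality N is large enough to break t3f.full):

```python
import t3f

N, F, L = 6, 2, 10  # made-up sizes

# W: only the last core carries the label index.
W = t3f.random_matrix([[1] * (N - 1) + [L], [F] * N])
# Phi(x): a rank-one "vector" encoded as a TT matrix.
Phi = t3f.random_matrix([[F] * N, [1] * N], tt_rank=1)

# f(x): TT matrix of shape ((1, ..., 1, L), (1, ..., 1)).
logits = t3f.matmul(W, Phi)
# t3f.full(logits) is only L x 1, but building it goes through a
# tensor with 2N axes -- hence the "too many dimensions" error.
```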

KhrulkovV commented 6 years ago

Actually this is easier to implement using the TensorTrainBatch class, since you can compute the matrix of pairwise inner products. I've written up a small example here.
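Something along these lines (sizes made up, and using plain TT tensors rather than TT matrices):

```python
import t3f

d, n, L, B = 6, 2, 10, 32  # made up: d cores, mode size n, L labels, batch B

# One TT object per label, stacked along the batch axis:
W = t3f.random_tensor_batch([n] * d, batch_size=L)
# A batch of feature maps Phi(x):
Phi = t3f.random_tensor_batch([n] * d, tt_rank=1, batch_size=B)

# logits[i, j] = <Phi_i, W_j> for all pairs at once:
logits = t3f.pairwise_flat_inner(Phi, W)  # shape (B, L)
```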

AustenLamacraft commented 6 years ago

Thanks for this! Two comments:

  1. It seems you've used the batch structure for the labels. Does that mean I lose the ability to use it for batches of data?

  2. (More seriously) You now have a completely new TT for each label, whereas in my example it was only the last core of the TT that knew about the label.

AustenLamacraft commented 6 years ago

There was an issue on the TensorFlow repo about the merits of allowing genuine matrix-vector multiplication:

https://github.com/tensorflow/tensorflow/issues/9055

Bihaqo commented 6 years ago

  1. You can still use a batch of objects: pairwise_flat_inner(x, w) is a matrix of cross products of size x.batch_size x w.batch_size.

  2. Good point, I see. And where do you get the MaxDimensions error from? I just tried to multiply t3f.random_matrix([[2]*256, [1]*256]) by its own transpose and it works fine.
Bihaqo commented 6 years ago

#56

AustenLamacraft commented 6 years ago

I added matrix-vector multiplication to my fork here.

AustenLamacraft commented 6 years ago

The "too many dimensions" error appears when you call t3f.full.
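E.g., matching your example above:

```python
import t3f

A = t3f.random_matrix([[2] * 256, [1] * 256])
B = t3f.matmul(A, t3f.transpose(A))  # fine: everything stays in TT format

# This is where it blows up: full() builds an intermediate tensor with
# 2 * 256 axes, well past TensorFlow's 254-dimension limit.
B_full = t3f.full(B)
```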

Bihaqo commented 6 years ago

Wow, you put in a lot of work!

However, I intentionally didn't want to mix matrix/vector objects with tensors, because I've heard that this mixing was one of the major sources of confusion in the TTPY framework ("why is a matrix a matrix but a vector is a tensor??").

What do you think: would t3f.vector_to_tensor and t3f.tensor_to_vector solve your problem? Then you would be able to do t3f.full(t3f.vector_to_tensor(t3f.matmul(A, b))).
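I.e. something like this (t3f.vector_to_tensor doesn't exist yet; this is just the proposed API):

```python
import t3f

A = t3f.random_matrix([[2] * 10, [2] * 10])
b = t3f.random_matrix([[2] * 10, [1] * 10], tt_rank=1)  # a "vector"

y = t3f.matmul(A, b)                # TT matrix with all column dims 1
y_tensor = t3f.vector_to_tensor(y)  # proposed: drop the dummy column indices
y_full = t3f.full(y_tensor)         # dense tensor with 10 axes instead of 20
```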