miladdona opened this issue 3 years ago
Hi,
A few things:
1) I had reasons to call `max_tt_rank` and `tt_rank` differently, but now that you questioned it, I realised that those reasons were never convincing enough and you're totally right: they should have the same name (`tt_rank`).
2) You hit a problem common to many TT codebases: your TT-rank is bigger than the theoretical maximally useful TT-rank. The TT-rank is actually a list; when you define it with a single number 10, it gets silently converted into the list (1, 10, 10, 10, 1) for you (the list has 5 elements because your underlying tensor is 4-dimensional, and it always has 1 as the first and last element). The second of those TT-ranks is redundantly big. You can change the code to

```python
tt_layer = t3f.nn.KerasDense(input_dims=[2, 2, 2, 567], output_dims=[2, 2, 5, 5],
                             tt_rank=(1, 4, 10, 10, 1), activation='relu')
```

and I believe it should work.
3) Actually, I wouldn't recommend using such an unbalanced tensor shape. Very likely you would be better off padding your input size 4536 to e.g. 5000 and then using `input_dims = (10, 10, 10, 5)` or something like this. This would also fix your previous problem: with a more balanced shape, the TT-rank 10 should work out of the box (see the sketch after this list).
4) Also note that a TT-layer might be sensitive to the order of inputs and outputs, i.e. it might work a lot worse if you shuffle your output dimensions. This is not a problem if the layer is in the middle of an MLP (the surrounding dense layers can provide features in whatever order is useful for your TT-layer), but it might be problematic if the TT-layer is the last layer, since the order of its outputs would then be defined by the (arbitrary) order of your labels. TL;DR: if this is the last layer in your network, I would try to also add yet another dense layer of size 100 x 100 on top of it.
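Putting points 3) and 4) together, here's a minimal sketch of what the resulting model could look like. This is just an illustration under my assumptions: the zero-padding via a `Lambda` layer (464 = 5000 - 4536) and the final `Dense(100)` head are my choices, not something from the original code.

```python
import tensorflow as tf
import t3f

model = tf.keras.Sequential([
    # Zero-pad the flat 4536-dim input up to 5000 = 10 * 10 * 10 * 5.
    tf.keras.layers.Lambda(lambda x: tf.pad(x, [[0, 0], [0, 464]]),
                           input_shape=(4536,)),
    # Balanced factorization, so a scalar tt_rank=10 works out of the box.
    t3f.nn.KerasDense(input_dims=[10, 10, 10, 5],
                      output_dims=[2, 2, 5, 5],
                      tt_rank=10, activation='relu'),
    # Extra dense layer on top so the (arbitrary) label order
    # doesn't hurt the TT-layer (point 4).
    tf.keras.layers.Dense(100),
])
```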
Thanks. Is there a way to find the list of TT-ranks? I mean, how did you find `tt_rank=(1, 4, 10, 10, 1)`? Did you find it by trial and error, or from some equations and relations?
So the idea is that if your input dims are `[a1, a2, a3]` and your output dims are `[b1, b2, b3]`, then your TT-ranks should be no larger than `np.minimum([1, a1*b1, a1*b1*a2*b2, a1*b1*a2*b2*a3*b3], [a1*b1*a2*b2*a3*b3, a2*b2*a3*b3, a3*b3, 1])`, i.e. at each position, no larger than both the product of the mode sizes to the left and the product of the mode sizes to the right.
In this case it's `np.minimum([1, 4, 16, 160, 453600], [453600, 113400, 28350, 2835, 1]) = [1, 4, 16, 160, 1]`; clipping the requested rank 10 by this bound gives the `(1, 4, 10, 10, 1)` above.
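A small numpy sketch of that computation (the helper name `max_useful_tt_ranks` is mine, just for illustration):

```python
import numpy as np

def max_useful_tt_ranks(input_dims, output_dims):
    # Element-wise products of the paired input/output mode sizes.
    m = np.array(input_dims) * np.array(output_dims)
    left = np.concatenate([[1], np.cumprod(m)])               # products of modes to the left
    right = np.concatenate([np.cumprod(m[::-1])[::-1], [1]])  # products of modes to the right
    return np.minimum(left, right)

bound = max_useful_tt_ranks([2, 2, 2, 567], [2, 2, 5, 5])
print(bound)                  # [1 4 16 160 1]
print(np.minimum(10, bound))  # [1 4 10 10 1] -- the tt_rank suggested above
```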
Hi guys,
I have a simple model and I want to apply the t3f library to a dense layer of the model with shape (4536, 100). There are different possible combinations, but I want to use [[2, 2, 2, 567], [2, 2, 5, 5]] and define the rank as 10.
```python
Wtt = t3f.to_tt_matrix(W, shape=[[2, 2, 2, 567], [2, 2, 5, 5]], max_tt_rank=10)
tt_layer = t3f.nn.KerasDense(input_dims=[2, 2, 2, 567], output_dims=[2, 2, 5, 5],
                             tt_rank=10, activation='relu')
```
But after running I get this error:

```
ValueError: Layer weight shape (1, 2, 2, 20) not compatible with provided weight shape (1, 2, 2, 4)
```

I think this is related to `max_tt_rank` in the first statement and `tt_rank` in the second statement. I want to know what the difference between them is and how I can control it.
Thanks.