Closed by woshihuangshuai 7 years ago
Hi! Socher deals only with binary trees but this layer also works on a single input. The model was introduced for binary trees, but actually works really well with just normal networks.
If you're building a binary tree like Socher, concatenate the two inputs before this layer. My layer will include a few extra interaction terms but overall should be about the same. If you concatenate two layers, so x=[e_1,e_2], my formula factors to the below, which is basically the same as the original with a few extra terms for interactions within an input:
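f_i = a(e_1 V12_i e_2^T + W_i*(e_1,e_2)^T + b_i + e_1 V11_i e_1^T + e_2 V22_i e_2^T),
where V11_i and V22_i are the diagonal blocks of V_i (interactions within e_1 and within e_2) and V12_i combines its two off-diagonal blocks.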
I've also experimented with some models not in the original paper, like limiting V to be low-rank. You can try those layers in place of ordinary dense layers, so a single layer can model complicated interaction effects. I haven't had a chance to write anything up, but I think there is something worthwhile in there.
I'll warn you that V has input_dim * output_dim * input_dim parameters. If the model gets huge, use the low-rank versions of the layer, which I haven't had any memory issues with.
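To make the size warning concrete, here is a rough NumPy sketch comparing parameter counts for a full V and a low-rank V. The factorisation V_i = A_i B_i^T with rank r is an assumption made for illustration; the low-rank layers in this repo may be parameterised differently, and all function names below are hypothetical.

```python
import numpy as np

def full_v_params(input_dim, output_dim):
    # One input_dim x input_dim matrix V_i per output unit.
    return input_dim * output_dim * input_dim

def low_rank_v_params(input_dim, output_dim, rank):
    # Assumed factorisation V_i = A_i @ B_i.T,
    # with A_i and B_i each of shape (input_dim, rank).
    return 2 * input_dim * rank * output_dim

def low_rank_quadratic(x, A, B):
    # x: (input_dim,), A and B: (output_dim, input_dim, rank).
    # x V_i x^T = (x A_i) . (x B_i), so V_i is never materialised.
    xa = np.einsum("d,kdr->kr", x, A)
    xb = np.einsum("d,kdr->kr", x, B)
    return np.sum(xa * xb, axis=1)

rng = np.random.default_rng(0)
input_dim, output_dim, rank = 6, 3, 2
x = rng.normal(size=input_dim)
A = rng.normal(size=(output_dim, input_dim, rank))
B = rng.normal(size=(output_dim, input_dim, rank))

# Reference: build the full V explicitly and evaluate x V_i x^T.
V = np.einsum("kdr,ker->kde", A, B)
full = np.einsum("d,kde,e->k", x, V, x)
low = low_rank_quadratic(x, A, B)

print(full_v_params(300, 100))          # full tensor
print(low_rank_v_params(300, 100, 10))  # rank-10 factorisation
print(np.allclose(full, low))
```

With a 300-dimensional input and 100 outputs, the full V needs 300 * 100 * 300 parameters, while the assumed rank-10 version needs 2 * 300 * 10 * 100.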
Cheers
You can reproduce Socher's model exactly if you use a V_constraint. Ask around on keras if you need help figuring out how to write the constraint.
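Imagine V in block notation, with x = [e1, e2]:
xVx^T = e1v1e1^T + e1v2e2^T + e2v3e1^T + e2v4e2^T = e1v1e1^T + e1(v2+v3^T)e2^T + e2v4e2^T
(The term e2v3e1^T is a scalar, so it equals its own transpose e1v3^Te2^T, which is why it merges with the v2 term.)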
So if you constrain v1 and v4 to be 0, you are left with e1(v2+v3^T)e2^T. You could also constrain v1, v3 and v4 to be 0, and you are left with e1v2e2^T.
I don't think having the extra terms will hurt the model, but if you want to reproduce Socher's model exactly, constrain V to have 0s on the diagonal blocks.
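The block argument can be checked numerically. The sketch below works on a single slice V_i in NumPy and uses a hypothetical helper, off_diagonal_mask, to zero the diagonal blocks. It is not a Keras constraint; in Keras the same mask could presumably be applied inside a custom constraint, broadcast over the output axis of V, but that part is untested here.

```python
import numpy as np

def off_diagonal_mask(d1, d2, keep_lower=True):
    # 0/1 mask over a (d1+d2) x (d1+d2) matrix laid out as [[v1, v2], [v3, v4]].
    n = d1 + d2
    mask = np.ones((n, n))
    mask[:d1, :d1] = 0.0      # zero v1 (interactions within e1)
    mask[d1:, d1:] = 0.0      # zero v4 (interactions within e2)
    if not keep_lower:
        mask[d1:, :d1] = 0.0  # optionally zero v3 as well
    return mask

rng = np.random.default_rng(0)
d1, d2 = 3, 4
e1 = rng.normal(size=d1)
e2 = rng.normal(size=d2)
x = np.concatenate([e1, e2])

V = rng.normal(size=(d1 + d2, d1 + d2))
v1, v2 = V[:d1, :d1], V[:d1, d1:]
v3, v4 = V[d1:, :d1], V[d1:, d1:]

# Unconstrained V: the quadratic form splits into the three grouped terms.
full = x @ V @ x
blocks = e1 @ v1 @ e1 + e1 @ (v2 + v3.T) @ e2 + e2 @ v4 @ e2

# v1 = v4 = 0 leaves e1 (v2 + v3^T) e2^T.
cross_only = x @ (V * off_diagonal_mask(d1, d2)) @ x

# v1 = v3 = v4 = 0 leaves e1 v2 e2^T.
upper_only = x @ (V * off_diagonal_mask(d1, d2, keep_lower=False)) @ x

print(np.isclose(full, blocks))
print(np.isclose(cross_only, e1 @ (v2 + v3.T) @ e2))
print(np.isclose(upper_only, e1 @ v2 @ e2))
```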
The most intriguing part of the model is that it uses quadratic terms instead of linear terms. That goes far beyond just a model for binary recursive trees.
It is actually a really interesting layer you could use for any type of ML, but I haven't seen a lot of exploration. If I ever get some free time I might try to do some more experimentation. Please let me know if you find anything useful.
If you write a constraint to reproduce Socher's model or any interesting examples, please let me know and I can add it to this repo or link to it.
Hi! Thank you very much for your answer! It's helpful for me. I'm new to Keras and Neural Tensor Layer, so there are a lot of things I need to learn. I'll read your code again to figure it out.
Hello, I have a question about the formula in your program:
Dense Tensor Layer: f_i = a( xV_ix^T + W_ix^T + b_i)
Your formula is different from Socher's. What does x represent in the formula above?
In Socher's paper, the formula is f_i = a( e_1V_ie_2^T + W_i*(e_1,e_2)^T + b_i).
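The two formulas in the question can be compared side by side with a small NumPy sketch. The function names, the tanh activation and the (output, input, input) layout of V are assumptions for illustration and may not match the layer's actual code. Here x is the concatenation [e_1, e_2], and V for the dense version is zero everywhere except the block that pairs e_1 with e_2.

```python
import numpy as np

def dense_tensor_forward(x, V, W, b, activation=np.tanh):
    # f_i = a(x V_i x^T + W_i x^T + b_i)
    # x: (d,), V: (k, d, d), W: (k, d), b: (k,)
    quadratic = np.einsum("d,kde,e->k", x, V, x)
    return activation(quadratic + W @ x + b)

def socher_forward(e1, e2, V, W, b, activation=np.tanh):
    # f_i = a(e_1 V_i e_2^T + W_i (e_1, e_2)^T + b_i)
    # e1: (d1,), e2: (d2,), V: (k, d1, d2), W: (k, d1 + d2), b: (k,)
    bilinear = np.einsum("d,kde,e->k", e1, V, e2)
    return activation(bilinear + W @ np.concatenate([e1, e2]) + b)

rng = np.random.default_rng(0)
d1, d2, k = 3, 4, 2
e1 = rng.normal(size=d1)
e2 = rng.normal(size=d2)
V_socher = rng.normal(size=(k, d1, d2))
W = rng.normal(size=(k, d1 + d2))
b = rng.normal(size=k)

# Embed Socher's V_i in the upper-right block of a larger, otherwise zero V_i.
V_dense = np.zeros((k, d1 + d2, d1 + d2))
V_dense[:, :d1, d1:] = V_socher

x = np.concatenate([e1, e2])
dense_out = dense_tensor_forward(x, V_dense, W, b)
socher_out = socher_forward(e1, e2, V_socher, W, b)

print(np.allclose(dense_out, socher_out))
```

Without the zero blocks, the dense version adds the terms that depend on e_1 alone and on e_2 alone.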