bstriner / dense_tensor

Dense Tensor Layer for Keras
MIT License

Question about Dense Tensor Layer: f_i = a( xV_ix^T + W_ix^T + b_i). #1

Closed woshihuangshuai closed 7 years ago

woshihuangshuai commented 7 years ago

Hello, I have a question about the formula in your program:

Dense Tensor Layer: f_i = a(x V_i x^T + W_i x^T + b_i)

Your formula is different from Socher's. What does x represent in the formula above?

In Socher's paper, the formula is f_i = a(e_1 V_i e_2^T + W_i [e_1, e_2]^T + b_i).

bstriner commented 7 years ago

Hi! Socher deals only with binary trees, but this layer also works on a single input. The model was introduced for binary trees, but it actually works really well in just normal networks.
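To make the single-input form concrete, here is a minimal NumPy sketch of f_i = a(x V_i x^T + W_i x^T + b_i); the function and variable names are illustrative, not the repo's actual API:

```python
import numpy as np

def dense_tensor_forward(x, V, W, b, activation=np.tanh):
    """f_i = a(x V_i x^T + W_i x^T + b_i) for each output unit i.

    x: (batch, d_in), V: (d_out, d_in, d_in), W: (d_out, d_in), b: (d_out,)
    """
    quadratic = np.einsum('bi,oij,bj->bo', x, V, x)  # x V_i x^T per output
    linear = x @ W.T + b                             # W_i x^T + b_i
    return activation(quadratic + linear)

rng = np.random.default_rng(0)
d_in, d_out = 8, 4
x = rng.normal(size=(2, d_in))
V = rng.normal(size=(d_out, d_in, d_in))  # input * output * input parameters
W = rng.normal(size=(d_out, d_in))
b = rng.normal(size=(d_out,))
y = dense_tensor_forward(x, V, W, b)
print(y.shape)  # (2, 4)
```

Note that V alone holds d_in * d_out * d_in parameters (8 * 4 * 8 = 256 here), which is the memory caveat discussed below.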

If you're building a binary tree like Socher, concatenate the two inputs before this layer. My layer will include a few extra interaction terms, but overall it should be about the same. If you concatenate the two inputs, so x = [e_1, e_2], my formula factors into the following, which is basically the same as the original with a few extra terms for interactions within each input:

f_i = a(e_1 V12_i e_2^T + W_i [e_1, e_2]^T + b_i + e_1 V11_i e_1^T + e_2 V22_i e_2^T), where V11_i, V12_i, V22_i are the blocks of V_i partitioned to match [e_1, e_2].

I've also tried experimenting with some models not in the original paper, like limiting V to be low-rank. You can try these layers in place of ordinary dense layers, so a single layer can model complicated interaction effects. I haven't had a chance to write anything up, but I think there is something worthwhile in there.
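One way the low-rank idea can be realized (a sketch, not necessarily the repo's actual parameterization): factor each V_i as P_i Q_i^T with P_i, Q_i of shape (d_in, r), so the quadratic term becomes a dot product of two small projections and V costs 2 * d_in * r parameters per output instead of d_in^2:

```python
import numpy as np

def low_rank_quadratic(x, P, Q):
    # x: (batch, d_in); P, Q: (d_out, d_in, r); implicitly V_i = P_i Q_i^T
    xP = np.einsum('bi,oir->bor', x, P)  # x P_i
    xQ = np.einsum('bi,oir->bor', x, Q)  # x Q_i
    return (xP * xQ).sum(axis=-1)        # x P_i Q_i^T x^T, shape (batch, d_out)

rng = np.random.default_rng(1)
d_in, d_out, r = 8, 4, 2
x = rng.normal(size=(3, d_in))
P = rng.normal(size=(d_out, d_in, r))
Q = rng.normal(size=(d_out, d_in, r))

# Same result as forming each full V_i = P_i Q_i^T explicitly:
V = np.einsum('oir,ojr->oij', P, Q)
full = np.einsum('bi,oij,bj->bo', x, V, x)
assert np.allclose(low_rank_quadratic(x, P, Q), full)
```

The factored form never materializes the (d_out, d_in, d_in) tensor, which is where the memory savings come from.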

I'll warn you that V has input * output * input parameters. If the model gets huge, use the low-rank versions of the layer; I haven't had any memory issues with those.

Cheers

bstriner commented 7 years ago

You can reproduce Socher's model exactly if you use a V_constraint. Ask around on Keras if you need help figuring out how to write the constraint.

Imagine in block notation:

x = [e1, e2]

V = [[v1, v2],[v3, v4]]

xVx^T = e1v1e1^T + e1v2e2^T + e2v3e1^T + e2v4e2^T = e1v1e1^T + e1(v2+v3^T)e2^T + e2v4e2^T
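A quick NumPy check of that block identity, with random matrices and illustrative names:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 5
e1, e2 = rng.normal(size=d), rng.normal(size=d)
v1, v2, v3, v4 = (rng.normal(size=(d, d)) for _ in range(4))

x = np.concatenate([e1, e2])        # x = [e1, e2]
V = np.block([[v1, v2], [v3, v4]])  # V = [[v1, v2], [v3, v4]]

lhs = x @ V @ x                                            # x V x^T
rhs = e1 @ v1 @ e1 + e1 @ (v2 + v3.T) @ e2 + e2 @ v4 @ e2  # factored form
assert np.allclose(lhs, rhs)
```

The e2 v3 e1^T term is a scalar, so transposing it gives e1 v3^T e2^T, which is why v3 folds into the cross term.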

So if you constrain v1 and v4 to be 0, you are left with e1(v2+v3^T)e2^T. You could also constrain v1, v3 and v4 to be 0, and you are left with e1v2e2^T.

I don't think having the extra terms will hurt the model, but if you want to reproduce Socher's model exactly, constrain V to have 0s on the diagonal blocks.
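A sketch of that constraint's masking logic in plain NumPy (a real Keras constraint would be a callable applying the same mask with backend ops after each weight update; the names here are illustrative):

```python
import numpy as np

def zero_diagonal_blocks(V, d1, d2):
    """Zero the within-input blocks v1 and v4 of each V_i.

    V: (d_out, d1 + d2, d1 + d2), partitioned to match x = [e1, e2].
    """
    mask = np.ones((d1 + d2, d1 + d2))
    mask[:d1, :d1] = 0.0  # v1 block: e1-e1 interactions
    mask[d1:, d1:] = 0.0  # v4 block: e2-e2 interactions
    return V * mask

rng = np.random.default_rng(3)
V = rng.normal(size=(4, 6, 6))
Vc = zero_diagonal_blocks(V, 3, 3)
assert np.allclose(Vc[:, :3, :3], 0) and np.allclose(Vc[:, 3:, 3:], 0)
assert np.allclose(Vc[:, :3, 3:], V[:, :3, 3:])  # cross blocks untouched
```

With the diagonal blocks held at zero, only the e1(v2+v3^T)e2^T cross term survives, matching Socher's model.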

The most intriguing part of the model is that it uses quadratic terms, not just linear ones. That goes far beyond a model for binary recursive trees.

It is actually a really interesting layer that you could use for any type of ML, but I haven't seen much exploration of it. If I ever get some free time I might try some more experimentation. Please let me know if you find anything useful.

If you write a constraint to reproduce Socher's model or any interesting examples, please let me know and I can add it to this repo or link to it.

woshihuangshuai commented 7 years ago

Hi! Thank you very much for your answer! It's very helpful. I'm new to Keras and the Neural Tensor Layer, so there is a lot I need to learn. I'll read your code again to figure it out.