allenai / allennlp

An open-source NLP research library, built on PyTorch.
http://www.allennlp.org
Apache License 2.0

[Question] Adding a span representation to all tokens in the span #1782

Closed nitishgupta closed 6 years ago

nitishgupta commented 6 years ago

I have a tensor T of shape [B, T, D] of contextual word embeddings for a given piece of text. Alongside, I have a tensor S of shape [B, M, 2] containing M spans in this text, and a tensor R of shape [B, M, D] with their representations. For any given span (i, j), I want to add its representation to all the tokens in the span [i : j].

Currently, my solution is to loop over dim 1 of S (and R) and, for each span, repeat the span representation to the length of the span and add it to the corresponding slice of T.
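For concreteness, a minimal sketch of that loop-based approach (all sizes and tensors here are made-up placeholders, and spans are assumed to be inclusive (start, end) index pairs):

```python
import torch

B, T, M, D = 1, 5, 4, 8                               # hypothetical sizes
token_reps = torch.randn(B, T, D)                     # contextual embeddings (the tensor T)
spans = torch.tensor([[[0, 1], [0, 2], [3, 4], [0, 4]]])  # (B, M, 2), inclusive bounds (the tensor S)
span_reps = torch.randn(B, M, D)                      # one vector per span (the tensor R)

# Loop over the span dimension: for each span, add its vector
# to every token position the span covers.
out = token_reps.clone()
for b in range(B):
    for m in range(M):
        start, end = spans[b, m, 0].item(), spans[b, m, 1].item()
        out[b, start:end + 1] += span_reps[b, m]
```

This is O(B * M) Python-level iteration, which is what the question is trying to avoid.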

I was wondering if somebody has a better solution in mind.

matt-gardner commented 6 years ago

Just making sure I understand the problem:

Let's say I have the sentence "The cat ate the mouse", and I have four spans: "the cat", "the cat ate", "the mouse", and "the cat ate the mouse". So each word will get either two or three span representations added to it. Yes? Something like this:

SPAN    0   1   2   3
----------------------
the     X   X       X
cat     X   X       X
ate         X       X
the             X   X
mouse           X   X

Right? What this suggests is that you want to construct a (B, T, M) shape binary tensor, which you can then multiply by your (B, M, D) tensor to get something of shape (B, T, D) that you can add to your token representations. You should be able to construct that (B, T, M) tensor using a range vector (for the token indices) and some greater than / less than operations.
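The mask-and-multiply idea above could look something like this (a sketch, not the library's API; shapes and the assumption that spans are inclusive (start, end) pairs are mine):

```python
import torch

B, T, M, D = 1, 5, 4, 8                               # hypothetical sizes
token_reps = torch.randn(B, T, D)                     # (B, T, D) contextual embeddings
spans = torch.tensor([[[0, 1], [0, 2], [3, 4], [0, 4]]])  # (B, M, 2), inclusive bounds
span_reps = torch.randn(B, M, D)                      # (B, M, D) span representations

# Range vector of token positions, shaped for broadcasting: (1, T, 1).
positions = torch.arange(T).view(1, T, 1)

starts = spans[:, :, 0].unsqueeze(1)                  # (B, 1, M)
ends = spans[:, :, 1].unsqueeze(1)                    # (B, 1, M)

# (B, T, M) binary mask: mask[b, t, m] = 1 iff token t lies inside span m.
mask = ((positions >= starts) & (positions <= ends)).float()

# (B, T, M) x (B, M, D) -> (B, T, D): each token gets the sum of the
# representations of every span that covers it.
span_sums = torch.bmm(mask, span_reps)
updated = token_reps + span_sums
```

With the "the cat ate the mouse" spans above, `mask[0]` reproduces the X-table exactly (e.g. the row for "ate" is `[0, 1, 0, 1]`), and no Python-level loop over spans is needed.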

Does this make sense? (I'll add that doing this seems a little bit odd to me, but hey, maybe the model can segregate the span features into one part of the feature space, and have them be additive...)

nitishgupta commented 6 years ago

Yes, you understood the problem correctly and the solution seems correct. Thanks!

Sidenote: The actual model I have in mind is different from this, but I simplified it for ease of explanation. Still, is having additive features a bad idea?

matt-gardner commented 6 years ago

I'm glad it helped! I'm closing this issue now.

And additive features aren't necessarily bad - the model can just partition the feature space so that each feature gets its own bucket, and things can work out. I've been surprised at how well this works on a number of different occasions (e.g., positional embeddings). I wouldn't trust my intuitions on this point very much - it's much easier to just try something and see empirically whether it's a good idea.

nitishgupta commented 6 years ago

I implemented a test version of this and it works. Thanks again for the idea.