aistairc / rnng-pytorch

Odd behavior in FixedStackInOrderRNNG #3

Closed ekayen closed 3 years ago

ekayen commented 3 years ago

I have been experimenting with the FixedStackInOrderRNNG and have run into some behavior that is hard to explain. I have found that I can change the input dimension of vocab_mlp() (defined on line 711 of fixed_stack_models.py) to pretty much anything and the forward pass of the model will still run without issue, even when the tensor fed to vocab_mlp() has a dimension that should be incompatible with it. Do you see this behavior as well? Do you happen to know what might cause it?

I uncovered this behavior because I was trying to concatenate new features to the text embeddings, and the backward pass was consistently giving dimension errors even though I had verified that the forward pass had correct dimensions everywhere. I still haven't been able to get this modification up and running, so I'm hoping that if you know why there seems to be no dimension checking in the original model's forward pass, that might give a clue as to why my modifications won't work.

ekayen commented 3 years ago

For concreteness, I have set the input dim of vocab_mlp() to 10:

Linear(in_features=10, out_features=8772, bias=True)

And the input that it takes has dimensions [7752, 105] -- and PyTorch doesn't complain.
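
For comparison, a quick standalone check with plain PyTorch (not using the RNNG code at all) does raise an error for these shapes, which is what makes the behavior above surprising:

import torch
import torch.nn as nn

vocab_mlp = nn.Linear(in_features=10, out_features=8772)
x = torch.randn(7752, 105)
try:
    vocab_mlp(x)
except RuntimeError as e:
    print(e)  # mat1 and mat2 shapes cannot be multiplied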

hiroshinoji commented 3 years ago

That might be due to the tying of the input and output embeddings (see https://github.com/aistairc/rnng-pytorch/blob/f30c0ed56949a3a30c50dc9bac16de362bbbdc92/fixed_stack_models.py#L717). This technique of sharing weights is known to improve the performance of LMs (https://openreview.net/forum?id=r1aPbsFle). Here, regardless of the original dimension of vocab_mlp, its weights are overwritten by the embedding matrix weights.
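
A minimal standalone sketch (plain PyTorch, not the actual fixed_stack_models.py code) of why the declared input dimension stops mattering once the weights are tied:

import torch
import torch.nn as nn

emb = nn.Embedding(8772, 105)    # vocab_size x w_dim, matching the sizes above
vocab_mlp = nn.Linear(10, 8772)  # the 'wrong' in_features=10

# Tying replaces the Linear's (out_features, in_features) weight with the
# Embedding's (num_embeddings, embedding_dim) weight, i.e. shape (8772, 105).
vocab_mlp.weight = emb.weight

x = torch.randn(7752, 105)
print(vocab_mlp(x).shape)        # torch.Size([7752, 8772]) -- no error

So the in_features=10 you set is simply never used after the assignment on that line.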

I think this feature could be made optional by adding a new bool argument like tie_embeddings and then changing this line to:

if tie_embeddings:
  self.vocab_mlp.weight = self.emb[0].weight

Then vocab_mlp and emb would not be tied, and each could be changed independently.

But I think what you intend to do is to extend the input word embeddings, which is self.emb, not vocab_mlp. vocab_mlp is the final output layer that maps the output of the stack LSTM to the vocabulary at each step.
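
In case it helps, here is a rough sketch of what the input-side change could look like (names and sizes are illustrative, not the actual fixed_stack_models.py API): concatenate the extra features to the word embeddings and project back to the original embedding dimension, so the stack LSTM and the tied vocab_mlp keep their original shapes.

import torch
import torch.nn as nn

vocab_size, w_dim, feat_dim = 8772, 105, 16      # illustrative sizes

word_emb = nn.Embedding(vocab_size, w_dim)
# hypothetical projection that folds the extra features back into w_dim,
# so downstream layers (and the tied output layer) stay unchanged
feat_proj = nn.Linear(w_dim + feat_dim, w_dim)

tokens = torch.randint(0, vocab_size, (4, 20))   # (batch, seq_len)
feats = torch.randn(4, 20, feat_dim)             # extra per-token features

x = torch.cat([word_emb(tokens), feats], dim=-1) # (4, 20, w_dim + feat_dim)
x = feat_proj(x)                                 # back to (4, 20, w_dim)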

ekayen commented 3 years ago

That seems like it's probably the culprit! Thank you! I will give this a try and see if it fixes the issue.

I have actually been concatenating my features with self.emb, which caused the odd problems with the backward() pass -- I suspect that switching off embedding tying will fix this, and I will update here if so. Thanks again!

ekayen commented 3 years ago

That resolved the issue! Thank you again for the input!