Open kylebgorman opened 9 months ago
I think I started work on this a while back and then forgot about it.
Wanted to note here that a labmate was just experimenting with this kind of thing and found that shared embeddings (which AFAIK we no longer have an option for) outperform separate embeddings by quite a bit on his dataset when training data is really sparse (does not seem to matter when there is a lot of data).
I will try to reprioritize this in the next couple weeks.
I'm not surprised. They should really help when the source and target vocabularies overlap.
I went ahead and assigned you; hope that's okay.
If the user requests a non-shared embedding matrix, keep the source and target embeddings separate by enclosing the source symbols {like} {this} (cf. what we do with the features vocabulary [like] [this]).
How should this be requested? Shall I add a flag --shared_embeddings or --tied_embeddings?
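The brace-wrapping scheme described above could be sketched roughly as follows. This is just an illustration of the idea; the function name, signature, and return shape are hypothetical, not the library's actual API.

```python
def make_vocabulary(
    source_symbols: set[str],
    target_symbols: set[str],
    features_symbols: set[str],
    tie_embeddings: bool = True,
) -> list[str]:
    """Builds a single flat symbol list backing one embedding matrix.

    With tied embeddings, a symbol shared by source and target maps to
    the same index (and thus the same embedding row). Without tying,
    source symbols are wrapped {like} {this} so they cannot collide
    with target symbols; features are wrapped [like] [this] either way.
    """
    if tie_embeddings:
        symbols = source_symbols | target_symbols
    else:
        symbols = {f"{{{s}}}" for s in source_symbols} | target_symbols
    symbols |= {f"[{f}]" for f in features_symbols}
    return sorted(symbols)
```

With tying, "a" in source and "a" in target collapse to one row; without it, "{a}" and "a" get distinct rows in the same matrix.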
When the vocabulary is created, continue to log the three vocabularies separately---this is useful debugging information. However, this can be moved into the vocabulary constructor itself.
I think you have already done this, right? Or should I call log_vocabularies in DataModule.__init__?
Store a single embedding matrix in the Model (not the Module); either delegate embedding lookup to the Model or pass a reference to the matrix when the modules are constructed. (There may be some tricks with initializing the parameters since some models are opinionated about this, but nothing that can't be solved here.)
I will go back through what I have done and make sure it is implemented this way.
If the user requests a non-shared embedding matrix, keep the source and target embeddings separate by enclosing the source symbols {like} {this} (cf. what we do with the features vocabulary [like] [this]).
How should this be requested? Shall I add a flag --shared_embeddings or --tied_embeddings?
Yes, sounds good. I guess this is traditionally called "tied embeddings", though "shared" is maybe clearer.
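For registering the flag, the usual paired-argparse pattern would let users enable or disable tying explicitly. The add_argparse_args hook and the default value here are assumptions for illustration, not the project's actual interface:

```python
import argparse


def add_argparse_args(parser: argparse.ArgumentParser) -> None:
    # Hypothetical hook; sketches the proposed --tied_embeddings flag.
    # Paired flags let users explicitly enable or disable tying.
    parser.add_argument(
        "--tied_embeddings",
        action="store_true",
        default=True,
        help="Share one embedding matrix between source and target "
        "vocabularies (default: enabled).",
    )
    parser.add_argument(
        "--no_tied_embeddings",
        action="store_false",
        dest="tied_embeddings",
        help="Keep separate source and target embeddings.",
    )
```

Both flags write to the same destination, so downstream code only ever reads args.tied_embeddings.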
When the vocabulary is created, continue to log the three vocabularies separately---this is useful debugging information. However, this can be moved into the vocabulary constructor itself.
I think you have already done this, right? Or should I call log_vocabularies in DataModule.__init__?
It does do that as is.
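Moving the logging into the vocabulary constructor could look like this sketch; the Index class name and attribute layout are hypothetical:

```python
import logging


class Index:
    """Hypothetical vocabulary container with logging moved into the
    constructor, so callers never need to invoke it separately."""

    def __init__(self, source, target, features):
        self.source = sorted(source)
        self.target = sorted(target)
        self.features = sorted(features)
        self._log_vocabularies()

    def _log_vocabularies(self) -> None:
        # Logging each vocabulary separately remains useful debugging
        # information even when one embedding matrix backs all three.
        logging.info("Source vocabulary: %s", self.source)
        logging.info("Target vocabulary: %s", self.target)
        logging.info("Features vocabulary: %s", self.features)
```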
Store a single embedding matrix in the Model (not the Module); either delegate embedding lookup to the Model or pass a reference to the matrix when the modules are constructed. (There may be some tricks with initializing the parameters since some models are opinionated about this, but nothing that can't be solved here.)
I will go back through what I have done and make sure it is implemented this way.
SGTM. Maybe my proposal is overly ambitious, but I just didn't think it should be some kind of goofy hack like letting the encoder initialize the embeddings and let the decoder reach down into the encoder to get them.
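A minimal PyTorch sketch of that arrangement, with the Model owning the single matrix and both modules receiving a reference at construction time; class names and the initialization choice are illustrative, not the project's actual code:

```python
import torch
from torch import nn


class Encoder(nn.Module):
    # Receives the embedding by reference rather than creating it, so
    # no module ever has to reach into another module to get it.
    def __init__(self, embeddings: nn.Embedding):
        super().__init__()
        self.embeddings = embeddings

    def forward(self, symbols: torch.Tensor) -> torch.Tensor:
        return self.embeddings(symbols)


class Decoder(nn.Module):
    def __init__(self, embeddings: nn.Embedding):
        super().__init__()
        self.embeddings = embeddings

    def forward(self, symbols: torch.Tensor) -> torch.Tensor:
        return self.embeddings(symbols)


class Model(nn.Module):
    """Owns the single embedding matrix and hands the same object to
    both modules, so tying is automatic."""

    def __init__(self, vocab_size: int, embedding_size: int):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_size)
        # Opinionated initialization happens once, here, before the
        # modules ever see the matrix.
        nn.init.normal_(self.embeddings.weight, mean=0.0, std=0.02)
        self.encoder = Encoder(self.embeddings)
        self.decoder = Decoder(self.embeddings)
```

Because nn.Module.parameters() deduplicates shared tensors, the matrix is counted (and updated by the optimizer) exactly once.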
We currently store separate vocabularies for source, target, and features, and have separate embedding matrices for the encoder and the decoder. We propose to flatten this distinction the following way:

- If the user requests a non-shared embedding matrix, keep the source and target embeddings separate by enclosing the source symbols {like} {this} (cf. what we do with the features vocabulary [like] [this]).
- When the vocabulary is created, continue to log the three vocabularies separately---this is useful debugging information. However, this can be moved into the vocabulary constructor itself.
- Store a single embedding matrix in the Model (not the Module); either delegate embedding lookup to the Model or pass a reference to the matrix when the modules are constructed. (There may be some tricks with initializing the parameters since some models are opinionated about this, but nothing that can't be solved here.)