CUNY-CL / yoyodyne

Small-vocabulary sequence-to-sequence generation with optional feature conditioning
Apache License 2.0

Shared embeddings #142

Open · kylebgorman opened this issue 9 months ago

kylebgorman commented 9 months ago

We currently store separate vocabularies for source, target, and features, and have separate embedding matrices for the encoder and the decoder. We propose to flatten this distinction in the following way:

- If the user requests a non-shared embedding matrix, keep the source and target embeddings separate by enclosing the source symbols {like} {this} (cf. what we do with the features vocabulary [like] [this]).
- When the vocabulary is created, continue to log the three vocabularies separately---this is useful debugging information. However, this can be moved into the vocabulary constructor itself.
- Store a single embedding matrix in the Model (not the Module); either delegate embedding lookup to the Model or pass a reference to the matrix when the modules are constructed. (There may be some tricks with initializing the parameters since some models are opinionated about this, but nothing that can't be solved here.)
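For concreteness, here is a minimal sketch of what the flattened vocabulary might look like. The class name `SharedVocabulary`, the `tied` parameter, and the specials list are all hypothetical rather than Yoyodyne's actual API; the point is only the decoration scheme for non-shared source symbols and bracketed features.

```python
# Hypothetical sketch of a single flattened index space; these names do
# not reflect Yoyodyne's actual implementation.

SPECIALS = ["<pad>", "<s>", "</s>", "<unk>"]


class SharedVocabulary:
    """One index space for source, target, and feature symbols.

    With tied=True, source symbols that also occur in the target reuse
    the same index. With tied=False, every source symbol is stored
    decorated as {symbol}, so it never collides with a target symbol;
    features are always decorated as [feature].
    """

    def __init__(self, source, target, features, tied=True):
        symbols = list(SPECIALS)
        symbols.extend(sorted(set(target)))
        if tied:
            # Source symbols overlapping the target reuse target indices.
            symbols.extend(sorted(set(source) - set(target)))
        else:
            symbols.extend(f"{{{s}}}" for s in sorted(set(source)))
        symbols.extend(f"[{f}]" for f in sorted(set(features)))
        self.index = {symbol: i for i, symbol in enumerate(symbols)}

    def __len__(self):
        return len(self.index)
```

E.g., `SharedVocabulary("abd", "abc", ["PL"], tied=False)` stores `a`, `b`, `c` for the target and `{a}`, `{b}`, `{d}` for the source, plus `[PL]` for the feature.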

Adamits commented 3 months ago

I think I started work on this a while back and then forgot about it.

Wanted to note here that a labmate was just experimenting with this kind of thing and found that shared embeddings (which AFAIK we no longer have an option for) outperform separate embeddings by quite a bit on his dataset when training data is very sparse (it does not seem to matter when there is a lot of data).

I will try to reprioritize this in the next couple of weeks.

kylebgorman commented 3 months ago

I'm not surprised. They should really help when the source and target vocabularies overlap.

I went ahead and assigned you; hope that's okay.

Adamits commented 2 months ago

> If the user requests a non-shared embedding matrix, keep the source and target embeddings separate by enclosing the source symbols {like} {this} (cf. what we do with the features vocabulary [like] [this]).

How should this be requested? Shall I add a flag `--shared_embeddings` or `--tied_embeddings`?

> When the vocabulary is created, continue to log the three vocabularies separately---this is useful debugging information. However, this can be moved into the vocabulary constructor itself.

I think you have already done this, right? Or should I call `log_vocabularies` in `DataModule.__init__`?

> Store a single embedding matrix in the Model (not the Module); either delegate embedding lookup to the Model or pass a reference to the matrix when the modules are constructed. (There may be some tricks with initializing the parameters since some models are opinionated about this, but nothing that can't be solved here.)

I will go back through what I have done and make sure it is implemented this way.

kylebgorman commented 2 months ago

> > If the user requests a non-shared embedding matrix, keep the source and target embeddings separate by enclosing the source symbols {like} {this} (cf. what we do with the features vocabulary [like] [this]).
>
> How should this be requested? Shall I add a flag `--shared_embeddings` or `--tied_embeddings`?

Yes, that sounds good. I guess this is traditionally called "tied embeddings", though "shared" is maybe clearer.
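For illustration, the flag could be wired up roughly like this. The spelling and default are exactly the open question above, and this is a sketch, not Yoyodyne's actual CLI plumbing:

```python
import argparse

parser = argparse.ArgumentParser()
# Hypothetical spelling; the thread leaves the final name open
# (--shared_embeddings is the other candidate).
parser.add_argument(
    "--tied_embeddings",
    action="store_true",
    default=False,
    help="Share a single embedding matrix across source and target symbols.",
)
```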

> > When the vocabulary is created, continue to log the three vocabularies separately---this is useful debugging information. However, this can be moved into the vocabulary constructor itself.
>
> I think you have already done this, right? Or should I call `log_vocabularies` in `DataModule.__init__`?

It already does that as is.

> > Store a single embedding matrix in the Model (not the Module); either delegate embedding lookup to the Model or pass a reference to the matrix when the modules are constructed. (There may be some tricks with initializing the parameters since some models are opinionated about this, but nothing that can't be solved here.)
>
> I will go back through what I have done and make sure it is implemented this way.

SGTM. Maybe my proposal is overly ambitious, but I just didn't think it should be some kind of goofy hack like letting the encoder initialize the embeddings and letting the decoder reach down into the encoder to get them.
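A minimal PyTorch sketch of the design described above, with the Model owning the one embedding matrix and passing a reference to both modules. The class names and sizes here are illustrative, not Yoyodyne's actual code; the point is that neither module reaches into the other:

```python
import torch
from torch import nn


class Encoder(nn.Module):
    """Receives the embedding module by reference; it does not own it."""

    def __init__(self, embeddings: nn.Embedding, hidden_size: int):
        super().__init__()
        self.embeddings = embeddings
        self.rnn = nn.LSTM(
            embeddings.embedding_dim, hidden_size, batch_first=True
        )

    def forward(self, symbols: torch.Tensor):
        # Returns (outputs, (h, c)).
        return self.rnn(self.embeddings(symbols))


class Decoder(nn.Module):
    """Uses the same embedding object as the encoder: one shared parameter."""

    def __init__(self, embeddings: nn.Embedding, hidden_size: int):
        super().__init__()
        self.embeddings = embeddings
        self.rnn = nn.LSTM(
            embeddings.embedding_dim, hidden_size, batch_first=True
        )
        # One could additionally tie this output projection to the embedding
        # weights (classic "tied embeddings"), but that is a separate choice.
        self.classifier = nn.Linear(hidden_size, embeddings.num_embeddings)

    def forward(self, symbols: torch.Tensor, state):
        output, state = self.rnn(self.embeddings(symbols), state)
        return self.classifier(output), state


class Model(nn.Module):
    """Owns the single embedding matrix and hands it to both modules.

    Any opinionated parameter initialization happens here, once, before
    the encoder and decoder are constructed.
    """

    def __init__(self, vocab_size: int, embedding_size: int, hidden_size: int):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embedding_size)
        nn.init.normal_(self.embeddings.weight, std=embedding_size**-0.5)
        self.encoder = Encoder(self.embeddings, hidden_size)
        self.decoder = Decoder(self.embeddings, hidden_size)
```

Because `self.embeddings` is a single `nn.Module` registered on the Model, both submodules see the same parameter tensor and it appears only once in `Model.parameters()`.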