Sure, I have been wondering that too. Is there any way to absolutely forbid them from being copied into the target sequence, though? (And wouldn't copying them give rise to a decoding error, because their indices would not be in the relevant index, i.e., the target vocabulary?)
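For what it's worth, here is roughly how forbidding it could work: zero out the attention mass at feature positions before scattering it into the copy distribution, then renormalize. A minimal PyTorch sketch, not our actual implementation; the shapes and the `feature_start_idx` convention (feature symbols occupying a contiguous index range above the other symbols) are assumptions:

```python
import torch


def masked_copy_distribution(
    attention: torch.Tensor,  # (batch, src_len): attention weights over source.
    source: torch.Tensor,  # (batch, src_len): source symbol indices.
    vocab_size: int,
    feature_start_idx: int,  # Assumed: first index reserved for feature symbols.
) -> torch.Tensor:
    """Scatters attention into a copy distribution, forbidding feature symbols."""
    # Zero attention mass on feature positions so they can never be copied.
    is_feature = source >= feature_start_idx
    attention = attention.masked_fill(is_feature, 0.0)
    # Renormalize so the copy distribution still sums to 1.
    attention = attention / attention.sum(dim=1, keepdim=True).clamp_min(1e-12)
    # Scatter-add the masked attention into vocabulary space.
    copy_dist = torch.zeros(source.size(0), vocab_size, device=source.device)
    copy_dist.scatter_add_(1, source, attention)
    return copy_dist
```

Masked positions then contribute zero probability to indices outside the target vocabulary, which should also sidestep the decoding error above.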
Here's my hypothesis though (and maybe this is just my "neat"ness coming out in this admittedly "scruffy" project): concatenation is never substantially better than separate encoders, and it's occasionally worse. If I am right about this, then we could free ourselves of all that code specific to concatenating models, eventually.
This is a good point. If we implement the option to concat for every model, then we could run this experiment.
"To conatenate or not to concatenate": great SIGMORPHON short paper idea.
I was just thinking: though our pointer-generator implementation(s) take care to encode features separately, so that they are not used in the attention distribution for the pointer probabilities, it is worth making it easy to treat features as just more input symbols alongside the lemmas.

That is, to concatenate the features with the input, just as we do for the 'vanilla' seq2seq models. This is just for comparison, since these models sometimes learn such things on their own without much intervention.
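On the data side, the comparison condition could be as simple as the sketch below; the function name, field shapes, and the separator symbol are made up for illustration and are not our actual data format:

```python
def concat_features(
    lemma: list[str], features: list[str], sep: str = ";"
) -> list[str]:
    """Treats features as ordinary input symbols by appending them to the lemma.

    E.g. concat_features(list("walk"), ["V", "PST"]) ->
         ["w", "a", "l", "k", ";", "V", ";", "PST"]
    """
    sequence = list(lemma)
    for feature in features:
        sequence.append(sep)  # Assumed separator; any reserved symbol works.
        sequence.append(feature)
    return sequence
```

The encoder then sees one concatenated sequence, exactly as the vanilla seq2seq models do, with no special handling anywhere downstream.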