jopetty opened this issue 3 years ago
aedca6f adds a "working" (i.e., doesn't error) BERT model, but it doesn't seem to learn very well. Among the design considerations:
- The positional encodings built into the HuggingFace BERT models don't seem to be useful in a sequence-to-sequence context. I'm not really sure why this is, but it's fixable if we add our own positional encodings to the embedding layer of the pretrained models (rough sketch below).
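For reference, a minimal sketch of one way to do this with the HuggingFace API. This is illustrative only: the class name, and the choice to swap in fixed sinusoidal encodings for BERT's learned position-embedding table, are assumptions, not necessarily what aedca6f does.

```python
import math

import torch
import torch.nn as nn
from transformers import BertModel


def sinusoidal_encoding(num_positions: int, d_model: int) -> torch.Tensor:
    """Fixed sinusoidal positional encodings (Vaswani et al., 2017)."""
    pe = torch.zeros(num_positions, d_model)
    position = torch.arange(0, num_positions, dtype=torch.float).unsqueeze(1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model)
    )
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe


class BertEncoder(nn.Module):
    """Pretrained BERT encoder whose learned position embeddings are replaced
    with fixed sinusoidal ones. Hypothetical class; one possible reading of
    'add our own positional encodings to the embedding layer'."""

    def __init__(self, model_name: str = "bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        cfg = self.bert.config
        pe = sinusoidal_encoding(cfg.max_position_embeddings, cfg.hidden_size)
        with torch.no_grad():
            # Overwrite BERT's learned position-embedding table and freeze it,
            # so the encoder sees our positional signal instead.
            self.bert.embeddings.position_embeddings.weight.copy_(pe)
        self.bert.embeddings.position_embeddings.weight.requires_grad = False

    def forward(self, input_ids, attention_mask=None):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        return out.last_hidden_state  # (batch, seq_len, hidden_size)
```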
Would be nice to have BERT as an option for the encoder. Some issues are:

- BERT comes with its own tokenizer and vocabulary, which don't line up with the `Field`s we've been using.
- Do we need to do anything with the `target` vocabulary, since BERT's tokenizer might do weird things to it? (A rough sketch of the split I have in mind is below this list.)
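For illustration, here's the split I'm imagining: the source side goes through BERT's tokenizer (which is what the pretrained encoder expects), while the target side keeps the whitespace tokenization and vocabulary the existing `Field`s already build. The example strings and variable names here are made up.

```python
from transformers import BertTokenizer

# The point is only that BERT's WordPiece tokenizer may split tokens that
# our target-side Field treats as atomic, so the two sides should not share
# a vocabulary.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

source = "the newt who will sing does giggle"
target = "does the newt who will sing giggle"

# Source side: numericalize with BERT's own tokenizer and vocabulary.
src = tokenizer(source, return_tensors="pt")
print(tokenizer.convert_ids_to_tokens(src["input_ids"][0].tolist()))

# Target side: keep the whitespace tokenization and the vocabulary built by
# the existing Fields, so the decoder's output space is unchanged.
tgt_tokens = target.split()
print(tgt_tokens)
```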