facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Some bugs in the model's architecture #3761

Closed trestad closed 3 years ago

trestad commented 3 years ago

🐛 Bug

To Reproduce

Steps to reproduce the behavior (always include the command you ran):

I found that when I install fairseq from the master branch (`git clone https://github.com/pytorch/fairseq`), the architecture presets are not applied. For example, in `transformer_iwslt_de_en` the `--encoder-ffn-embed-dim` and `--decoder-ffn-embed-dim` parameters should default to 1024 (see the preset sketch after the steps below), but even though I passed `--arch transformer_iwslt_de_en` on the train command, I got a model with the `base_architecture` defaults, where `--encoder-ffn-embed-dim` is 2048. When I use the stable release instead, this bug never shows up. I hope you can fix this.

  1. See error: the training log prints the `base_architecture` values, e.g. `encoder_ffn_embed_dim=2048` (screenshots omitted).


freewym commented 3 years ago

I have the same problem with the master branch: command-line arguments for the Transformer architecture do not seem to override the default values. I think this bug was introduced in commit https://github.com/pytorch/fairseq/commit/129d8594ccdc6644be84dc249e16489e049f4bfd