EdinburghNLP / nematus

Open-Source Neural Machine Translation in Tensorflow
BSD 3-Clause "New" or "Revised" License

Load model from Opus-MT #112

Closed: lhk closed this issue 4 years ago

lhk commented 4 years ago

Is it possible to load one of the pretrained models from Opus-MT with nematus? Models are listed here: https://github.com/Helsinki-NLP/Opus-MT

As far as I understand, Opus-MT is based on Marian-NMT, and models created with Marian-NMT seem to be compatible with nematus by default.

emjotde commented 4 years ago

Hi, marian-nmt maintainer here. That assumption hasn't been true for a while. Only a certain class of RNN-based models used to be compatible with the old Theano-based Nematus, and I'm not sure if even that is still the case.

lhk commented 4 years ago

Hi, thanks for the quick feedback :)

The pretrained models seem to be in a very readable format: it's lots of .npy files.

Is there some documentation on the layout? Could I manually reproduce the corresponding model in TensorFlow and read in the weights? I would love to use marian-nmt or opus-mt, but for deployment it has to be TensorFlow.
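
For example, here is how I've been poking at one of the downloaded models so far. A minimal sketch: the file name is a placeholder for whatever you downloaded, and I'm assuming the parameters sit in a single NumPy archive (.npz is just a zip of .npy arrays):

```python
import numpy as np

# Placeholder path; use whichever Opus-MT model file you downloaded.
params = np.load("model.npz")

# List every parameter name with its shape to recover the layout.
for name in sorted(params.files):
    print(f"{name:60s} {params[name].shape}")

# There also seems to be a 'special:model.yml' entry (a byte array)
# holding the YAML config, i.e. the architecture hyperparameters one
# would need to rebuild the graph in TensorFlow.
if "special:model.yml" in params.files:
    print(params["special:model.yml"].tobytes().decode("utf-8", errors="replace"))
```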

emjotde commented 4 years ago

The Hugging Face people are cooking something up right now, so maybe just wait? I'm sure they will announce it.

rsennrich commented 4 years ago

If you can't wait, or just for anyone interested: if multiple toolkits implement the same, well-defined architecture (like the Transformer), it is possible in principle to map the parameters from one to the other. We wrote such a conversion ourselves to port RNN models from our Theano codebase to our TensorFlow one: https://github.com/EdinburghNLP/nematus/blob/master/nematus/theano_tf_convert.py

However, the devil is in the details. For example, implementations may differ slightly in architecture (well-known variants are pre-norm and post-norm Transformers; see https://arxiv.org/pdf/1906.01787.pdf), and if the architectures differ, you will not be able to port models between toolkits by just copying the weights.
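
To make the general pattern concrete, here is what the core of such a conversion looks like. The parameter names below are purely illustrative (not real Marian or Nematus names), and a real map is much longer and may also require transposing or splitting/merging matrices:

```python
import numpy as np

# Illustrative map from source-toolkit parameter names to target names.
NAME_MAP = {
    "encoder_Wemb": "encoder/embedding/embeddings",
    "decoder_Wemb": "decoder/embedding/embeddings",
}

def convert(src_path):
    """Load a source checkpoint and return its weights under target names."""
    src = np.load(src_path)
    dst = {}
    for src_name, dst_name in NAME_MAP.items():
        if src_name not in src.files:
            raise KeyError(f"missing source parameter: {src_name}")
        # A shape mismatch at load time in the target toolkit means the
        # architectures differ, and pure weight copying will not work.
        dst[dst_name] = src[src_name]
    return dst
```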

emjotde commented 4 years ago

If it were only the architecture, that would be easy :) E.g. between Marian and Fairseq, one of the differences is a positional embedding starting point that is shifted by 1. Try finding that one by yourself.

Then again, it's an opportunity to learn all the things you never wanted to know about not one but two toolkits! :)
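
To illustrate with simplified sinusoidal embeddings (I'm deliberately not saying which toolkit uses which offset; digging that out is the exercise):

```python
import numpy as np

def sinusoidal_embeddings(length, dim, offset=0):
    """Standard sinusoidal positional embeddings, with positions
    starting at `offset` instead of 0."""
    positions = np.arange(offset, offset + length, dtype=np.float64)[:, None]
    inv_freq = 1.0 / (10000.0 ** (np.arange(0, dim, 2) / dim))
    angles = positions * inv_freq[None, :]
    emb = np.empty((length, dim))
    emb[:, 0::2] = np.sin(angles)
    emb[:, 1::2] = np.cos(angles)
    return emb

# Two toolkits computing "the same" table, one starting a position later:
# every entry differs, and nothing in the saved weights tells you which
# convention was used.
print(np.allclose(sinusoidal_embeddings(8, 16, offset=0),
                  sinusoidal_embeddings(8, 16, offset=1)))  # -> False
```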

emjotde commented 4 years ago

Something potentially working. All questions to be directed to the Hugging Face people :)

https://github.com/huggingface/transformers/blob/master/src/transformers/convert_marian_to_pytorch.py
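
Once the converted checkpoints are published, loading one should look roughly like this (a sketch I haven't run; it assumes the models end up under the Helsinki-NLP namespace on their model hub):

```python
from transformers import MarianMTModel, MarianTokenizer

# Assumed hub name for a converted Opus-MT English->German model.
model_name = "Helsinki-NLP/opus-mt-en-de"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

batch = tokenizer(["Is it possible to load this model?"], return_tensors="pt")
generated = model.generate(**batch)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```

Note that this converter targets PyTorch, so a TensorFlow deployment would still need another step; transformers also ships TF model classes, but whether Marian gets one is, again, a question for them.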