Unbabel / OpenKiwi

Open-Source Machine Translation Quality Estimation in PyTorch
https://unbabel.github.io/OpenKiwi/
GNU Affero General Public License v3.0

Replacing the predictor with BERT or XLM to implement PREDEST-BERT and PREDEST-XLM? #65

Closed: AriaChen closed this issue 4 years ago

AriaChen commented 4 years ago

Hello,

Thank you very much for this OpenKiwi toolkit.

I am trying to reproduce Section 2.6, the Transfer Learning and Fine-Tuning part, of the Unbabel paper.

It states that the predictor can be replaced with multilingual BERT or XLM. I'm wondering how I can achieve that.

If I simply load the PyTorch version of a BERT or XLM model in the train_estimator.yaml file, it gives me a KeyError for 'vocab': the script tries to retrieve the vocabulary torch file from the model but cannot find it. What should I modify in the models or in the OpenKiwi source code to solve this problem?
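
To make the failure concrete, here is a minimal diagnostic sketch. The paths are placeholders and the checkpoint layout is assumed from the error message above, not from OpenKiwi's documented format:

```python
# Illustrative sketch only; paths are placeholders, and the 'vocab' entry
# is inferred from the KeyError above, not from OpenKiwi's documentation.
import torch

# An OpenKiwi-trained predictor checkpoint bundles extra entries,
# presumably including the vocabulary the estimator looks up as 'vocab'.
kiwi_ckpt = torch.load("runs/predictor/model.torch", map_location="cpu")
print("vocab" in kiwi_ckpt)  # expected: True for an OpenKiwi predictor

# A plain Hugging Face BERT checkpoint is just a parameter state_dict,
# so the same lookup fails with KeyError: 'vocab'.
bert_ckpt = torch.load("bert/pytorch_model.bin", map_location="cpu")
print("vocab" in bert_ckpt)  # False, hence the error
```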

ghost commented 4 years ago

Hi Aria,

I am also interested in trying this. Have you made any progress?

Best, Andras

captainvera commented 4 years ago

Hey Aria, Andras,

Sorry for my late response on this issue. When it was opened we were hard at work on WMT20, and I missed it.

I have very good news on this matter. We are launching a new, revamped version of OpenKiwi later this week; we are putting the finishing touches on documentation and code cleanup.

This version of OpenKiwi will support training with BERT, XLM, and XLM-R, and will make it extremely simple to add further Hugging Face transformers as QE encoders.
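
While the docs are being finished, here is the flavor of the idea: a minimal, hypothetical sketch of using a Hugging Face transformer to produce per-token features for a QE estimator head. The class and method names below are invented for illustration and are not OpenKiwi's actual interface:

```python
# Hypothetical sketch, not OpenKiwi's API: TransformerQEEncoder is an
# invented name illustrating the general pattern of wrapping a Hugging
# Face transformer as a QE encoder.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class TransformerQEEncoder(nn.Module):
    """Encodes a (source, translation) pair into per-token features."""

    def __init__(self, model_name: str = "xlm-roberta-base"):
        super().__init__()
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.encoder = AutoModel.from_pretrained(model_name)

    def forward(self, source: str, translation: str) -> torch.Tensor:
        # Cross-lingual encoders such as mBERT and XLM-R accept the two
        # segments as a single sentence pair.
        batch = self.tokenizer(source, translation, return_tensors="pt")
        # Per-token hidden states: shape (1, seq_len, hidden_size).
        return self.encoder(**batch).last_hidden_state


encoder = TransformerQEEncoder("xlm-roberta-base")
with torch.no_grad():
    feats = encoder("Das ist ein Test .", "This is a test .")
print(feats.shape)  # e.g. torch.Size([1, seq_len, 768])
```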

Stay tuned :)

Miguel

ghost commented 4 years ago

Hi Miguel,

Thanks, that is great news indeed! I'm looking forward to the new version.

Best, Andras

captainvera commented 4 years ago

Hi Andras,

Wanted to keep you updated on this. The final wave of bug squishing is taking a bit longer than we had anticipated, so look for the release this week :)

Sorry for the delay!

ghost commented 4 years ago

Thanks, that sounds great!

captainvera commented 4 years ago

Version 2.0.0 has been released, including PREDEST-{BERT/XLM/XLMR}.
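
For anyone finding this later, the release is on PyPI. A minimal way to try it follows; the exact command-line shape and the config schema should be checked against the docs linked above, so treat this as a sketch:

```bash
# Install the released package from PyPI; training is driven by a YAML
# configuration file. 'config.yaml' is a placeholder; see the OpenKiwi
# docs for the actual schema and invocation.
pip install openkiwi
kiwi train config.yaml
```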

Closing this issue.

ghost commented 4 years ago

Thank you!