Unbabel / OpenKiwi

Open-Source Machine Translation Quality Estimation in PyTorch
https://unbabel.github.io/OpenKiwi/
GNU Affero General Public License v3.0

The prediction process is not complete by Predictor Estimator. #101

Closed Balakiranred closed 3 years ago

Balakiranred commented 3 years ago

Issue: The pretrained Predictor-Estimator does not generate all three output files (source_tags, gap_tags and hter). Only one file (source_tags) is generated.

Here is what I have done.

  1. Downloaded the code from the Git link and configured it on my server.
  2. Completed the setup.
  3. Tried to run predict.yaml from the CLI.
  4. That did not work: the model could not be loaded from best_model.torch.
  5. To get the pre-trained models, I downloaded the .zip file using the links below:
     a. https://unbabel.github.io/OpenKiwi/reproduce.html <-- Using pre-trained models
     b. https://github.com/Unbabel/OpenKiwi/releases/ <-- Downloaded en_de.nmt_models.zip from the 0.1.1 assets
  6. Completed the setup and configurations.
  7. Ran the CLI command below:
     CUDA_VISIBLE_DEVICES=1 nohup kiwi predict \
         --gpu-id 0 \
         --model estimator \
         --test-source /data01/bala/TS-5707_openkiwi/de/de-en-val.txt.en \
         --test-target /data01/bala/TS-5707_openkiwi/de/de-en-val.txt.de \
         --output-dir /data01/bala/TS-5707_openkiwi/OpenKiwi-master/config/data/ \
         --load-model /data01/bala/TS-5707_openkiwi/OpenKiwi-master/en_de.nmt_models/estimator/source/model.torch &
  8. The source_tags file was generated, but not the other two files. Here is the log:
     {'batch_size': 64, 'config': None, 'debug': False, 'experiment_name': None, 'gpu_id': 0, 'load_data': None, 'load_model': '/data01/bala/TS-5707_openkiwi/OpenKiwi-master/en_de.nmt_models/estimator/source/model.torch', 'load_vocab': None, 'log_interval': 100, 'mlflow_always_log_artifacts': False, 'mlflow_tracking_uri': 'mlruns/', 'model': 'estimator', 'output_dir': '/data01/bala/TS-5707_openkiwi/OpenKiwi-master/config/data/', 'quiet': False, 'run_name': None, 'run_uuid': None, 'save_config': None, 'save_data': None, 'seed': 42}
     Local output directory is: /data01/bala/TS-5707_openkiwi/OpenKiwi-master/config/data/
     Predict with the PredEst (Predictor-Estimator) model
     Loaded vocabularies from /data01/bala/TS-5707_openkiwi/OpenKiwi-master/en_de.nmt_models/estimator/source/model.torch
     Saving source_tags predictions to /data01/bala/TS-5707_openkiwi/OpenKiwi-master/config/data/source_tags

Note: I expected an error or info message about the other two files. It would also help if the tool printed something like "The prediction is complete".
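Until the CLI prints such a message itself, a small check run after prediction can stand in for it. The following is a minimal sketch, not part of OpenKiwi: the function name `missing_outputs` is hypothetical, and the default file names are simply the three mentioned in this issue, so they may need adjusting to whatever your OpenKiwi version actually writes.

```python
from pathlib import Path

# Assumed file names, taken from the text of this issue; adjust them to
# whatever your OpenKiwi version actually writes to the output directory.
EXPECTED_FILES = ("source_tags", "gap_tags", "hter")


def missing_outputs(output_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent or empty in output_dir."""
    out = Path(output_dir)
    missing = []
    for name in expected:
        path = out / name
        # A file counts as missing if it does not exist or has no content.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    import sys

    output_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    missing = missing_outputs(output_dir)
    if missing:
        print("Prediction incomplete, missing:", ", ".join(missing))
    else:
        print("Prediction complete")
```

Run it with the same directory that was passed to `--output-dir`. An empty result only says the files exist and are non-empty; it does not validate their contents.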

Could you please review this and advise me on what needs to be corrected to get all three files (source_tags, gap_tags and hter), so that I can proceed with the evaluation process? I appreciate your help.

Thank you very much! Bala.

Balakiranred commented 3 years ago

Can someone look into this and suggest a fix? Thanks, Bala.

captainvera commented 3 years ago

Hey @Balakiranred,

I am trying to reproduce your issue on my end. Honestly, it's been a long time since I have used OpenKiwi versions <2.0 myself, and nothing comes to mind as a reason for this not to work.

While I try to reproduce this issue, could you expand on your point 3? Why couldn't you use a YAML file to make the prediction?

captainvera commented 3 years ago

Ah, I have found exactly the source of the problem.

You see, for our WMT19 submission we trained specific models for source tags and for target tags in order to maximize Pearson on the two separate tasks. The source model will only produce source tags and the target model will produce sentence-level scores + target tags.

Your issue here is that you are using the source model. If you change your command to:

CUDA_VISIBLE_DEVICES=1 nohup kiwi predict \
    --gpu-id 0 \
    --model estimator \
    --test-source /data01/bala/TS-5707_openkiwi/de/de-en-val.txt.en \
    --test-target /data01/bala/TS-5707_openkiwi/de/de-en-val.txt.de \
    --output-dir /data01/bala/TS-5707_openkiwi/OpenKiwi-master/config/data/ \
    --load-model /data01/bala/TS-5707_openkiwi/OpenKiwi-master/en_de.nmt_models/estimator/target_1/model.torch &

You should see the target side files generated correctly.
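Since each pretrained model covers only part of the outputs, getting everything means running `kiwi predict` once per model. A minimal sketch of that, assuming the flags and the `estimator/source` and `estimator/target_1` layout seen in this thread: the helper names `build_predict_command` and `build_commands` are hypothetical, and writing each run to its own sub-directory is my own choice so that the two runs cannot overwrite each other's files. The script only builds and prints the commands; it does not run them.

```python
import shlex
from pathlib import Path


def build_predict_command(model_file, test_source, test_target, output_dir, gpu_id=0):
    """Return the argv list for one `kiwi predict` run.

    Only flags that appear in the command from this thread are used.
    """
    return [
        "kiwi", "predict",
        "--gpu-id", str(gpu_id),
        "--model", "estimator",
        "--test-source", str(test_source),
        "--test-target", str(test_target),
        "--output-dir", str(output_dir),
        "--load-model", str(model_file),
    ]


def build_commands(models_root, test_source, test_target, output_root):
    """Build one command per pretrained estimator (source and target_1)."""
    models_root, output_root = Path(models_root), Path(output_root)
    commands = {}
    for name in ("source", "target_1"):
        commands[name] = build_predict_command(
            models_root / "estimator" / name / "model.torch",
            test_source,
            test_target,
            output_root / name,  # separate sub-directory per model (assumption)
        )
    return commands


if __name__ == "__main__":
    # Placeholder relative paths; substitute your own locations.
    cmds = build_commands(
        "en_de.nmt_models", "de-en-val.txt.en", "de-en-val.txt.de", "predictions"
    )
    for argv in cmds.values():
        print(shlex.join(argv))
```

Each printed line can be pasted into a shell, or the argv list can be passed to `subprocess.run(argv, check=True)` if you prefer to launch the runs from Python.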

This is not a bug in OpenKiwi; it is working as expected!

I will leave this issue open in case you have any follow-up questions.

Balakiranred commented 3 years ago

Hi Captainvera,

Thank you very much for your reply. The solution worked for me, and the Predictor-Estimator now generates the tag, gap and sentence score files.

Regards Bala.

captainvera commented 3 years ago

I'm happy to know your issue is solved!