Unbabel / OpenKiwi

Open-Source Machine Translation Quality Estimation in PyTorch
https://unbabel.github.io/OpenKiwi/
GNU Affero General Public License v3.0

Failed to conduct Predictor-Estimator predicting #22

Closed lihongzheng-nlp closed 5 years ago

lihongzheng-nlp commented 5 years ago

After training zh-en data with the predictor model, I continued with the predict step using the following command:

    kiwi predict --model estimator --test-source /home/hzli/work/MTQE/CWMT_2018/zh-en/zh-en/dev/dev.source --test-target /home/hzli/work/MTQE/CWMT_2018/zh-en/zh-en/dev/dev.target --sentence-level True --gpu-id 0 --output-dir /home/hzli/work/MTQE/CWMT_2018/zh-en/zh-en/

I got the following errors:

    [kiwi.lib.predict setup:159] {'batch_size': 64, 'config': None, 'debug': False, 'experiment_name': None, 'gpu_id': 0, 'load_data': None, 'load_model': None, 'load_vocab': None, 'log_interval': 100, 'mlflow_always_log_artifacts': False, 'mlflow_tracking_uri': 'mlruns/', 'model': 'estimator', 'output_dir': '/home/hzli/work/MTQE/CWMT_2018/zh-en/zh-en/', 'quiet': False, 'run_uuid': None, 'save_config': None, 'save_data': None, 'seed': 42}

    Traceback (most recent call last):
      File "/home/hzli/anaconda3/bin/kiwi", line 11, in <module>
        sys.exit(main())
      File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/main.py", line 22, in main
        return kiwi.cli.main.cli()
      File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/cli/main.py", line 73, in cli
        predict.main(extra_args)
      File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/cli/pipelines/predict.py", line 56, in main
        predict.predict_from_options(options)
      File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/lib/predict.py", line 54, in predict_from_options
        run(options.model_api, output_dir, options.pipeline, options.model)
      File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/lib/predict.py", line 113, in run
        model = Model.create_from_file(pipeline_opts.load_model)
      File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/models/model.py", line 210, in create_from_file
        str(path), map_location=lambda storage, loc: storage
      File "/home/hzli/anaconda3/lib/python3.6/site-packages/torch/serialization.py", line 356, in load
        f = open(f, 'rb')
    FileNotFoundError: [Errno 2] No such file or directory: 'None'

I spent a lot of time trying to find the cause of this error, but without success. Could you give me some advice on solving it? Thank you very much!

captainvera commented 5 years ago

Hey @VictorLi2017, The way the predict pipeline works is by loading a pre-trained model and creating predictions for data where you don't have tags. Here, you forgot the step of loading the pre-trained model. You need to pass a --load-model [Path to model] flag to the predict pipeline.
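To make the traceback less mysterious: when the flag is omitted, the load_model option stays None, and its string form is the literal path that gets opened. A minimal Python reproduction of that pattern (not OpenKiwi code, just the same failure mode):

```python
# When --load-model is not passed, the option defaults to None; converting
# it to a string and opening it reproduces the reported error.
load_model = None  # value of the option when the flag is omitted

try:
    open(str(load_model), "rb")  # what torch.load ends up doing
except FileNotFoundError as e:
    print(e)  # [Errno 2] No such file or directory: 'None'
```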

I also realised that this isn't addressed in the documentation and will update it 👍

Note: As a friendly reminder, to use a predictor-estimator you need to first pre-train the predictor on a large parallel corpus and then train the estimator on QE data (with tags).

I'm closing the issue, feel free to re-open it if the problem persists!

lihongzheng-nlp commented 5 years ago

Hello @captainvera, following your guide, I added --load-model to the full command above:

    kiwi predict --model estimator --test-source /home/hzli/work/MTQE/CWMT_2018/zh-en/zh-en/dev/dev.source --test-target /home/hzli/work/MTQE/CWMT_2018/zh-en/zh-en/dev/dev.target --sentence-level True --gpu-id 0 --output-dir /home/hzli/work/MTQE/CWMT_2018/zh-en/zh-en/ --load-model /home/hzli/work/MTQE/CWMT_2018/zh-en/zh-en/runs/0/de596b315f7a4428bd881224376158bc/best_model.torch

After a while it finished, but there are no prediction results, only an output.log file under the output dir, as attached. I guess there should be some prediction results, right? output.log

Would you please check it for me, and give me further instructions? Thank you very much!

trenous commented 5 years ago

Hey @VictorLi2017 ,

I believe the issue is that the model you are loading is a Predictor, not an Estimator. Is that possible? If so, train an Estimator with your pretrained Predictor (see this section in the docs, example config) and then run the prediction pipeline again.

Does this solve your issue?

A bit more detail about what happened: The Predictor model itself does not do quality estimation, it is a conditional language model predicting words in the target given the source. Now, when calling the Model.predict method, only QE predictions are generated which explains why you did not see any outputs.
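To illustrate the distinction (a hypothetical sketch; the class and key names are made up, not OpenKiwi's actual API): a pure conditional language model has no QE head, so its predict step yields nothing to write out, while an estimator adds the quality outputs on top:

```python
# Hypothetical sketch of why loading a Predictor produces no output files.
class Predictor:
    """Conditional LM: predicts target words given the source."""
    def predict(self, batch):
        return {}  # no quality-estimation outputs to write

class Estimator(Predictor):
    """Adds a QE head on top of a pretrained predictor."""
    def predict(self, batch):
        return {"sentence_scores": [0.37 for _ in batch]}

print(Predictor().predict(["a sentence"]))  # {} -> nothing in the output dir
print(Estimator().predict(["a sentence"]))  # {'sentence_scores': [0.37]}
```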

And thanks for reporting these problems, you are pointing out some important flaws in our handling of flags and incorrect inputs. This should have generated an informative error message. Improving the parameter parsing and validation is one of our main priorities moving forward.

Best, Sony

lihongzheng-nlp commented 5 years ago

Hello @trenous, yes, I think what I trained was the Predictor. If I'm not mistaken, the QE pipeline includes three main stages: training, predicting and evaluation, right? I want to try the Predictor-Estimator model with the official Chinese-English data. In the training step I used kiwi train --model predictor with the corresponding parameters, and after 50 epochs I got the best_model.torch in the output dir. Then in the predicting step I ran kiwi predict --model estimator --load-model best_model.torch, and hit the problem above: only an output.log, with no prediction results at all. I'm not quite sure whether I used the correct model name in these two steps?

By the way, I checked the training output.log; the records in most epochs read as follows:

    target_PERP: nan, target_CORRECT: 0.0000, target_ExpErr: nan
    target_PERP: nan, target_CORRECT: 0.0000, target_ExpErr: nan
    EVAL_target_PERP: nan, EVAL_target_CORRECT: 0.0415, EVAL_target_ExpErr: nan

I guess there must be some problem with the data, right? I'll retry the whole pipeline with the WMT18 data and will update you soon. Thank you!

trenous commented 5 years ago

Hey, The predictor-estimator model relies on pretraining of its component model predictor. This is what you did with the command kiwi train --model predictor. The resulting best_model.torch is not a QE model, but can be used to initialize an estimator model like so:

    kiwi train --model estimator --load-pred-target best_model.torch

The pretraining step allows you to make use of any parallel corpus in your target language. This can make a significant difference as public QE corpora are usually of a very small size.

Indeed it seems something went wrong with your training, would you mind sharing the config file and data you used?

lihongzheng-nlp commented 5 years ago

Hello @trenous, I trained sentence-level QE with the predictor-estimator. Following your last guide, I ran kiwi train --config experiments/train_predictor.yaml successfully and got the best_model.torch. Then I ran kiwi train --config experiments/train_estimator.yaml, but it failed once again. Here are the errors:

    Traceback (most recent call last):
      File "/home/hzli/anaconda3/bin/kiwi", line 11, in <module>
        sys.exit(main())
      File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/main.py", line 22, in main
        return kiwi.cli.main.cli()
      File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/cli/main.py", line 71, in cli
        train.main(extra_args)
      File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/cli/pipelines/train.py", line 141, in main
        train.train_from_options(options)
      File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/lib/train.py", line 123, in train_from_options
        trainer = run(ModelClass, output_dir, pipeline_options, model_options)
      File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/lib/train.py", line 204, in run
        trainer.run(train_iter, valid_iter, epochs=pipeline_options.epochs)
      File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/trainers/trainer.py", line 75, in run
        self.train_epoch(train_iterator, valid_iterator)
      File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/trainers/trainer.py", line 95, in train_epoch
        outputs = self.train_step(batch)
      File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/trainers/trainer.py", line 139, in train_step
        model_out = self.model(batch)
      File "/home/hzli/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 477, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/models/predictor_estimator.py", line 349, in forward
        sentence_input = self.make_sentence_input(h_tgt, h_src)
      File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/models/predictor_estimator.py", line 418, in make_sentence_input
        h = h_tgt[0] if h_tgt else h_src[0]
    TypeError: 'NoneType' object is not subscriptable

Attached is the train_estimator.yaml config file for your reference. Strangely, with exactly the same config file, my colleague ran it successfully on his machine. The data is the official QE data from the China Workshop on Machine Translation (CWMT), so I think the data should be fine. Thank you!

train_estimator_yaml.txt

lihongzheng-nlp commented 5 years ago

@trenous PS: the train/dev data each consist of 4 files: train.source, train.target, train.pe and train.hter, with name formats similar to those in the WMT sentence-level data.

captainvera commented 5 years ago

Hello @VictorLi2017, it is indeed very strange that your colleague can run it successfully on his machine. From the error messages it seems there was an error in data loading. As a first step, I would make sure the path to your data is correct and that there are no typos.
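For checking the paths up front, a small helper like this can list any missing files before launching training (our own sketch, not an OpenKiwi utility; the directory below is hypothetical):

```python
from pathlib import Path

def missing_files(data_dir, names):
    """Return the expected files that do not exist under data_dir."""
    return [n for n in names if not (Path(data_dir) / n).exists()]

# the usual sentence-level file set for a hypothetical data directory
expected = ["train.source", "train.target", "train.pe", "train.hter"]
print(missing_files("/home/hzli/work/MTQE/data", expected))
```

An empty list means all expected files are in place; anything printed is a path problem to fix before training.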

This issue is hard to diagnose based on the error message since the only information we're getting is that there was an error in data loading. As @trenous mentioned earlier, our handling of flags and inputs is not the safest. As such, it is hard to conclude the exact problem solely from the error message.

If you are sure there is no issue in your path to the files, would you mind running with the --debug flag and posting the output log here (or the console output with timestamps if possible)?

lihongzheng-nlp commented 5 years ago

Hello @captainvera, I'm sure that the path to the data is correct. I've finished the train_predictor step once again, but the train_estimator step always fails with the same error: TypeError: 'NoneType' object is not subscriptable.
Attached is the train_estimator.log produced with --debug; please check it. Thank you! train_estimator.log

trenous commented 5 years ago

@VictorLi2017 Can you run git pull and let us know if the error persists? We fixed a bug related to training sentence-level only models recently.

lihongzheng-nlp commented 5 years ago

Hello @trenous, the repo I used yesterday was already the latest version. I tried the zh-en and en-zh pairs, and even the official sentence-level data from WMT18; all resulted in the same error, TypeError: 'NoneType' object is not subscriptable, in the train_estimator step.

trenous commented 5 years ago

Hello VictorLi,

Sorry for the long response time; our team was working on a deadline.

The line numbers in your log file don't match the current version, e.g.:

File "/home/hzli/anaconda3/lib/python3.6/site-packages/kiwi/models/predictor_estimator.py", line 349, in forward:
    sentence_input = self.make_sentence_input(h_tgt, h_src)

If you look at the changes introduced in this commit - which addresses the bug you encountered - you'll see that this line was number 349 before the fix and 357 afterwards.
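For context, the failing line can be reproduced in isolation: when neither hidden state is set, which seems to have happened in the sentence-level-only configuration before the fix, the else branch subscripts None:

```python
def make_sentence_input(h_tgt, h_src):
    # mirrors the pre-fix line in predictor_estimator.py:
    # falls back to h_src when h_tgt is falsy, without checking for None
    return h_tgt[0] if h_tgt else h_src[0]

try:
    make_sentence_input(None, None)  # both hidden states unset
except TypeError as e:
    print(e)  # 'NoneType' object is not subscriptable
```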

Could you do a fresh checkout of the repo? That should solve your problem.

trenous commented 5 years ago

I am closing this as it seems to be solved.