Unbabel / OpenKiwi

Open-Source Machine Translation Quality Estimation in PyTorch
https://unbabel.github.io/OpenKiwi/
GNU Affero General Public License v3.0

Error training nuqe model #10

Closed by erickrf 5 years ago

erickrf commented 5 years ago

When running kiwi train --config experiments/train_nuqe.yaml, I ran into the following error:

2019-02-26 15:48:32.631 [root setup:380] This is run ID: 9124ced8667849acb40f10a124109234
2019-02-26 15:48:32.631 [root setup:383] Inside experiment ID: 0 (None)
2019-02-26 15:48:32.631 [root setup:386] Local output directory is: models/nuqe
2019-02-26 15:48:32.632 [root setup:389] Logging execution to MLflow at: mlruns/
2019-02-26 15:48:32.632 [root setup:397] Using CPU
2019-02-26 15:48:32.632 [root setup:400] Artifacts location: mlruns/0/9124ced8667849acb40f10a124109234/artifacts
2019-02-26 15:48:33.648 [kiwi.lib.train run:154] Training the NuQE model
2019-02-26 15:48:34.448 [kiwi.lib.train run:187] NuQE(
  (_loss): CrossEntropyLoss()
  (source_emb): Embedding(5372, 50, padding_idx=1)
  (target_emb): Embedding(7874, 50, padding_idx=1)
  (embeddings_dropout): Dropout(p=0.5)
  (linear_1): Linear(in_features=300, out_features=400, bias=True)
  (linear_2): Linear(in_features=400, out_features=400, bias=True)
  (linear_3): Linear(in_features=400, out_features=200, bias=True)
  (linear_4): Linear(in_features=200, out_features=200, bias=True)
  (linear_5): Linear(in_features=400, out_features=100, bias=True)
  (linear_6): Linear(in_features=100, out_features=50, bias=True)
  (linear_out): Linear(in_features=50, out_features=2, bias=True)
  (gru_1): GRU(400, 200, batch_first=True, bidirectional=True)
  (gru_2): GRU(200, 200, batch_first=True, bidirectional=True)
  (dropout_in): Dropout(p=0.0)
  (dropout_out): Dropout(p=0.0)
)
2019-02-26 15:48:34.449 [kiwi.lib.train run:188] 2313552 parameters
2019-02-26 15:48:34.449 [kiwi.trainers.trainer run:74] Epoch 1 of 10
Batches:   0%|                                    | 0/236 [00:00<?, ? batches/s]
Traceback (most recent call last):
  File "/Users/erick/.virtualenvs/kiwi/bin/kiwi", line 10, in <module>
    sys.exit(main())
  File "/Users/erick/.virtualenvs/kiwi/lib/python3.7/site-packages/kiwi/__main__.py", line 22, in main
    return kiwi.cli.main.cli()
  File "/Users/erick/.virtualenvs/kiwi/lib/python3.7/site-packages/kiwi/cli/main.py", line 71, in cli
    train.main(extra_args)
  File "/Users/erick/.virtualenvs/kiwi/lib/python3.7/site-packages/kiwi/cli/pipelines/train.py", line 141, in main
    train.train_from_options(options)
  File "/Users/erick/.virtualenvs/kiwi/lib/python3.7/site-packages/kiwi/lib/train.py", line 123, in train_from_options
    trainer = run(ModelClass, output_dir, pipeline_options, model_options)
  File "/Users/erick/.virtualenvs/kiwi/lib/python3.7/site-packages/kiwi/lib/train.py", line 204, in run
    trainer.run(train_iter, valid_iter, epochs=pipeline_options.epochs)
  File "/Users/erick/.virtualenvs/kiwi/lib/python3.7/site-packages/kiwi/trainers/trainer.py", line 75, in run
    self.train_epoch(train_iterator, valid_iterator)
  File "/Users/erick/.virtualenvs/kiwi/lib/python3.7/site-packages/kiwi/trainers/trainer.py", line 95, in train_epoch
    outputs = self.train_step(batch)
  File "/Users/erick/.virtualenvs/kiwi/lib/python3.7/site-packages/kiwi/trainers/trainer.py", line 140, in train_step
    loss_dict = self.model.loss(model_out, batch)
  File "/Users/erick/.virtualenvs/kiwi/lib/python3.7/site-packages/kiwi/models/quetch.py", line 161, in loss
    loss = self._loss(predicted, y)
  File "/Users/erick/.virtualenvs/kiwi/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/Users/erick/.virtualenvs/kiwi/lib/python3.7/site-packages/torch/nn/modules/loss.py", line 904, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/Users/erick/.virtualenvs/kiwi/lib/python3.7/site-packages/torch/nn/functional.py", line 1970, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/Users/erick/.virtualenvs/kiwi/lib/python3.7/site-packages/torch/nn/functional.py", line 1788, in nll_loss
    .format(input.size(0), target.size(0)))
ValueError: Expected input batch_size (256) to match target batch_size (576).

The .yaml file differed from the original only in the path to the training data.

This happened with
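
For context, the ValueError in the traceback is PyTorch's generic shape check in torch.nn.CrossEntropyLoss (reached via nll_loss); below is a minimal, self-contained sketch that reproduces the same message with the same sizes (the tensors here are made up and are not OpenKiwi's actual outputs):

import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()

# 256 per-token predictions over 2 classes, but 576 target labels:
# the same 256-vs-576 mismatch reported in the traceback above.
predicted = torch.randn(256, 2)
target = torch.zeros(576, dtype=torch.long)

loss_fn(predicted, target)
# ValueError: Expected input batch_size (256) to match target batch_size (576).

In other words, the model produces one prediction per MT token, but the loaded tag sequence contains more labels than there are tokens, which the reply below explains.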

mtreviso commented 5 years ago

Are you working with WMT19 data? If so, remember to add wmt18-format: true to your .yaml file if your data is in the WMT18+ format (this format added GAP tags, which are interleaved with the MT tags in the .tags file).
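
For anyone hitting the same error, a minimal sketch of the config change, assuming the data is in the WMT18+ tag format; the flag name comes from the comment above, and the rest of experiments/train_nuqe.yaml stays as shipped:

# experiments/train_nuqe.yaml (excerpt)
# In the WMT18+ format, the .tags file interleaves GAP tags with the MT word
# tags, so the target sequence has more labels than the MT sentence has
# tokens; this flag tells OpenKiwi to expect that layout.
wmt18-format: true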

erickrf commented 5 years ago

Thanks, that fixed it.