Unbabel / OpenKiwi

Open-Source Machine Translation Quality Estimation in PyTorch
https://unbabel.github.io/OpenKiwi/
GNU Affero General Public License v3.0

predictor-estimator crashes with Russian data #32

Closed ninalopatina closed 5 years ago

ninalopatina commented 5 years ago

Describe the bug
The estimator crashes partway through training with the WMT19 Russian data.

To Reproduce Steps to reproduce the behavior:

  1. Switch the data to the WMT19 Russian data
  2. Train the predictor
  3. Train the estimator
  4. See the error at 22% of the batches in the first epoch (batch 53 of 236)

Expected behavior
I expected the estimator to train the same way it had for the German datasets.

Screenshots
2019-06-24 21:07:25.075 [kiwi.trainers.trainer run:74] Epoch 1 of 10
Batches: 22%|██████ | 53/236 [00:27<00:58, 3.11 batches/s]
Traceback (most recent call last):
  File "/home/nlopatina/.virtualenvs/OpenKiwi/bin/kiwi", line 11, in <module>
    load_entry_point('openkiwi', 'console_scripts', 'kiwi')()
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/main.py", line 22, in main
    return kiwi.cli.main.cli()
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/cli/main.py", line 71, in cli
    train.main(extra_args)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/cli/pipelines/train.py", line 141, in main
    train.train_from_options(options)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/lib/train.py", line 123, in train_from_options
    trainer = run(ModelClass, output_dir, pipeline_options, model_options)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/lib/train.py", line 204, in run
    trainer.run(train_iter, valid_iter, epochs=pipeline_options.epochs)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/trainers/trainer.py", line 75, in run
    self.train_epoch(train_iterator, valid_iterator)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/trainers/trainer.py", line 95, in train_epoch
    outputs = self.train_step(batch)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/trainers/trainer.py", line 139, in train_step
    model_out = self.model(batch)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/models/predictor_estimator.py", line 324, in forward
    model_out_tgt = self.predictor_tgt(batch)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/models/predictor.py", line 275, in forward
    for i in range(target_len - 2)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/models/predictor.py", line 275, in <listcomp>
    for i in range(target_len - 2)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/models/modules/attention.py", line 36, in forward
    scores = self.scorer(query, keys)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/fs03/home/nlopatina/OpenKiwi/kiwi/models/modules/scorer.py", line 60, in forward
    layer_in = layer(layer_in)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/nlopatina/.virtualenvs/OpenKiwi/lib/python3.6/site-packages/torch/nn/modules/activation.py", line 292, in forward
    return torch.tanh(input)
RuntimeError: CUDA out of memory. Tried to allocate 75.62 MiB (GPU 1; 11.93 GiB total capacity; 10.68 GiB already allocated; 42.56 MiB free; 717.88 MiB cached)

Environment (please complete the following information):

  - OS: Linux
  - OpenKiwi version 0.1.1
  - Python version 3.6.5
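
Editorial note: the crash happens at batch 53 of 236 rather than on the first batch, and the failing frame sits inside the predictor's loop over `range(target_len - 2)`. That pattern is consistent with one batch containing unusually long sentences, although nothing in this thread confirms it. A minimal sketch for checking a data file, assuming one whitespace-tokenized sentence per line; the function name `longest_sentences` is hypothetical and not part of OpenKiwi:

```python
from typing import Iterable, List, Tuple


def longest_sentences(lines: Iterable[str], top_k: int = 3) -> List[Tuple[int, int]]:
    """Return (line_number, token_count) for the top_k longest lines.

    Assumes one whitespace-tokenized sentence per line; line numbers are 1-based.
    """
    lengths = [(i, len(line.split())) for i, line in enumerate(lines, start=1)]
    # Longest first; Python's sort is stable, so ties keep file order.
    lengths.sort(key=lambda pair: pair[1], reverse=True)
    return lengths[:top_k]


if __name__ == "__main__":
    sample = ["a b c", "a", "a b c d e f", "a b"]
    for line_number, token_count in longest_sentences(sample, top_k=2):
        print(f"line {line_number}: {token_count} tokens")
```

Running this over the Russian source and target files and comparing the result with the German data would show whether a few outlier sentences are much longer than anything seen before.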

Additional context

ninalopatina commented 5 years ago

Never mind, I fixed this by adding a few options to the YAML config file.
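
Editorial note: the comment does not say which options were added, so the exact fix is unknown. One generic way to bound per-batch memory, independent of any particular configuration, is to drop overlong sentence pairs before training. The sketch below assumes each example is a row of parallel strings (source, target, then any extra columns such as tags or scores); `filter_by_length` and `max_tokens` are hypothetical names, not OpenKiwi options:

```python
from typing import List, Sequence


def filter_by_length(rows: Sequence[Sequence[str]], max_tokens: int) -> List[Sequence[str]]:
    """Keep only rows whose source and target both fit within max_tokens.

    Each row is (source, target, *extras). Extras are carried along
    untouched so that all parallel files stay aligned after filtering.
    """
    kept = []
    for row in rows:
        source, target = row[0], row[1]
        if len(source.split()) <= max_tokens and len(target.split()) <= max_tokens:
            kept.append(row)
    return kept


if __name__ == "__main__":
    rows = [
        ("a b", "c d", "OK OK"),
        ("a b c d", "e", "BAD"),
    ]
    print(filter_by_length(rows, max_tokens=3))
```

Filtering whole rows rather than individual files keeps source, target, and label files in step, which matters for quality estimation data where every line must match across files.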