Unbabel / OpenKiwi

Open-Source Machine Translation Quality Estimation in PyTorch
https://unbabel.github.io/OpenKiwi/
GNU Affero General Public License v3.0

Error in predicting when using GPU for predictor-estimator #60

Closed prabhakar267 closed 4 years ago

prabhakar267 commented 4 years ago

To Reproduce

I used the following config to predict with the predictor-estimator model and got an error when running on GPU. The machine has multiple GPUs, but I want to run on only one of them.

output-dir: ...
seed: 42

gpu-id: 0
debug: True

model: estimator

load-model: ...

wmt18-format: False
test-source: ...
test-target: ...
valid-batch-size: 1024

2020-02-27 06:31:03.970 [kiwi.data.utils load_vocabularies_to_fields:126] Loaded vocabularies from KiwiCutter/trained_models/estimator_en_de.torch/estimator_en_de.torch
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/kiwi/bin/kiwi", line 11, in <module>
    sys.exit(main())
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/kiwi/__main__.py", line 22, in main
    return kiwi.cli.main.cli()
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/kiwi/cli/main.py", line 73, in cli
    predict.main(extra_args)
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/kiwi/cli/pipelines/predict.py", line 56, in main
    predict.predict_from_options(options)
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/kiwi/lib/predict.py", line 54, in predict_from_options
    run(options.model_api, output_dir, options.pipeline, options.model)
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/kiwi/lib/predict.py", line 131, in run
    test_dataset, batch_size=pipeline_opts.batch_size
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/kiwi/predictors/predictor.py", line 116, in run
    model_pred = self.model.predict(batch)
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/kiwi/models/model.py", line 119, in predict
    model_out = self(batch)
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/kiwi/models/predictor_estimator.py", line 324, in forward
    model_out_tgt = self.predictor_tgt(batch)
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/kiwi/models/predictor.py", line 240, in forward
    source_mask = self.get_mask(batch, source_side)[:, 1:-1]
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/kiwi/models/model.py", line 205, in get_mask
    input_tensor != pad_id, dtype=torch.uint8
RuntimeError: Expected object of device type cuda but got device type cpu for argument #3 'other' in call to _th_iand_

captainvera commented 4 years ago

Hey @prabhakar267,

Sorry you're experiencing some issues. Can you tell me which version of PyTorch you are using? I cannot reproduce the issue with openkiwi==0.1.2 and torch > 1.1.

prabhakar267 commented 4 years ago

torch==1.4.0

captainvera commented 4 years ago

Sorry for the late response. Unfortunately, I haven't had the opportunity to test this yet. However, I believe this issue has been fixed in #44. Can you try installing openkiwi from master?

I will test this shortly and tag a new version if my suspicions are confirmed.

prabhakar267 commented 4 years ago

@captainvera I installed from the master branch, and the error message changed:

Traceback (most recent call last):
  File "using_kiwi_gpu.py", line 50, in <module>
    'target': target_texts
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/kiwi/predictors/predictor.py", line 106, in predict
    return self.run(dataset, batch_size)
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/kiwi/predictors/predictor.py", line 116, in run
    model_pred = self.model.predict(batch)
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/kiwi/models/model.py", line 119, in predict
    model_out = self(batch)
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/kiwi/models/predictor_estimator.py", line 324, in forward
    model_out_tgt = self.predictor_tgt(batch)
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/kiwi/models/predictor.py", line 243, in forward
    source_embeddings = self.embedding_source(source)
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/torch/nn/modules/sparse.py", line 114, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/home/ubuntu/anaconda3/envs/kiwi/lib/python3.6/site-packages/torch/nn/functional.py", line 1467, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected object of backend CPU but got backend CUDA for argument #3 'index'

captainvera commented 4 years ago

Hey @prabhakar267,

What I did to try to reproduce your issue:

I installed openkiwi from master in a new virtual env and downloaded the pretrained models available on the releases page.

Used this config:

seed: 42
gpu-id: 0

debug: True
model: estimator
load-model: pre_trained/estimator/target_1/model.torch
wmt18-format: False

test-source: data/wmt19/en_de.nmt/test.src
test-target: data/wmt19/en_de.nmt/test.mt

valid-batch-size: 16
output-dir: tmp_out

I had no errors and was able to run the whole thing.

I then uninstalled openkiwi and reinstalled it from pip. Using the same config and model, I was able to reproduce your original issue, confirming my suspicions. I'll tag a new version to fix the pip package.

On the other hand, your final error is kind of weird. The bug popped up in a completely different place. To help me reproduce it, can you tell me how you're calling kiwi? And can you try to use one of the pre-trained models to see if they work?

prabhakar267 commented 4 years ago

I'm getting the error using the predictor-estimator model.

import kiwi
import torch

model = kiwi.load_model('trained_models/estimator_en_de.torch/estimator_en_de.torch')
model._device = torch.device("cuda")

source_texts = ...
target_texts = ...
predictions = model.predict({
    'source': source_texts,
    'target': target_texts
})
print(predictions)

captainvera commented 4 years ago

Hey @prabhakar267, sorry for the late response. I was away from work.

Thanks for sending the example script! I wasn't considering that use case and thought you were using kiwi as a terminal tool, not as a Python package.

The variable you are calling model (as we do in our examples) is actually an instance of the Predicter class. Furthermore, our code wasn't ready for the _device variable to change after initialization.
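
The second point can be illustrated with a minimal sketch. It assumes a hypothetical stand-in class, PredicterSketch, that mimics only the two attributes discussed in this thread (model and _device); it is not OpenKiwi's actual Predicter. Changing _device alone redirects the input batches but leaves the weights where they were loaded, so on a machine with a GPU the call below fails with a device-mismatch error similar to the one in the traceback. On a CPU-only machine both devices coincide and nothing fails.

```python
import torch
from torch import nn


class PredicterSketch:
    """Hypothetical stand-in for the wrapper returned by kiwi.load_model."""

    def __init__(self, model):
        self.model = model                  # the wrapped torch.nn.Module
        self._device = torch.device("cpu")  # where input batches are sent

    def predict(self, token_ids):
        batch = token_ids.to(self._device)
        with torch.no_grad():
            return self.model(batch)


# Fall back to CPU so the sketch also runs on machines without a GPU.
target = torch.device("cuda" if torch.cuda.is_available() else "cpu")

wrapper = PredicterSketch(nn.Embedding(num_embeddings=10, embedding_dim=4))
tokens = torch.tensor([[1, 2, 3]])

# Changing only _device moves the inputs, not the weights.
wrapper._device = target
weights_device = next(wrapper.model.parameters()).device
print("inputs go to:", wrapper._device.type, "| weights live on:", weights_device.type)

try:
    wrapper.predict(tokens)
    failed = False
except RuntimeError as err:
    # Reached only when inputs and weights sit on different devices.
    failed = True
    print("device mismatch:", err)
```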

What you would have to do is:

model = kiwi.load_model('trained_models/estimator_en_de.torch/estimator_en_de.torch')
model.model.to("cuda")
model._device = torch.device("cuda")

Of course, this makes no sense from a design perspective. There's a PR, #61, that adds an interface similar to PyTorch's. Once that's merged you can do:

model = kiwi.load_model('trained_models/estimator_en_de.torch/estimator_en_de.torch')
model.to("cuda")

With this, all predictions will be made using the GPU.
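
For readers curious what such an interface might look like, here is a rough sketch of the idea. It is not the code from PR #61: the class name DeviceAwarePredicter and the stand-in FakeModule are hypothetical, and in real use the wrapped object would be a torch.nn.Module, whose .to() moves its parameters. The point is only that a single to() call updates the wrapped model and the recorded device together, so they cannot drift apart.

```python
class DeviceAwarePredicter:
    """Hypothetical wrapper; not OpenKiwi's actual Predicter class."""

    def __init__(self, model, device="cpu"):
        self.model = model      # any object exposing a .to(device) method
        self._device = device   # where input batches should be sent

    def to(self, device):
        # Move the wrapped model and record the device in one step,
        # so the two can never disagree.
        self.model.to(device)
        self._device = device
        return self  # allows chaining, as torch.nn.Module.to does


class FakeModule:
    """Stand-in for torch.nn.Module so the sketch runs without PyTorch."""

    def __init__(self):
        self.device = "cpu"

    def to(self, device):
        self.device = device
        return self


predicter = DeviceAwarePredicter(FakeModule()).to("cuda")
print(predicter._device, predicter.model.device)
```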

Hope this solves your issue.

edit: closed issue by mistake

captainvera commented 4 years ago

Hey @prabhakar267,

The changes have been merged to master :) Let me know if you have any other problems.

Cheers

kepler commented 4 years ago

@prabhakar267, let us know if this is not solved.