OpenNMT / OpenNMT-py

Open Source Neural Machine Translation and (Large) Language Models in PyTorch
https://opennmt.net/
MIT License

Great perplexity on training but useless translations #19

Closed mattiadg closed 7 years ago

mattiadg commented 7 years ago

Hi,

I'm running the latest OpenNMT-py with the latest PyTorch and CUDA 7.5 on a K80 GPU. I ran the training both with the data provided in the example and with the en-fr data from IWSLT 2016. In both cases, the perplexity during training gets very low (single digits for every minibatch) but the accuracy is always 0.0. Moreover, when I try to translate the validation sets, the translations seem totally random.

guillaumekln commented 7 years ago

Hi,

Are you using Python 2.7? There were some issues with divisions. Could you retry with the latest version?
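
For reference, here's a minimal sketch of the Python 2 integer-division pitfall being referred to (the variable names are illustrative, not the actual OpenNMT-py code):

num_correct = 37
num_words = 100

# Python 2: int / int truncates, so this prints 0 -- which shows up as 0.0 accuracy.
# Python 3 (or after `from __future__ import division`): prints 0.37.
print(num_correct / num_words)

# A portable fix is to force float division explicitly:
print(float(num_correct) / num_words)  # 0.37 on both Python 2 and 3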

mattiadg commented 7 years ago

Now I'm getting positive accuracy values during training, but I still can't decode. I'll give you more info soon.

bmccann commented 7 years ago

This might help: https://github.com/OpenNMT/OpenNMT-py/pull/22, but it will only impact unk replacement. If you weren't getting anywhere near good translations, we might need more info since I'm using IWSLT de-en and getting good results.

mattiadg commented 7 years ago

Without #22 I get the following error. I'm using Python 2.7 and the flag -replace_unk.

Traceback (most recent call last):
  File "/hltsrv0/digangi/OpenNMT-py/translate.py", line 135, in <module>
    main()
  File "/hltsrv0/digangi/OpenNMT-py/translate.py", line 89, in main
    predBatch, predScore, goldScore = translator.translate(srcBatch, tgtBatch)
  File "/hltsrv0/digangi/OpenNMT-py/onmt/Translator.py", line 204, in translate
    for n in range(self.opt.n_best)]
  File "/hltsrv0/digangi/OpenNMT-py/onmt/Translator.py", line 62, in buildTargetTokens
    tokens[i] = src[maxIndex[0]]
IndexError: list index out of range

It translates fine without -replace_unk, though.

bmccann commented 7 years ago

Yeah, that's what I was seeing too. Translation with -replace_unk would only have worked on datasets without unks or when using batch_size 1. It should be fixed now.
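
For context, here is a rough pure-Python sketch of what -replace_unk does (simplified from the buildTargetTokens logic in the traceback above; the function and variable names are illustrative, not the actual code):

def replace_unks(pred_tokens, src_tokens, attn):
    # attn[i][j] is the attention weight of target step i on source position j
    out = list(pred_tokens)
    for i, tok in enumerate(pred_tokens):
        if tok == "<unk>":
            # substitute the source token the decoder attended to most
            max_index = max(range(len(attn[i])), key=lambda j: attn[i][j])
            # with padded batches, max_index can point past len(src_tokens),
            # which is the IndexError shown in the traceback above
            out[i] = src_tokens[max_index]
    return out

src = ["hello", "Smith", "!"]
pred = ["bonjour", "<unk>", "!"]
attn = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
print(replace_unks(pred, src, attn))  # ['bonjour', 'Smith', '!']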

Regarding your poor decoding performance, what validation accuracy/perplexity are you getting down to? For the demo data, I don't think you'll ever get good translations. That was part of the motivation for adding Multi30k. I think eventually, we'll just remove the demo data. Multi30k and IWSLT should work pretty well though.

mattiadg commented 7 years ago

The poor decoding I was talking about was before the fix, when the output was totally random. Now I get reasonable translations, but I trained only on the TED talks, so the BLEU score isn't very high. Today I'll try decoding with your latest fix, and if it works I won't have anything else to add to this thread.

mattiadg commented 7 years ago

Ok, I've downloaded the latest version and I don't get that error anymore. What I get now is an out-of-memory error (I'm using a K80):

THCudaCheck FAIL file=/data/users/soumith/builder/wheel/pytorch-src/torch/lib/THC/generic/THCStorage.c line=79 error=2 : out of memory

And this is the command

python ~/OpenNMT-py/translate.py -gpu 0 -model models/model_*_e20.pt -src data/dev.en.lc.txt -tgt data/dev.fr.lc.txt -verbose -output demo_pred.txt -beam_size 5 -batch_size 20 -replace_unk

Without -replace_unk it is able to translate with beam_size 10 and batch_size 40.

Maybe it's better not to use Python 2.7?

bmccann commented 7 years ago

That seems odd. replace_unk is only relevant here, which doesn't seem like it should take up much memory. I looked around and realized the Variables for data could be made volatile though, so that should help some (6b4cb9d60eb662a736b09c69457375939cff5dc6). Let me know if you still see issues.
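
For reference, in pre-0.4 PyTorch (the versions in use here), marking inference inputs as volatile stops autograd from building a graph for them, which is what keeps memory down at translation time. A minimal sketch using the old Variable API (not the exact OpenNMT-py change):

import torch
from torch.autograd import Variable

# Old (pre-0.4) inference idiom referenced by the commit above:
# volatile=True tells autograd not to track these inputs, so no
# intermediate buffers are retained during the forward pass.
src = Variable(torch.LongTensor([[4, 17, 23, 3]]), volatile=True)

# In PyTorch >= 0.4 the equivalent is:
# with torch.no_grad():
#     output = model(src)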