flauted closed this issue 1 year ago.
@pltrdy can you please have a look?
@flauted do you get normal scores without coverage penalty?
The problem is probably not about beta, though.
@pltrdy Yeah I do.
python translate.py -model /home/dylan/Downloads/averaged-10-epoch.pt -src /home/dylan/Downloads/test_short.en -tgt /home/dylan/Downloads/test_short.de -verbose -replace_unk -output pred.txt -gpu 0 -report_time
PRED AVG SCORE: -0.4390, PRED PPL: 1.5512
GOLD AVG SCORE: -11.2287, GOLD PPL: 75260.5316
Total translation time (s): 60.323037
Average translation time (s): 0.120646
Tokens per second: 200.172282
Could you try with the parameters from http://opennmt.net/OpenNMT-py/Summarization.html? In particular, try using -coverage_penalty summary.
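For context, here is a rough sketch of the difference between the two penalties. This follows the Wu et al. (2016) formulation and a simplified summary-style variant, not the exact OpenNMT-py code: the wu penalty takes a log of each source token's accumulated attention, so a token that receives (near-)zero attention drives the score toward -inf, while the summary-style penalty only punishes over-attended tokens and stays finite.

```python
import math

def coverage_penalty_wu(coverage, beta):
    # Wu et al. (2016): cp = beta * sum_i log(min(coverage_i, 1.0)).
    # As coverage_i -> 0, log(...) -> -inf, dragging the whole score down.
    # (math.log(0.0) raises in pure Python; torch.log of 0. returns -inf.)
    return beta * sum(math.log(min(c, 1.0)) for c in coverage)

def coverage_penalty_summary(coverage, beta):
    # Simplified summary-style variant: only penalize attention mass
    # above 1.0 per source token, so barely-attended tokens cost
    # nothing and the penalty stays finite.
    return -beta * sum(max(c, 1.0) - 1.0 for c in coverage)

well_covered = [1.0, 0.9, 1.1]      # every source token attended ~once
barely_covered = [1.0, 1e-30, 1.1]  # one token almost ignored

print(coverage_penalty_wu(well_covered, beta=0.2))       # ~ -0.021
print(coverage_penalty_wu(barely_covered, beta=0.2))     # ~ -13.8
print(coverage_penalty_summary(barely_covered, beta=5))  # -0.5
```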
Summary works.
python translate.py -model /home/dylan/Downloads/averaged-10-epoch.pt -src /home/dylan/Downloads/test_short.en -tgt /home/dylan/Downloads/test_short.de -verbose -replace_unk -output pred.txt -gpu 0 -report_time -stepwise_penalty -min_length 35 -coverage_penalty summary -beta 5 -length_penalty wu -alpha 0.9
PRED AVG SCORE: -0.7263, PRED PPL: 2.0674
GOLD AVG SCORE: -11.2287, GOLD PPL: 75260.5316
Total translation time (s): 115.586021
Average translation time (s): 0.231172
Tokens per second: 161.939998
It would be interesting to check the prediction scores, in particular to see how many sentences get -inf scores.
Might be a beam size issue; try with 15.
python translate.py -model /home/dylan/Downloads/averaged-10-epoch.pt -src /home/dylan/Downloads/test_short.en -tgt /home/dylan/Downloads/test_short.de -verbose -replace_unk -output pred.txt -gpu 0 -report_time -stepwise_penalty -min_length 35 -coverage_penalty wu -beta 0.2 -length_penalty wu -alpha 0.9 -beam_size 15
/opt/conda/conda-bld/pytorch_1544202130060/work/aten/src/THC/THCTensorIndex.cu:308: void indexSelectSmallIndex(TensorInfo<T, IndexType>, TensorInfo<T, IndexType>, TensorInfo<long, IndexType>, int, int, IndexType, long) [with T = float, IndexType = unsigned int, DstDim = 2, SrcDim = 2, IdxDim = -2]: block: [0,0,0], thread: [32,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
<...>
Traceback (most recent call last):
File "/home/dylan/code/OpenNMT-py/./translate.py", line 42, in <module>
main(opt)
File "/home/dylan/code/OpenNMT-py/./translate.py", line 27, in main
attn_debug=opt.attn_debug
File "/home/dylan/code/OpenNMT-py/onmt/translate/translator.py", line 225, in translate
batch, data.src_vocabs, attn_debug, fast=self.fast
File "/home/dylan/code/OpenNMT-py/onmt/translate/translator.py", line 465, in translate_batch
return self._translate_batch(batch, src_vocabs)
File "/home/dylan/code/OpenNMT-py/onmt/translate/translator.py", line 727, in _translate_batch
beam_attn.data[j, :, :memory_lengths[j]])
File "/home/dylan/code/OpenNMT-py/onmt/translate/beam.py", line 140, in advance
self.global_scorer.update_global_state(self)
File "/home/dylan/code/OpenNMT-py/onmt/translate/beam.py", line 238, in update_global_state
beam.global_state['coverage']).sum(1)
RuntimeError: CUDA error: device-side assert triggered
(That's with CUDA_LAUNCH_BLOCKING=1)
CPU is very slow, but I got an update before I got an error, so it seems to be working.
That seems to be caused by -stepwise_penalty. It's the same error with a few different -beam_sizes.
Dropping -stepwise_penalty and trying a beam size of 15:
python translate.py -model /home/dylan/Downloads/averaged-10-epoch.pt -src /home/dylan/Downloads/test_short.en -tgt /home/dylan/Downloads/test_short.de -verbose -replace_unk -output pred.txt -report_time -coverage_penalty wu -beta 0.2 -length_penalty wu -alpha 0.9 -gpu 0 -beam_size 15
PRED AVG SCORE: -inf, PRED PPL: inf
GOLD AVG SCORE: -11.2287, GOLD PPL: 75260.5316
Total translation time (s): 137.860756
Average translation time (s): 0.275722
Tokens per second: 82.887983
@pltrdy It looks like they're all -inf.
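For what it's worth, here is a tiny sketch (with made-up numbers) of how a single -inf term poisons everything. The final beam score is roughly log-prob / length_penalty + coverage_penalty, so once a hypothesis's coverage penalty hits -inf, that sentence's score is -inf, and the corpus-level PRED AVG SCORE is -inf as well:

```python
def wu_score(log_prob, length, alpha, coverage_penalty):
    # Wu et al. (2016) scoring: s = log P / lp(Y) + cp(X; Y),
    # with length penalty lp(Y) = ((5 + |Y|) / 6) ** alpha.
    lp = ((5 + length) / 6.0) ** alpha
    return log_prob / lp + coverage_penalty

# Hypothetical per-sentence results: one healthy score, two where the
# coverage penalty diverged to -inf.
scores = [
    wu_score(-8.4, 12, alpha=0.9, coverage_penalty=-0.3),
    wu_score(-9.1, 15, alpha=0.9, coverage_penalty=float("-inf")),
    wu_score(-7.7, 10, alpha=0.9, coverage_penalty=float("-inf")),
]

avg = sum(scores) / len(scores)
print(avg)  # -inf: the finite sentences can't rescue the average
```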
It's annoying. Do you have time to download one of the pretrained summarization models? (An LSTM one.)
Sorry for the late response, this fell off my plate.
I downloaded the "2018/02/11 Baseline" model (2 layers, LSTM 500, WE 500, input feed, 20 epochs) trained on Gigaword from the Drive link here. Still getting -inf.
python translate.py -model gigaword_nocopy_acc_51.33_ppl_12.74_e20.pt -src sumdata/Giga/input.txt -tgt sumdata/Giga/task1_ref0.txt -min_length 35 -verbose -stepwise_penalty -coverage_penalty wu -beta 0.2 -length_penalty wu -alpha 0.9 -gpu 0
PRED AVG SCORE: -inf, PRED PPL: inf
GOLD AVG SCORE: -10.7468, GOLD PPL: 46480.8489
They're ALMOST all -inf. Here are some exceptions.
SENT 1886: ['a', 'french', 'military', 'transport', 'plane', 'carrying', 'the', 'body', 'of', 'late', 'palestinian', 'leader', 'yasser', 'arafat', 'left', 'here', 'thursday', 'after', 'a', 'brief', 'solemn', 'ceremony', 'at', 'a', 'french', 'military', 'airport', 'outside', 'paris', 'for', 'the', 'departure', 'of', 'arafat', "'s", 'body', 'to', 'cairo', ',', 'egypt', 'for', 'a', 'state', 'UNK', '.']
PRED 1886: arafat 's body leaves paris for state <unk> of arafat 's body at paris airport for state <unk> of arafat 's body in cairo for state <unk> of arafat body to cairo for state <unk>
PRED SCORE: -12.4202
GOLD 1886: <unk> : arafat 's body taken to cairo for state <unk>
GOLD SCORE: -112.2260
SENT 1877: ['the', 'marine', 'police', 'and', 'the', 'customs', 'and', 'excise', 'department', 'of', 'hong', 'kong', 'announced', 'friday', 'that', '#', 'million', 'hong', 'kong', 'dollars', '-lrb-', '###,###', 'us', 'dollars', '-rrb-', 'worth', 'of', 'suspected', 'contraband', 'goods', 'were', 'seized', 'off', 'black', 'point', 'in', 'tuen', 'mun', 'in', 'a', 'joint', 'operation', 'mounted', 'thursday', '.']
PRED 1877: # million hk dollars worth of suspected contraband goods seized in hk in joint operation with bc-as-gen <unk> 's <unk> <unk> <unk> <unk> <unk> ; <unk> hong kong police say # million dollars worth of goods seized
PRED SCORE: -12.3212
GOLD 1877: hk police seize # million hk dollars worth of smuggled goods
GOLD SCORE: -114.6402
SENT 1837: ['the', 'political', 'uncertainty', 'in', 'nepal', 'ended', 'wednesday', 'as', 'king', 'birendra', 'finally', 'chose', 'to', 'accept', 'the', 'submission', 'by', '##', 'mps', 'to', 'convene', 'a', 'special', 'session', 'of', 'parliament', 'on', 'february', '##', 'to', 'discuss', 'their', 'no-confidence', 'motion', 'against', 'the', 'present', 'government', '.']
PRED 1837: nepal 's parliament to decide on no-confidence motion on feb. ## to discuss motion of no-confidence motion against gov t in feb. ## parliament session to discuss no-confidence motion against present gov t in february
PRED SCORE: -12.9441
GOLD 1837: nepali king calls special parliament session ending
GOLD SCORE: -78.7076
SENT 1756: ['the', 'largest', 'snake', 'the', 'world', 'has', 'ever', 'known', '--', 'as', 'long', 'as', 'a', 'school', 'bus', 'and', 'as', 'heavy', 'as', 'a', 'small', 'car', '-', '-', 'ruled', 'tropical', 'ecosystems', 'only', 'six', 'million', 'years', 'after', 'the', 'demise', 'of', 'the', 'fearsome', 'tyrannosaurus', 'rex', ',', 'according', 'to', 'a', 'new', 'discovery', 'to', 'be', 'published', 'on', 'thursday', 'in', 'the', 'journal', 'nature', '.']
PRED 1756: new discovery of <unk> <unk> found in small car in new york city of t. rex <unk> <unk> contributed reporting from new york for this article from new york for this article from new york
PRED SCORE: -15.0449
GOLD 1756: world 's largest snake discovered in fossilized rainforest
GOLD SCORE: -99.1610
etc.
@sebastianGehrmann, in case you understand what is happening, feel free to chime in.
In my case, it is due to the batch size. I can only avoid this problem when I set batch_size = 1.
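That batch_size observation would fit a padding explanation: with batch_size = 1 there are no padded source positions, so no out-of-range position can receive attention. A hypothetical pure-Python sketch (this is my assumption, not the OpenNMT-py code; there it would be a tensor mask built from memory_lengths) of zeroing attention on padding before the coverage update:

```python
def mask_padded_attention(attn_row, memory_length):
    # attn_row: attention over src_len positions for one batch element.
    # Positions >= memory_length are padding for the shorter sentences
    # in the batch; zero them out so padding can't contribute to the
    # coverage term (or be indexed by downstream ops).
    return [a if i < memory_length else 0.0 for i, a in enumerate(attn_row)]

# Batch of two sources padded to length 4; the second is really length 2.
attn = [[0.4, 0.3, 0.2, 0.1],
        [0.3, 0.3, 0.2, 0.2]]
memory_lengths = [4, 2]

masked = [mask_padded_attention(row, n) for row, n in zip(attn, memory_lengths)]
print(masked)  # [[0.4, 0.3, 0.2, 0.1], [0.3, 0.3, 0.0, 0.0]]
```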
Steps to reproduce: download the pretrained transformer and the newstest test.en/test.de data (I took the head -n 500 of both). The problem is that the PRED AVG SCORE is -inf in both cases.
Is that expected/known? Have I chosen a bad beta?