OpenNMT / OpenNMT-py

Open Source Neural Machine Translation and (Large) Language Models in PyTorch
https://opennmt.net/
MIT License

Two errors when training #1995

Closed: sjchasel closed this issue 3 years ago

sjchasel commented 3 years ago

Hi, I am following the code from Deep Keyphrase Generation, but I got two errors when training a One2Seq model with Diversity Mechanisms enabled. The command I ran was:

    python train.py -config config/train/config-rnn-keyphrase-one2seq-diverse.yml

The first error was:

    [2021-01-21 15:44:07,959 INFO] At step 99, we removed a batch - accum 0
    Traceback (most recent call last):
      File "/home/yons/OpenNMT-kpg-release/onmt/trainer.py", line 377, in _gradient_accumulation
        model=self.model
      File "/home/yons/OpenNMT-kpg-release/onmt/utils/loss.py", line 187, in __call__
        loss, stats = self._compute_loss(batch, **shard_state)
      File "/home/yons/OpenNMT-kpg-release/onmt/modules/copy_generator.py", line 264, in _compute_loss
        semcov_ending_state=self.semcov_ending_state)
      File "/home/yons/OpenNMT-kpg-release/onmt/utils/loss.py", line 457, in _compute_semantic_coverage_loss
        neg_idx = np.random.randint(0, batch_size-1, size=(n_sep * n_neg))
      File "mtrand.pyx", line 746, in numpy.random.mtrand.RandomState.randint
      File "_bounded_integers.pyx", line 1254, in numpy.random._bounded_integers._rand_int64
    ValueError: low >= high

The second error, one step later, was:

    [2021-01-21 15:44:08,228 INFO] At step 100, we removed a batch - accum 0
    Traceback (most recent call last):
      File "train.py", line 215, in <module>
        main(opt)
      File "train.py", line 103, in main
        single_main(opt, -1)
      File "/home/yons/OpenNMT-kpg-release/onmt/train_single.py", line 165, in main
        valid_steps=opt.valid_steps)
      File "/home/yons/OpenNMT-kpg-release/onmt/trainer.py", line 258, in train
        report_stats)
      File "/home/yons/OpenNMT-kpg-release/onmt/trainer.py", line 454, in _maybe_report_training
        multigpu=self.n_gpu > 1)
      File "/home/yons/OpenNMT-kpg-release/onmt/utils/report_manager.py", line 77, in report_training
        step, num_steps, learning_rate, report_stats)
      File "/home/yons/OpenNMT-kpg-release/onmt/utils/report_manager.py", line 128, in _report_training
        learning_rate, self.start_time)
      File "/home/yons/OpenNMT-kpg-release/onmt/utils/statistics.py", line 120, in output
        self.accuracy(),
      File "/home/yons/OpenNMT-kpg-release/onmt/utils/statistics.py", line 90, in accuracy
        return 100 * (self.n_correct / self.n_words)
    ZeroDivisionError: division by zero

I don't know what went wrong.

francoishernandez commented 3 years ago

As you mention, the code you're running is not OpenNMT-py, but a fork built on it. You may want to post this issue on the repository you're using.
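
For anyone who hits the same tracebacks, here is a minimal sketch of how the two failing expressions behave in isolation. It assumes, without confirmation from this thread, that the batches were degenerate: `batch_size` of 1 or less for the `np.random.randint` call, and zero target words for the accuracy computation. The helper names `sample_negative_indices` and `safe_accuracy` are hypothetical and do not exist in either repository; only the two guarded expressions are taken from the tracebacks.

```python
import numpy as np


def sample_negative_indices(batch_size, n_sep, n_neg):
    """Hypothetical guard around the call at loss.py line 457.

    np.random.randint(low, high) requires low < high, so with
    batch_size <= 1 the original call raises "ValueError: low >= high".
    """
    high = batch_size - 1
    if high <= 0:
        # Assumption: with no other example in the batch there is nothing
        # to sample negatives from, so the caller should skip this term.
        return None
    return np.random.randint(0, high, size=(n_sep * n_neg))


def safe_accuracy(n_correct, n_words):
    """Hypothetical guard around the division at statistics.py line 90."""
    if n_words == 0:
        # Assumption: report 0.0 when no words were counted.
        return 0.0
    return 100 * (n_correct / n_words)


if __name__ == "__main__":
    print(sample_negative_indices(1, 2, 3))        # degenerate batch
    print(sample_negative_indices(4, 2, 3).shape)  # normal batch
    print(safe_accuracy(0, 0))
```

Guards like these only hide the symptom. If the assumption holds, the underlying question is why the batches end up empty or with a single example, which is best raised on the fork's own repository.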

sjchasel commented 3 years ago

Ok, thank you!