daniel-kukiela / nmt-chatbot

NMT Chatbot

Train.py stopping after one step #135

Closed Nathan-Chell closed 4 years ago

Nathan-Chell commented 5 years ago

Hello, when running train.py the program doesn't loop. train.py doesn't keep iterating over many steps; it just quits after 100 steps (the first time it saves).

I0621 09:45:16.584873 19640 saver.py:1280] Restoring parameters from Q:\nmt-chatbot\model\translate.ckpt-200
loaded train model parameters from Q:\nmt-chatbot\model\translate.ckpt-200, time 1.30s
W0621 09:45:17.934267 19640 deprecation_wrapper.py:119] From Q:\nmt-chatbot/nmt\train.py:262: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

I0621 09:45:18.398004 19640 saver.py:1280] Restoring parameters from Q:\nmt-chatbot\model\translate.ckpt-200
loaded infer model parameters from Q:\nmt-chatbot\model\translate.ckpt-200, time 0.45s

47

src: It ' s kind of hilarious because all of the racist submissions get taken down in like four seconds , but they keep trying anyway . Silly racists .
ref: I know I keep reporting them but they keep coming back !
nmt: dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark dark

I0621 09:45:19.939544 19640 saver.py:1280] Restoring parameters from Q:\nmt-chatbot\model\translate.ckpt-200
loaded eval model parameters from Q:\nmt-chatbot\model\translate.ckpt-200, time 0.36s
eval dev: perplexity nan, time 0s, Fri Jun 21 09:45:21 2019.
eval test: perplexity nan, time 0s, Fri Jun 21 09:45:21 2019.
I0621 09:45:21.644288 19640 saver.py:1280] Restoring parameters from Q:\nmt-chatbot\model\translate.ckpt-200
2019-06-21 09:45:21.944533: I tensorflow/core/kernels/lookup_util.cc:376] Table trying to initialize from file Q:\nmt-chatbot\data\vocab.from is already initialized.
2019-06-21 09:45:21.944812: I tensorflow/core/kernels/lookup_util.cc:376] Table trying to initialize from file Q:\nmt-chatbot\data\vocab.to is already initialized.
2019-06-21 09:45:21.944815: I tensorflow/core/kernels/lookup_util.cc:376] Table trying to initialize from file Q:\nmt-chatbot\data\vocab.to is already initialized.
loaded infer model parameters from Q:\nmt-chatbot\model\translate.ckpt-200, time 0.32s

External evaluation, global step 200

decoding to output Q:\nmt-chatbot\model\output_dev.
done, num sentences 100, num translations per input 1, time 9s, Fri Jun 21 09:45:31 2019.
bleu dev: 0.0
saving hparams to Q:\nmt-chatbot\model\hparams

External evaluation, global step 200

decoding to output Q:\nmt-chatbot\model\output_test.
done, num sentences 100, num translations per input 1, time 9s, Fri Jun 21 09:45:41 2019.
bleu test: 0.0
saving hparams to Q:\nmt-chatbot\model\hparams

Start step 200, lr 0.0001, Fri Jun 21 09:45:41 2019

Init train iterator, skipping 0 elements

global step 300 lr 0.0001 step-time 0.47s wps 6.94K ppl nan gN nan bleu 0.00
step 300 overflow, stop early
I0621 09:46:53.846735 19640 saver.py:1280] Restoring parameters from Q:\nmt-chatbot\model\translate.ckpt-300
2019-06-21 09:46:54.160984: I tensorflow/core/kernels/lookup_util.cc:376] Table trying to initialize from file Q:\nmt-chatbot\data\vocab.from is already initialized.
2019-06-21 09:46:54.161223: I tensorflow/core/kernels/lookup_util.cc:376] Table trying to initialize from file Q:\nmt-chatbot\data\vocab.to is already initialized.
2019-06-21 09:46:54.161217: I tensorflow/core/kernels/lookup_util.cc:376] Table trying to initialize from file Q:\nmt-chatbot\data\vocab.to is already initialized.
loaded infer model parameters from Q:\nmt-chatbot\model\translate.ckpt-300, time 0.33s

20

src: * * P * *
ref: * * S * *
nmt: dark dark dark dark dark dark dark dark dark dark

I0621 09:46:54.246667 19640 saver.py:1280] Restoring parameters from Q:\nmt-chatbot\model\translate.ckpt-300
2019-06-21 09:46:54.524979: I tensorflow/core/kernels/lookup_util.cc:376] Table trying to initialize from file Q:\nmt-chatbot\data\vocab.to is already initialized.
2019-06-21 09:46:54.524978: I tensorflow/core/kernels/lookup_util.cc:376] Table trying to initialize from file Q:\nmt-chatbot\data\vocab.from is already initialized.
loaded eval model parameters from Q:\nmt-chatbot\model\translate.ckpt-300, time 0.29s
eval dev: perplexity nan, time 0s, Fri Jun 21 09:46:55 2019.
eval test: perplexity nan, time 0s, Fri Jun 21 09:46:55 2019.
I0621 09:46:55.632743 19640 saver.py:1280] Restoring parameters from Q:\nmt-chatbot\model\translate.ckpt-300
2019-06-21 09:46:55.948274: I tensorflow/core/kernels/lookup_util.cc:376] Table trying to initialize from file Q:\nmt-chatbot\data\vocab.from is already initialized.
2019-06-21 09:46:55.948283: I tensorflow/core/kernels/lookup_util.cc:376] Table trying to initialize from file Q:\nmt-chatbot\data\vocab.to is already initialized.
2019-06-21 09:46:55.948303: I tensorflow/core/kernels/lookup_util.cc:376] Table trying to initialize from file Q:\nmt-chatbot\data\vocab.to is already initialized.
loaded infer model parameters from Q:\nmt-chatbot\model\translate.ckpt-300, time 0.33s

External evaluation, global step 300

decoding to output Q:\nmt-chatbot\model\output_dev.
done, num sentences 100, num translations per input 1, time 9s, Fri Jun 21 09:47:05 2019.
bleu dev: 0.0
saving hparams to Q:\nmt-chatbot\model\hparams

External evaluation, global step 300

decoding to output Q:\nmt-chatbot\model\output_test.
done, num sentences 100, num translations per input 1, time 9s, Fri Jun 21 09:47:15 2019.
bleu test: 0.0
saving hparams to Q:\nmt-chatbot\model\hparams

Final, step 300 lr 0.0001 step-time 0.00 wps 0.00K ppl 0.00, dev ppl nan, dev bleu 0.0, test ppl nan, test bleu 0.0, Fri Jun 21 09:47:15 2019

Done training!, time 94s, Fri Jun 21 09:47:15 2019.

Start evaluating saved best models.

2019-06-21 09:47:16.165268: I tensorflow/core/kernels/lookup_util.cc:376] Table trying to initialize from file Q:\nmt-chatbot\data\vocab.from is already initialized.
2019-06-21 09:47:16.165340: I tensorflow/core/kernels/lookup_util.cc:376] Table trying to initialize from file Q:\nmt-chatbot\data\vocab.to is already initialized.
2019-06-21 09:47:16.165343: I tensorflow/core/kernels/lookup_util.cc:376] Table trying to initialize from file Q:\nmt-chatbot\data\vocab.to is already initialized.
created infer model with fresh parameters, time 0.18s

42

src: I ' m not really worried about either of them to be honest as long as Lovie Smith is at the helm of the pirate ship .
ref: True enough .
nmt: Giovanni forwarded Tora Tora Tora Tora Tora Tora meio Politically Politically LDLC LDLC LDLC Nasri Nasri Nasri Nasri Nasri stud stud stud stud stud stud Talon Talon Talon Talon Talon Talon Talon Talon systematic mobbed mobbed mobbed mobbed mobbed mobbed mobbed leap leap leap leap leap leap leap leap nurse nurse nurse nurse nurse

2019-06-21 09:47:16.579927: I tensorflow/core/kernels/lookup_util.cc:376] Table trying to initialize from file Q:\nmt-chatbot\data\vocab.from is already initialized.
2019-06-21 09:47:16.579939: I tensorflow/core/kernels/lookup_util.cc:376] Table trying to initialize from file Q:\nmt-chatbot\data\vocab.to is already initialized.
created eval model with fresh parameters, time 0.14s
eval dev: perplexity nan, time 0s, Fri Jun 21 09:47:17 2019.
eval test: perplexity nan, time 0s, Fri Jun 21 09:47:17 2019.
2019-06-21 09:47:17.750041: I tensorflow/core/kernels/lookup_util.cc:376] Table trying to initialize from file Q:\nmt-chatbot\data\vocab.to is already initialized.
2019-06-21 09:47:17.750041: I tensorflow/core/kernels/lookup_util.cc:376] Table trying to initialize from file Q:\nmt-chatbot\data\vocab.to is already initialized.
2019-06-21 09:47:17.750047: I tensorflow/core/kernels/lookup_util.cc:376] Table trying to initialize from file Q:\nmt-chatbot\data\vocab.from is already initialized.
created infer model with fresh parameters, time 0.18s
bleu dev: 0.0
bleu test: 0.0

Best bleu, step 0 step-time 0.00 wps 0.00K, dev ppl nan, dev bleu 0.0, test ppl nan, test bleu 0.0, Fri Jun 21 09:47:17 2019

Nathan-Chell commented 5 years ago

Okay, the formatting of the error went weird, but everything after my initial sentences is the text I get from CMD, minus all of the GPU initialisation stuff.

Nathan-Chell commented 5 years ago

By the way, I'm using the code from the Sentdex tutorial.

Nathan-Chell commented 5 years ago

Something of note: the error says 'step 300 overflow, stop early'. Bear in mind I have run this three times, with it erroring out on all three occasions.

Nathan-Chell commented 5 years ago

Turns out, every time I run the code I get an overflow error.

Nathan-Chell commented 5 years ago

My ppl is staying around 9 and I have a bleu score of 0.

loaded infer model parameters from Q:\nmt-chatbot\model\translate.ckpt-300, time 0.31s

External evaluation, global step 300

decoding to output Q:\nmt-chatbot\model\output_dev.
done, num sentences 100, num translations per input 1, time 9s, Fri Jun 21 23:04:13 2019.
bleu dev: 0.0
saving hparams to Q:\nmt-chatbot\model\hparams

External evaluation, global step 300

decoding to output Q:\nmt-chatbot\model\output_test.
done, num sentences 100, num translations per input 1, time 9s, Fri Jun 21 23:04:23 2019.
bleu test: 0.0
saving hparams to Q:\nmt-chatbot\model\hparams

Start step 300, lr 0.001, Fri Jun 21 23:04:23 2019

Init train iterator, skipping 6400 elements

global step 400 lr 0.001 step-time 0.47s wps 6.85K ppl nan gN nan bleu 0.00
step 400 overflow, stop early
Overflow
global step 500 lr 0.001 step-time 0.35s wps 9.52K ppl nan gN nan bleu 0.00
step 500 overflow, stop early
Overflow
global step 600 lr 0.001 step-time 0.35s wps 9.43K ppl nan gN nan bleu 0.00
step 600 overflow, stop early
Overflow
global step 700 lr 0.001 step-time 0.35s wps 9.27K ppl nan gN nan bleu 0.00
step 700 overflow, stop early
Overflow

Nathan-Chell commented 5 years ago

Oh no, sorry, I have a ppl of 0 and a gN of 0.

Nathan-Chell commented 5 years ago

The issue was that I had prepared my data with a vocab of 100k, but then changed the vocab size inside the hparams text file to only 80k words. This mismatch caused the overflow error.
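For anyone hitting this later, a quick sanity check is to compare the vocab sizes configured in model\hparams against the actual line counts of the vocab files. This is a minimal sketch, assuming the hparams file is JSON with the src_vocab_size / tgt_vocab_size keys that TF NMT saves, and that the vocab files hold one token per line; adjust paths and key names to your setup:

```python
import json

# Paths taken from the logs above; adjust to your setup.
HPARAMS_PATH = r"Q:\nmt-chatbot\model\hparams"
VOCAB_FILES = {
    "src_vocab_size": r"Q:\nmt-chatbot\data\vocab.from",
    "tgt_vocab_size": r"Q:\nmt-chatbot\data\vocab.to",
}

def count_lines(path):
    """Vocab files are one token per line, so line count == vocab size."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)

with open(HPARAMS_PATH, encoding="utf-8") as f:
    hparams = json.load(f)

# Key names assumed from the TF NMT hparams format; verify against your file.
for key, vocab_file in VOCAB_FILES.items():
    configured = hparams.get(key)
    actual = count_lines(vocab_file)
    status = "OK" if configured == actual else "MISMATCH"
    print(f"{key}: hparams={configured} file={actual} -> {status}")
```

If either line prints MISMATCH, fix the hparams value (or regenerate the data) before resuming training.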

kaljitism commented 4 years ago

The data is not rich enough to learn with that large a vocabulary. Even the best current models, like GPT-2, DialoGPT, and even Meena, use a vocab of around 55k. Kindly don't increase the vocab size.
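If you do want to shrink an over-sized vocab without redoing the full preparation, one rough sketch is below. It assumes the vocab files are ordered most-frequent-first with the special tokens at the top, one token per line (as this repo's preprocessing produces); the file names and the 55k target are illustrative:

```python
# Hypothetical helper: truncate a one-token-per-line vocab file to `size` entries.
# Assumes tokens are ordered most-frequent first, with special tokens at the top.
def truncate_vocab(src_path, dst_path, size):
    with open(src_path, encoding="utf-8") as fin, \
         open(dst_path, "w", encoding="utf-8") as fout:
        for i, line in enumerate(fin):
            if i >= size:
                break
            fout.write(line)

truncate_vocab(r"Q:\nmt-chatbot\data\vocab.from",
               r"Q:\nmt-chatbot\data\vocab.from.55k", 55000)
truncate_vocab(r"Q:\nmt-chatbot\data\vocab.to",
               r"Q:\nmt-chatbot\data\vocab.to.55k", 55000)
# Remember to point the settings/hparams at the truncated files and set the
# vocab size to the same value, so the config and the data stay consistent.
```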