bshao001 / ChatLearner

A chatbot implemented in TensorFlow based on the seq2seq model, with certain rules integrated.
Apache License 2.0

Cannot train when a new vocab file is generated #78

Open · Asrix-AI opened this issue 5 years ago

Asrix-AI commented 5 years ago

I included some of your data along with my own and created a new vocab file. But when I train the model with bottrainer.py, it starts running the first epoch and then stops with this error:

```
Traceback (most recent call last):
  File "bottrainer.py", line 161, in <module>
    bt.train(res_dir)
  File "bottrainer.py", line 88, in train
    step_result = self.model.train_step(sess, learning_rate=learning_rate)
  File "/home/ruksin/PycharmProjects/chatlearner/chatbot/modelcreator.py", line 122, in train_step
    feed_dict={self.learning_rate: learning_rate})
  File "/home/ruksin/PycharmProjects/chatlearner/venv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/home/ruksin/PycharmProjects/chatlearner/venv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/ruksin/PycharmProjects/chatlearner/venv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/home/ruksin/PycharmProjects/chatlearner/venv/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: pos 0 out of range for string b'' at index 0
  [[Node: Substr = Substr[T=DT_INT32](arg0, Substr/pos, Substr/len)]]
  [[Node: IteratorGetNext = IteratorGetNext[output_shapes=[[?,?], [?,?], [?,?], [?], [?]], output_types=[DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_INT32], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]]
```

Can anyone please help me with this? Am I doing anything wrong? I reduced the batch size to 2 and the number of units to 800. I was able to train with your data alone before, but when I tried my own data together with some of yours, it fails with this error.

The vocab generator Python file shows this result:

```
Vocab size after all base data files scanned: 256
Vocab size after cornell data file scanned: 18354
The final vocab file generated. Vocab size: 18380
```

bshao001 commented 5 years ago

It is very likely that your data has some kind of problem, but I cannot tell without looking at the details. You can google around; I believe I have seen similar errors from someone else before.
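One thing worth checking, though not confirmed for this case: a `Substr` op failing with "pos 0 out of range for string b''" typically means an empty string reached the input pipeline, which often comes from a blank line in one of the training data files or the generated vocab file. Below is a minimal sketch of such a check (the script name and file paths are placeholders, not part of ChatLearner):

```python
import sys

def find_blank_lines(path):
    """Report blank lines that would feed an empty string b'' into
    TensorFlow's Substr op and trigger the InvalidArgumentError above."""
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                print("{}: blank line at {}".format(path, lineno))

if __name__ == "__main__":
    # Hypothetical usage: python check_blank_lines.py Data/*.txt vocab.txt
    for path in sys.argv[1:]:
        find_blank_lines(path)
```

If the scan reports blank lines, removing them and regenerating the vocab file is a reasonable first thing to try before digging deeper.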