ArashHosseini closed this issue 6 years ago.
There is no limit for training... when the error gets small, close to 1.0, you can stop. I got to 1.2 in some 2-3 days, running mostly overnight on CPU only.
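As a sketch of that stopping rule, here is a minimal loop; `run_train_step` is a hypothetical stand-in for one optimization step that returns the current error, not a function from this repository:

```python
# Hedged sketch of the stopping rule above; `run_train_step` is a
# hypothetical stand-in for one optimization step returning the error.
TARGET_ERROR = 1.0

def train_until_small_error(run_train_step, tolerance=0.2, max_steps=10**7):
    """Run training steps until the error gets close to 1.0 (e.g. ~1.2)."""
    error = float("inf")
    for step in range(max_steps):
        error = run_train_step()
        if error <= TARGET_ERROR + tolerance:
            break
    return step, error
```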
Start with a small data set, something less than 2000 pairs of conversations, and then make sure your model can remember 99% of them, if not all of them. Test with questions from the training set, and you should see exactly the same answers as in the training data. Even with a normal CPU, this should take only a few hours. After you get a feeling for it, you can slowly increase the training data size.
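As a concrete illustration of that sanity check, here is a minimal sketch; `infer` is a hypothetical stand-in for whatever prediction call your model exposes:

```python
# Minimal sketch of the memorization sanity check described above.
# `infer` is a hypothetical stand-in for your model's prediction call.
def memorization_rate(training_pairs, infer):
    """Return the fraction of training questions answered verbatim."""
    hits = sum(1 for question, answer in training_pairs
               if infer(question).strip() == answer.strip())
    return hits / len(training_pairs)

# Aim for ~0.99 on the small (<2000-pair) data set before scaling up.
```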
For your assignment, you may stick with the legacy_seq2seq model for a chatbot. However, the new NMT model is much better (and faster) in both training and prediction, and it is much easier to add other features such as beam search. That should be the choice for training a chatbot.
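For reference, here is a hedged sketch of beam-search decoding with the TF 1.x `tf.contrib.seq2seq` API mentioned above; all sizes, names, and token ids are illustrative assumptions, not values from either repository:

```python
import tensorflow as tf
from tensorflow.python.layers.core import Dense

# Hedged sketch of beam-search decoding with tf.contrib.seq2seq (TF 1.x).
# All sizes and token ids below are illustrative assumptions.
vocab_size, embed_dim, num_units = 1000, 64, 128
batch_size, beam_width, sos_id, eos_id = 32, 10, 1, 2

embedding = tf.get_variable("embedding", [vocab_size, embed_dim])
decoder_cell = tf.nn.rnn_cell.GRUCell(num_units)

# Stand-in for the real encoder's final state [batch, num_units]; each
# example's state is tiled beam_width times so every beam can decode.
encoder_state = tf.zeros([batch_size, num_units])
tiled_state = tf.contrib.seq2seq.tile_batch(encoder_state, multiplier=beam_width)

decoder = tf.contrib.seq2seq.BeamSearchDecoder(
    cell=decoder_cell,
    embedding=embedding,
    start_tokens=tf.fill([batch_size], sos_id),
    end_token=eos_id,
    initial_state=tiled_state,
    beam_width=beam_width,
    output_layer=Dense(vocab_size, use_bias=False))

outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder, maximum_iterations=40)
best_ids = outputs.predicted_ids[:, :, 0]  # highest-scoring beam per example
```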
Thanks for the great response. I am currently looking into NMT, but I am not sure what training a chatbot looks like in a translation model. Can you describe it in a little more detail, or maybe point to a really short tutorial on how to train the chatbot? Should I use the Cornell Movie Dialogs Corpus for training? Thank you very much.
@ArashHosseini I have developed a chatbot using NMT here: https://github.com/bshao001/ChatLearner. In the same repository, you can find another branch in which a chatbot was developed using the legacy_seq2seq model. You are welcome to open issues there, and I will explain things to you in detail.
Google named the new seq2seq model the NMT model only because it was originally used for developing a translation system. However, it is a seq2seq model: a model that lets you learn a mapping function from an input sequence to an output sequence. Any real-world problem with this structure can use this model, such as translation, question answering (chatbots), and speech recognition.
You can use any corpus for training; it does not matter. However, in order to achieve nice results, you do need high-quality training data (not just quantity, but also quality, in my opinion). My repository also contains well-formatted training data, which will give you decent results.
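To make the "mapping from an input sequence to an output sequence" concrete, here is a minimal training-graph sketch in the TF 1.x contrib style; the placeholder names, sizes, and ids are illustrative assumptions, not code from either repository:

```python
import tensorflow as tf
from tensorflow.python.layers.core import Dense

# Minimal seq2seq sketch (illustrative assumptions throughout): an encoder
# RNN reads the question, a decoder RNN emits the answer token by token.
vocab_size, embed_dim, num_units = 1000, 64, 128

src_ids = tf.placeholder(tf.int32, [None, None])  # question ids [batch, time]
tgt_in  = tf.placeholder(tf.int32, [None, None])  # answer ids, starts with <sos>
tgt_out = tf.placeholder(tf.int32, [None, None])  # answer ids, ends with <eos>
tgt_len = tf.placeholder(tf.int32, [None])        # answer lengths

embedding = tf.get_variable("embedding", [vocab_size, embed_dim])

# Encoder: its final state summarizes the input sequence.
_, enc_state = tf.nn.dynamic_rnn(
    tf.nn.rnn_cell.GRUCell(num_units),
    tf.nn.embedding_lookup(embedding, src_ids), dtype=tf.float32)

# Decoder: generates the output sequence conditioned on the encoder state.
helper = tf.contrib.seq2seq.TrainingHelper(
    tf.nn.embedding_lookup(embedding, tgt_in), tgt_len)
decoder = tf.contrib.seq2seq.BasicDecoder(
    tf.nn.rnn_cell.GRUCell(num_units), helper, enc_state,
    output_layer=Dense(vocab_size, use_bias=False))
outputs, _, _ = tf.contrib.seq2seq.dynamic_decode(decoder)

# Cross-entropy on the predicted tokens (padding mask omitted for brevity).
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=tgt_out, logits=outputs.rnn_output))
```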
Awesome, I will spend time on it, thanks a lot. Can I keep TF 1.3 or should I upgrade? im2txt is currently running on 1.3.
Hey, I am evaluating your repo and get this response:
```
flyn@tron:~/git/ChatLearner/chatbot$ python3.5 bottrainer.py
Traceback (most recent call last):
  File "bottrainer.py", line 20, in <module>
    from chatbot.tokenizeddata import TokenizedData
ImportError: No module named 'chatbot'
```
Is there any reason why you import TokenizedData from chatbot.tokenizeddata and not from tokenizeddata? I changed the related imports, and then I found this:
```
flyn@tron:~/git/ChatLearner/chatbot$ python3.5 bottrainer.py
Traceback (most recent call last):
  File "bottrainer.py", line 136, in <module>
    from settings import PROJECT_ROOT
ImportError: No module named 'settings'
```
Where is settings coming from? Thanks, Arash.
OK, I found out: settings.py is located one level up, at the top level of the project. Now everything is running.
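For anyone hitting the same two ImportErrors: both `chatbot.tokenizeddata` and `settings` are package-style imports that resolve relative to the project root (the directory containing settings.py and the chatbot/ package). Here is a hedged sketch of one way to make them resolve when launching bottrainer.py directly from the chatbot/ directory; this reflects an assumption about the layout, not the author's documented setup:

```python
import os
import sys

# Hedged sketch (layout assumption, not the author's documented method):
# put the project root on sys.path so the package-style imports resolve
# when bottrainer.py is launched directly from the chatbot/ directory.
_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__), ".."))
if _ROOT not in sys.path:
    sys.path.insert(0, _ROOT)

from chatbot.tokenizeddata import TokenizedData  # noqa: E402
from settings import PROJECT_ROOT                # noqa: E402
```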
```
string_to_index/hash_table/Const: (Const): /job:localhost/replica:0/task:0/device:CPU:0
# Training loop started @ 2017-11-14 19:50:02
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1323, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1302, in _run_fn
    status, run_metadata)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 473, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1024,50785]
   [[Node: gradients/dynamic_seq2seq/decoder/output_projection/Tensordot/MatMul_grad/MatMul_1 = MatMul[T=DT_FLOAT, transpose_a=true, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
   [[Node: gradients/dynamic_seq2seq/decoder/decoder/while/BasicDecoderStep/TrainingHelperNextInputs/cond/Merge_grad/cond_grad/_205 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1599_gradients/dynamic_seq2seq/decoder/decoder/while/BasicDecoderStep/TrainingHelperNextInputs/cond/Merge_grad/cond_grad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "bottrainer.py", line 141, in <module>
    bt.train(res_dir)
  File "bottrainer.py", line 71, in train
    step_result = self.model.train_step(sess, learning_rate=learning_rate)
  File "/home/flyn/git/ChatLearner/chatbot/modelcreator.py", line 122, in train_step
    feed_dict={self.learning_rate: learning_rate})
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[1024,50785]
   [[Node: gradients/dynamic_seq2seq/decoder/output_projection/Tensordot/MatMul_grad/MatMul_1 = MatMul[T=DT_FLOAT, transpose_a=true, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
   [[Node: gradients/dynamic_seq2seq/decoder/decoder/while/BasicDecoderStep/TrainingHelperNextInputs/cond/Merge_grad/cond_grad/_205 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1599_gradients/dynamic_seq2seq/decoder/decoder/while/BasicDecoderStep/TrainingHelperNextInputs/cond/Merge_grad/cond_grad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Caused by op 'gradients/dynamic_seq2seq/decoder/output_projection/Tensordot/MatMul_grad/MatMul_1', defined at:
  File "bottrainer.py", line 140, in <module>
    bt = BotTrainer(corpus_dir=corp_dir)
  File "bottrainer.py", line 35, in __init__
    batch_input=self.train_batch)
  File "/home/flyn/git/ChatLearner/chatbot/modelcreator.py", line 86, in __init__
    gradients = tf.gradients(self.train_loss, params)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py", line 581, in gradients
    grad_scope, op, func_call, lambda: grad_fn(op, out_grads))
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py", line 353, in _MaybeCompile
    return grad_fn()  # Exit early
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/ops/gradients_impl.py", line 581, in <lambda>
    grad_scope, op, func_call, lambda: grad_fn(op, out_grads))
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/ops/math_grad.py", line 922, in _MatMulGrad
    grad_b = math_ops.matmul(a, grad, transpose_a=True)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 1891, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 2437, in _mat_mul
    name=name)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

...which was originally created as op 'dynamic_seq2seq/decoder/output_projection/Tensordot/MatMul', defined at:
  File "bottrainer.py", line 140, in <module>
    bt = BotTrainer(corpus_dir=corp_dir)
[elided 0 identical lines from previous traceback]
  File "bottrainer.py", line 35, in __init__
    batch_input=self.train_batch)
  File "/home/flyn/git/ChatLearner/chatbot/modelcreator.py", line 65, in __init__
    res = self.build_graph(hparams, scope=scope)
  File "/home/flyn/git/ChatLearner/chatbot/modelcreator.py", line 134, in build_graph
    encoder_outputs, encoder_state, hparams)
  File "/home/flyn/git/ChatLearner/chatbot/modelcreator.py", line 221, in _build_decoder
    logits = self.output_layer(outputs.rnn_output)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/layers/base.py", line 575, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/layers/core.py", line 156, in call
    [0]])
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 2520, in tensordot
    ab_matmul = matmul(a_reshape, b_reshape)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/ops/math_ops.py", line 1891, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/ops/gen_math_ops.py", line 2437, in _mat_mul
    name=name)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[1024,50785]
   [[Node: gradients/dynamic_seq2seq/decoder/output_projection/Tensordot/MatMul_grad/MatMul_1 = MatMul[T=DT_FLOAT, transpose_a=true, transpose_b=false, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
   [[Node: gradients/dynamic_seq2seq/decoder/decoder/while/BasicDecoderStep/TrainingHelperNextInputs/cond/Merge_grad/cond_grad/_205 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_1599_gradients/dynamic_seq2seq/decoder/decoder/while/BasicDecoderStep/TrainingHelperNextInputs/cond/Merge_grad/cond_grad", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
```
I am really thankful for your help.
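The ResourceExhaustedError above means the GPU ran out of memory while allocating the [1024, 50785] output-projection gradient (decoder hidden size times vocabulary size). A hedged sketch of the usual TF 1.x workarounds follows; these are generic remedies, not ChatLearner's documented configuration:

```python
import tensorflow as tf

# Hedged sketch of common OOM workarounds in TF 1.x; generic remedies,
# not ChatLearner's documented settings.

# 1) Allocate GPU memory on demand instead of all at once, and let ops
#    that cannot fit on the GPU fall back to the CPU.
config = tf.ConfigProto(allow_soft_placement=True)
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)

# 2) If that is not enough, shrink whatever multiplies into the offending
#    [1024, 50785] tensor: use a smaller batch size, fewer decoder hidden
#    units (the 1024), or a smaller vocabulary (the 50785).
```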
@ArashHosseini Please do me a favor and do not ask me questions here. You are welcome to open issues in my repository, and I will be glad to help you there.
Maybe this question has already been answered: how long is the average training time? Should I interrupt the training? It has now been training for 6 days and is on iteration 4374600... Thanks.
Update: training data:
I interrupted the training; the conversation content does not look very good...