Cannot join current thread

othman-zennaki commented 5 years ago

I got the following error while training the model:

Run ID is 1562759522
Model type is RL-S2S
<_io.TextIOWrapper name='./data/train-v1.1.json' mode='r' encoding='ANSI_X3.4-1968'>
<_io.TextIOWrapper name='./data/dev-v1.1.json' mode='r' encoding='ANSI_X3.4-1968'>
Loaded SQuAD with 88825 triples
50131 300
WARNING:tensorflow:From /content/clouderizer/bloomsburyai_question-generation/code/src/seq2seq_model.py:126: BasicLSTMCell.__init__ (from tensorflow.python.ops.rnn_cell_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This class is deprecated, please use tf.nn.rnn_cell.LSTMCell, which supports all the feature this cell currently has. Please replace the existing code with tf.nn.rnn_cell.LSTMCell(name='basic_lstm_cell').
WARNING:tensorflow:From /content/clouderizer/bloomsburyai_question-generation/code/src/seq2seq_model.py:444: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
Modifying Seq2Seq model to incorporate RL rewards
Total number of trainable parameters: 34871537
2019-07-10 11:54:30.456140: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
Training: 3%|6 | 1000/34675 [1:39:42<58:27:08, 6.25s/it]
Eval 1000: 0%| | 1/660 [00:06<1:08:28, 6.24s/it]
....
Eval 1000: 100%|##############################| 660/660 [46:26<00:00, 3.96s/it]
New best NLL! 65.91491210731593
Saving...
Training: 3%|6 | 1016/34675 [2:27:42<87:41:03, 9.38s/it]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 0-th value returned by pyfunc_22 is double, but expects string
  [[{{node PyFunc_2}} = PyFunc[Tin=[DT_STRING, DT_INT32, DT_STRING], Tout=[DT_STRING, DT_INT32, DT_INT32, DT_INT32], token="pyfunc_22", _device="/device:CPU:*"](arg2, arg3, arg0)]]
  [[{{node IteratorGetNext}} = IteratorGetNext[output_shapes=[[?,?], [?,?], [?,?], [?], [?], [?,?], [?,?], [?,?,?], [?], [?,?], [?,?], [?], [?,?], [?]], output_types=[DT_STRING, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_STRING, DT_INT32, DT_FLOAT, DT_INT32, DT_STRING, DT_INT32, DT_INT32, DT_INT32, DT_INT32], _device="/job:localhost/replica:0/task:0/device:CPU:0"](IteratorV2)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./src/train.py", line 486, in tf.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "./src/train.py", line 204, in main
    train_batch, curr_batch_size = train_data_source.get_batch()
  File "/content/clouderizer/bloomsburyai_question-generation/code/src/datasources/squad_streamer.py", line 43, in get_batch
    return self.sess.run([self.batch_as_nested_tuple, self.batch_len])
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: 0-th value returned by pyfunc_22 is double, but expects string
  [[{{node PyFunc_2}} = PyFunc[Tin=[DT_STRING, DT_INT32, DT_STRING], Tout=[DT_STRING, DT_INT32, DT_INT32, DT_INT32], token="pyfunc_22", _device="/device:CPU:*"](arg2, arg3, arg0)]]
  [[node IteratorGetNext (defined at /content/clouderizer/bloomsburyai_question-generation/code/src/datasources/squad_streamer.py:107) = IteratorGetNext[output_shapes=[[?,?], [?,?], [?,?], [?], [?], [?,?], [?,?], [?,?,?], [?], [?,?], [?,?], [?], [?,?], [?]], output_types=[DT_STRING, DT_INT32, DT_INT32, DT_INT32, DT_INT32, DT_STRING, DT_INT32, DT_FLOAT, DT_INT32, DT_STRING, DT_INT32, DT_INT32, DT_INT32, DT_INT32], _device="/job:localhost/replica:0/task:0/device:CPU:0"](IteratorV2)]]

Exception ignored in:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tqdm/_tqdm.py", line 931, in __del__
    self.close()
  File "/usr/local/lib/python3.6/dist-packages/tqdm/_tqdm.py", line 1133, in close
    self._decr_instances(self)
  File "/usr/local/lib/python3.6/dist-packages/tqdm/_tqdm.py", line 496, in _decr_instances
    cls.monitor.exit()
  File "/usr/local/lib/python3.6/dist-packages/tqdm/_monitor.py", line 52, in exit
    self.join()
  File "/usr/lib/python3.6/threading.py", line 1053, in join
    raise RuntimeError("cannot join current thread")
RuntimeError: cannot join current thread

tomhosking commented 5 years ago

The real error is here: tensorflow.python.framework.errors_impl.InvalidArgumentError: 0-th value returned by pyfunc_22 is double, but expects string

Have you modified the code that loads the data? I don't recognise this output:

``<_io.TextIOWrapper name='./data/train-v1.1.json' mode='r' encoding='ANSI_X3.4-1968'>``

``<_io.TextIOWrapper name='./data/dev-v1.1.json' mode='r' encoding='ANSI_X3.4-1968'>``

othman-zennaki commented 5 years ago

Thank you for your answer. Yes, I did.

othman-zennaki commented 5 years ago

Could you please tell me how to solve this problem? Thanks.

tomhosking commented 5 years ago

Part of the code that processes the input data has received the wrong data type. If you've modified that part of the code, then I can't help.

othman-zennaki commented 5 years ago

The only modification was to display the dataset_file.

tomhosking commented 5 years ago

Which dataset are you using?

othman-zennaki commented 5 years ago

I'm using a French translation of SQuAD.

tomhosking commented 5 years ago

Oh cool, I didn't know that existed! Are you able to post the files here?

I'm guessing the problem is with your data: the model trains successfully up until step 1016, so it's working for most of the examples. Try setting shuffle=False here: https://github.com/bloomsburyai/question-generation/blob/master/src/train.py#L151 Then run training again. With the examples in a fixed order, the step at which it fails will tell you which example is failing.

However, given that the error says it found a double where it expected a string, I suspect you have a numeric answer somewhere that is encoded in the JSON as a number rather than a string, i.e. {"answer": 1} where it should be {"answer": "1"}. You can check this with a quick script before attempting retraining.
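Something like this might do as the quick check. It's only a sketch: it assumes your translated file keeps the standard SQuAD v1.1 layout (data -> paragraphs -> qas -> answers, with the answer under "text"), and the function names `find_non_string_fields` and `check_file` are made up for illustration, they are not part of this repo. Opening the file as UTF-8 is also an assumption about how your translation was saved.

```python
import json


def find_non_string_fields(squad):
    """Return (question id, field name, value) for every text field that is not a str.

    Assumes the SQuAD v1.1 layout: data -> paragraphs -> qas -> answers.
    """
    problems = []
    for article in squad["data"]:
        for para in article["paragraphs"]:
            context = para.get("context")
            for qa in para.get("qas", []):
                qa_id = qa.get("id")
                # The context is shared by every question in the paragraph,
                # so a bad context is reported once per question.
                if not isinstance(context, str):
                    problems.append((qa_id, "context", context))
                if not isinstance(qa.get("question"), str):
                    problems.append((qa_id, "question", qa.get("question")))
                for ans in qa.get("answers", []):
                    if not isinstance(ans.get("text"), str):
                        problems.append((qa_id, "answers.text", ans.get("text")))
    return problems


def check_file(path):
    """Load a SQuAD-style JSON file and print every non-string text field."""
    # Assumption: the file is UTF-8 encoded.
    with open(path, encoding="utf-8") as f:
        squad = json.load(f)
    problems = find_non_string_fields(squad)
    for qa_id, field, value in problems:
        print(f"{qa_id}: {field} is {type(value).__name__}: {value!r}")
    return problems
```

Call `check_file("./data/train-v1.1.json")` and then the same for the dev file. Anything it prints is a field to wrap in quotes in the JSON (or convert with `str()` when you generate the file).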

othman-zennaki commented 5 years ago

I'm still translating it. So far we have manually translated 25% of the English SQuAD.

othman-zennaki commented 5 years ago

It worked. Thank you @tomhosking for your help.

bloomsburyai / question-generation

Cannot join current thread #19