bloomsburyai / question-generation

Neural text-to-text question generation
MIT License
217 stars 52 forks

Error: Assign requires shapes of both tensors to match. lhs shape= [1536,768] rhs shape= [3072,768] #11

Closed rpoli40 closed 5 years ago

rpoli40 commented 5 years ago

I used train.py --advanced_condition_encoding --nocontext_as_set to retrain the model after adding more data to an existing SQuAD training dataset. The model trained successfully. I then redirected the paths to the new model, but running python ./src/demo/instance.py fails with the following error:

here
here2 ./models/qgen/RL-MALUUBA/1547154689\model.checkpoint-29000 <tensorflow.python.client.session.Session object at 0x00000292C9ECEA20>
Traceback (most recent call last):
  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1327, in _do_call
    return fn(*args)
  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1312, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1420, in _call_tf_sessionrun
    status, run_metadata)
  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [1536,768] rhs shape= [3072,768]
  [[Node: save/Assign_4 = Assign[T=DT_FLOAT, _class=["loc:@attn_mech/memory_layer/kernel"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](attn_mech/memory_layer/kernel, save/RestoreV2:4)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1327, in _do_call
    return fn(*args)
  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1312, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1420, in _call_tf_sessionrun
    status, run_metadata)
  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 516, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [1536,768] rhs shape= [3072,768]
  [[Node: save/Assign_4 = Assign[T=DT_FLOAT, _class=["loc:@attn_mech/memory_layer/kernel"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](attn_mech/memory_layer/kernel, save/RestoreV2:4)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./src/demo/instance.py", line 76, in <module>
    tf.app.run()
  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\platform\app.py", line 126, in run
    _sys.exit(main(argv))
  File "./src/demo/instance.py", line 60, in main
    generator.load_from_chkpt(chkpt_path)
  File "./src/demo/instance.py", line 31, in load_from_chkpt
    saver.restore(self.sess, tf.train.latest_checkpoint(path))
  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\training\saver.py", line 1775, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\client\session.py", line 905, in run
    run_metadata_ptr)
  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1140, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1321, in _do_run
    run_metadata)
  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\client\session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [1536,768] rhs shape= [3072,768]
  [[Node: save/Assign_4 = Assign[T=DT_FLOAT, _class=["loc:@attn_mech/memory_layer/kernel"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](attn_mech/memory_layer/kernel, save/RestoreV2:4)]]

Caused by op 'save/Assign_4', defined at:
  File "./src/demo/instance.py", line 76, in <module>
    tf.app.run()
  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\platform\app.py", line 126, in run
    _sys.exit(main(argv))
  File "./src/demo/instance.py", line 60, in main
    generator.load_from_chkpt(chkpt_path)
  File "./src/demo/instance.py", line 30, in load_from_chkpt
    saver = tf.train.Saver()
  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\training\saver.py", line 1311, in __init__
    self.build()
  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\training\saver.py", line 1320, in build
    self._build(self._filename, build_save=True, build_restore=True)
  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\training\saver.py", line 1357, in _build
    build_save=build_save, build_restore=build_restore)
  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\training\saver.py", line 809, in _build_internal
    restore_sequentially, reshape)
  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\training\saver.py", line 470, in _AddRestoreOps
    assign_ops.append(saveable.restore(saveable_tensors, shapes))
  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\training\saver.py", line 162, in restore
    self.op.get_shape().is_fully_defined())
  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\ops\state_ops.py", line 281, in assign
    validate_shape=validate_shape)
  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\ops\gen_state_ops.py", line 60, in assign
    use_locking=use_locking, name=name)
  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 3290, in create_op
    op_def=op_def)
  File "C:\Users\AppData\Local\Continuum\anaconda3\envs\Python36\lib\site-packages\tensorflow\python\framework\ops.py", line 1654, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): Assign requires shapes of both tensors to match. lhs shape= [1536,768] rhs shape= [3072,768]
  [[Node: save/Assign_4 = Assign[T=DT_FLOAT, _class=["loc:@attn_mech/memory_layer/kernel"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](attn_mech/memory_layer/kernel, save/RestoreV2:4)]]

Looks like the error is coming from load_from_chkpt:

    def load_from_chkpt(self, path):
        print("here")
        self.chkpt_path = path
        with self.model.graph.as_default():
            saver = tf.train.Saver()
            print("here2", tf.train.latest_checkpoint(path), self.sess)
            saver.restore(self.sess, tf.train.latest_checkpoint(path))
            print("################################Loaded model from "+path)

I can see prints of "here" and "here2" but not print("################################Loaded model from "+path)

Any suggestions as to why this error happens? Thank you in advance.

tomhosking commented 5 years ago

Did you run the demo with the same model setup as training? So, demo.py --advanced_condition_encoding --nocontext_as_set?
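
For context: in the error above, the lhs shape is the variable in the graph built by the demo and the rhs shape is the tensor stored in the checkpoint, so a mismatch like this usually means the graph was built with different settings from the ones used in training. Below is a minimal sketch of how such mismatches could be listed before calling saver.restore. It assumes the two sets of shapes have already been collected into plain dicts (the checkpoint side could come from something like tf.train.list_variables, where available). The helper name find_shape_mismatches and the second variable name are hypothetical and not part of this repo.

```python
def find_shape_mismatches(graph_shapes, checkpoint_shapes):
    """Return {variable_name: (graph_shape, checkpoint_shape)} for every
    variable present in both mappings whose shapes differ."""
    mismatches = {}
    for name, graph_shape in graph_shapes.items():
        ckpt_shape = checkpoint_shapes.get(name)
        if ckpt_shape is not None and list(ckpt_shape) != list(graph_shape):
            mismatches[name] = (list(graph_shape), list(ckpt_shape))
    return mismatches


# Shapes taken from the error message; "hypothetical/bias" is made up.
graph_shapes = {
    "attn_mech/memory_layer/kernel": [1536, 768],
    "hypothetical/bias": [768],
}
checkpoint_shapes = {
    "attn_mech/memory_layer/kernel": [3072, 768],
    "hypothetical/bias": [768],
}

mismatches = find_shape_mismatches(graph_shapes, checkpoint_shapes)
for name, (graph_shape, ckpt_shape) in mismatches.items():
    print(name, "graph:", graph_shape, "checkpoint:", ckpt_shape)
```

Any variable reported here would need the demo to be started with the same model flags as training.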

rpoli40 commented 5 years ago

I used the flags you suggested and it worked. While debugging, though, I found that the model had been trained without the LM and MPCM models, because those were not in the repository (they were supposed to be inside saved/lmtest and saved/qanet2) :). When I tried to train those models, I noticed that the language model trained without any issues and everything looks fine, but something odd is going on with the MPCM model: train_mpcm imports models from qa.mpcm, while rl_model uses qa.qanet.instance when the call to mpcm is commented out. What is the difference between the two, and where is the training file for the QANet instance?

tomhosking commented 5 years ago

The LM and QA models are only used during policy gradient training, i.e. the fine-tuning phase (specified with some combination of flags like --restore --policy_gradient). I didn't find that this fine-tuning gave better results.

My implementation of the MPCM model isn't working, so I don't recommend using it. The QANet is modified from this code: https://github.com/NLPLearn/QANet. You can train a model using that repo and it should load fine.

rpoli40 commented 5 years ago

I will try the repo you recommend. However, I'm now completely confused, as I don't understand where the rewards come from if you don't use the LM and QA models. As far as I understand from the paper, the rewards are based on those two.

tomhosking commented 5 years ago

The model is initially trained using maximum likelihood only - then it's later fine-tuned using the rewards. I've found that the maximum likelihood training is sufficient.
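
To illustrate the second stage, here is a rough sketch of a REINFORCE-style policy gradient loss in which a sampled question is scored by a weighted mix of a QA reward and an LM reward. This is not the repo's implementation: the function names, the lm_weight parameter, the baseline and the example numbers are all assumptions for illustration. Only the idea of weighting the rewards (cf. the --qa_weight flag) comes from the thread.

```python
def combined_reward(qa_score, lm_score, qa_weight=0.5, lm_weight=0.5):
    """Weighted sum of the two reward signals (weights are hypothetical)."""
    return qa_weight * qa_score + lm_weight * lm_score


def policy_gradient_loss(token_log_probs, reward, baseline=0.0):
    """REINFORCE loss for one sampled sequence:
    -(reward - baseline) * sum_t log p(y_t | y_<t, context)."""
    return -(reward - baseline) * sum(token_log_probs)


# Made-up values: log-probabilities of two sampled tokens and two reward scores.
sample_log_probs = [-1.0, -2.0]
reward = combined_reward(qa_score=0.8, lm_score=0.4, qa_weight=0.25, lm_weight=0.75)
loss = policy_gradient_loss(sample_log_probs, reward, baseline=0.1)
print(reward, loss)
```

During the maximum likelihood stage no reward is needed at all; the loss there is just the negative log-likelihood of the reference question.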

rpoli40 commented 5 years ago

What F1 and BLEU scores were you able to achieve? My dataset contains much longer answers: a few sentences or a paragraph, as opposed to the short phrases used in SQuAD. What would you recommend changing in that case? Unfortunately, the model I have trained is not performing well :(.

tomhosking commented 5 years ago

I reached approximately 13 BLEU on the test set from Du et al 2017. The model performs well for 'cloze' type questions that are stated as simple facts in the context document, but will do less well for anything more complicated than that. Current state of the art reaches approximately 16 BLEU, using a method that's pretty similar to this model.

rpoli40 commented 5 years ago

Then I'm pretty close: I reached about 12. I'm mainly using your code to learn, and I hope to get something meaningful with some small changes. But first of all, I would like to thank you for putting the code together and for being available to answer my questions. It helps a lot. So, to try the LM and QA models, I should use the flags --restore and --policy_gradient? The code is very convoluted, and it is hard to know which flags to use to get the desired behaviour.

tomhosking commented 5 years ago

Seq2Seq models of this sort are very poor at generating abstractive questions at the moment. If you manage to generate good ones then please publish your method!

Yes, the code is purely for my own research purposes so is a bit of a mess. --restore loads a model from an existing checkpoint file, and --policy_gradient will perform fine tuning based on the rewards. You can also change --qa_weight etc if you want a different combination of rewards. But, I really don't think it's worth the computational cost or hassle!

Something else to bear in mind if you're attempting to use longer answers and documents: the context is cropped to 200 tokens either side of the answer span by default, to reduce memory requirements and training time. If you want to increase this, you'll need to adjust --filter_max_tokens, --max_context_len and --max_copy_size (and reduce the batch size). But I found that it works just as well with the cropped context.
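
The cropping described above can be sketched as follows. This is an illustrative version only, assuming the answer is given as a token span with an exclusive end index. The function name crop_context and its parameters are hypothetical and do not mirror the repo's preprocessing code.

```python
def crop_context(tokens, answer_start, answer_end, window=200):
    """Keep at most `window` tokens on either side of the answer span.

    `answer_start` is inclusive and `answer_end` is exclusive. Returns the
    cropped token list and the answer's new start index inside it."""
    start = max(0, answer_start - window)
    end = min(len(tokens), answer_end + window)
    return tokens[start:end], answer_start - start


# A made-up 1000-token document with a 3-token answer in the middle.
document = ["tok%d" % i for i in range(1000)]
cropped, new_start = crop_context(document, answer_start=500, answer_end=503)
print(len(cropped), new_start, cropped[new_start:new_start + 3])
```

With answers that are a paragraph long, the cropped context grows by the answer length, which is one reason the batch size may need to come down.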

rpoli40 commented 5 years ago

I definitely will publish. But to make sure, your Seq2Seq is based on bidirectional LSTM, right?

tomhosking commented 5 years ago

Bidir LSTM for the encoder, unidir LSTM for the decoder, with attention and copy mechanism, yep.

rpoli40 commented 5 years ago

Tom, one more question. In the paper they say "we augment each document word embedding with a binary feature that indicates if the document word belongs to the answer". Are you using GloVe embeddings for this, or training the model end-to-end so that the embeddings are learned during training?

tomhosking commented 5 years ago

I initialise the word embeddings with GloVe, but allow them to be updated during training. Then I also concatenate the binary feature indicating whether the token is inside the answer (which is not trainable).
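
As a small illustration of the concatenation described above, the sketch below appends a 0/1 answer indicator to each token's embedding vector. It uses plain Python lists and made-up 3-dimensional vectors instead of real GloVe embeddings. The function name and the exclusive end index are assumptions, not the repo's code.

```python
def augment_with_answer_feature(embeddings, answer_start, answer_end):
    """Append 1.0 to the vectors of tokens inside the answer span
    [answer_start, answer_end) and 0.0 to all others."""
    return [
        vector + [1.0 if answer_start <= index < answer_end else 0.0]
        for index, vector in enumerate(embeddings)
    ]


# Four tokens with made-up embeddings; tokens 1 and 2 form the answer.
context_embeddings = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]
augmented = augment_with_answer_feature(context_embeddings, answer_start=1, answer_end=3)
print([vector[-1] for vector in augmented])
```

In a trainable model only the embedding part would receive gradient updates; the appended indicator is a fixed input.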

rpoli40 commented 5 years ago

Where in your code is this done?

tomhosking commented 5 years ago

https://github.com/bloomsburyai/question-generation/blob/master/src/seq2seq_model.py#L78

tomhosking commented 5 years ago

The augmented binary feature is concatenated here: /src/seq2seq_model.py#L118

rpoli40 commented 5 years ago

What kind of preprocessing are you doing on the text (removing punctuation, some normalization, etc.)?

tomhosking commented 5 years ago

It's just lowercased and tokenised.
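
A minimal sketch of that kind of preprocessing is below. The thread does not say which tokeniser the repo uses, so the regular expression here (word characters as one token, each punctuation mark as its own token) is purely an assumption for illustration, as is the function name.

```python
import re


def preprocess(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


tokens = preprocess("Where was QANet trained?")
print(tokens)
```

Note that punctuation is kept as separate tokens rather than removed in this sketch.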

rpoli40 commented 5 years ago

Hi Tom, which file from that repo should I use to train the QANet model?

rpoli40 commented 5 years ago

Hi, I actually asked you the wrong question yesterday. What I meant to ask is: what is the difference between that QANet repo and the QANet in your repo, and are you using the QANet from your repo to train the QA model?
