allenai / deep_qa

A deep NLP library, based on Keras / tf, focused on question answering (but useful for other NLP too)
Apache License 2.0

add dauqs as a submodule of deep_qa #139

Closed DeNeutoy closed 7 years ago

DeNeutoy commented 7 years ago

This module provides two sequence-to-sequence models, with multi-GPU training support, TensorBoard logging, and beam search decoding.

The first model, 'Seq2SeqAttentionModel', is a plain multi-layer sequence-to-sequence model with attention.
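The PR doesn't spell out the attention parameterization, so purely as a point of reference, the standard additive (Bahdanau-style) formulation over encoder states h_i given decoder state s_t is:

```latex
% Assumed reference formulation of additive attention,
% not a transcription of this model's code:
e_{ti} = v^{\top} \tanh(W_h h_i + W_s s_t), \qquad
\alpha_{ti} = \frac{\exp(e_{ti})}{\sum_{j} \exp(e_{tj})}, \qquad
c_t = \sum_{i} \alpha_{ti} \, h_i
```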

The second model, 'Seq2SeqCopyModel', is similar, except that its output distribution is an interpolation between a distribution over the vocabulary and a distribution over the unique words in the article (the source input to the model, in our case the SQuAD paragraph).
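The exact form of the interpolation isn't shown here; a minimal sketch in the style of pointer-generator networks, where the gate name `p_gen` and the shapes are assumptions rather than the model's actual code, looks like:

```python
import tensorflow as tf

def combined_distribution(vocab_dist, copy_dist, p_gen):
    """vocab_dist, copy_dist: [batch, vocab_size]; p_gen: [batch, 1] in (0, 1)."""
    # Mix the generation and copy distributions; the result is still
    # a valid probability distribution over the vocabulary.
    return p_gen * vocab_dist + (1.0 - p_gen) * copy_dist

vocab_dist = tf.constant([[0.7, 0.2, 0.1]])
copy_dist = tf.constant([[0.0, 0.5, 0.5]])
print(combined_distribution(vocab_dist, copy_dist, tf.constant([[0.8]])))
# -> [[0.56, 0.26, 0.18]]
```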

The code is run via the seq2seq_attention.py script at the top level of the directory. It has three modes: one for training, one for evaluation on the dev set, and one for beam search decoding. Once started, the script runs indefinitely; the three modes are designed to run simultaneously, with the eval and decode modes waiting for the training mode to produce model files before beginning their work.
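As a rough sketch of what that looks like in practice (the flag names are assumptions, not the script's confirmed interface):

```
# Hypothetical invocations, typically in three separate processes:
python seq2seq_attention.py --mode=train   # writes model files as it trains
python seq2seq_attention.py --mode=eval    # watches for new model files, scores the dev set
python seq2seq_attention.py --mode=decode  # beam-search decodes from the latest model file
```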

Another key concept here is how the beam search works. To do the search, we need to sample from the output distribution at time t and re-embed the result so the decoder can generate the distribution at time t + 1. To support this, the number of decoder timesteps is set to 1, and two methods are called iteratively: encode_top_state, which retrieves the last state of the last layer of the encoder and the first state of the decoder, and decode_topk, which returns the top k most likely words from the output distribution, along with the next decoder state.
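Below is a self-contained Python sketch of that loop. The toy encode_top_state and decode_topk stand-ins only mirror the method names above; the real methods run the actual encoder and a single decoder step.

```python
import heapq

BEAM, STEPS, EOS, GO = 3, 4, 0, 1

def encode_top_state(source_ids):
    # Real model: run the encoder and return its final state as the
    # initial decoder state. Here the "state" is just the input itself.
    return tuple(source_ids)

def decode_topk(last_token, state, k):
    # Real model: one decoder step (decoder timesteps set to 1), returning
    # the k most likely next words, their log-probs, and the next decoder
    # state. Here a deterministic toy score stands in for the log-probs.
    scored = [((last_token + w) % 7 / 7.0 - 1.0, w) for w in range(5)]
    return [(w, s, state) for s, w in heapq.nlargest(k, scored)]

state = encode_top_state([2, 4, 6])
beams = [([GO], 0.0, state)]            # (tokens so far, total log-prob, state)
for _ in range(STEPS):
    expanded = []
    for tokens, logp, st in beams:
        if tokens[-1] == EOS:           # finished hypotheses pass through
            expanded.append((tokens, logp, st))
            continue
        for tok, score, new_st in decode_topk(tokens[-1], st, BEAM):
            expanded.append((tokens + [tok], logp + score, new_st))
    beams = heapq.nlargest(BEAM, expanded, key=lambda b: b[1])

print(beams[0][0])                      # best-scoring token sequence
```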

Finally, the copy mechanism is implemented in the create_combined_distribution method. The subtlest part is its use of the tf.scatter_nd function, which takes a list of indices, a list of values, and a shape, and creates a zero tensor of that shape with the values assigned at the given indices. The map function simply applies this over the batch in parallel, rather than with a for loop.
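For concreteness, here is a small sketch of that pattern, projecting per-source-position copy scores into vocabulary space; the ids, scores, and vocabulary size are made up for illustration:

```python
import tensorflow as tf

source_ids = tf.constant([[1, 3, 3], [0, 2, 5]])              # word ids in each article
copy_scores = tf.constant([[0.2, 0.5, 0.3], [0.6, 0.1, 0.3]])
vocab_size = 6

def scatter_one(args):
    ids, scores = args
    # scatter_nd builds a zero tensor of shape [vocab_size] and adds each
    # score at its word id; duplicate ids (the two 3s) are summed to 0.8.
    return tf.scatter_nd(tf.expand_dims(ids, 1), scores, [vocab_size])

# map_fn applies scatter_one to every example in the batch in parallel.
copy_dist = tf.map_fn(scatter_one, (source_ids, copy_scores), dtype=tf.float32)
print(copy_dist)
# [[0.0, 0.2, 0.0, 0.8, 0.0, 0.0],
#  [0.6, 0.0, 0.1, 0.0, 0.0, 0.3]]
```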

matt-gardner commented 7 years ago

Hmm, I'm not sure there's a lot of value in doing it that way. If you want to just keep the code separate for now, that's fine. Also, it appears that your repo is private on your own personal account; I don't have access to it.

matt-gardner commented 7 years ago

So, I think this work is great, and it'd be good to add it to deep_qa. But it should be added as a Python module with all necessary files, not a git submodule. You don't need to worry about doing that soon (focus on getting results first), but at some point it'd be nice to do. I'm going to close this PR.