eschnou opened this issue 7 years ago
Another way to get diverse answers I had thought of was to sample the answer from the softmax outputs, as done in https://github.com/karpathy/char-rnn . In theory, it should be simpler to implement as soon as this issue is solved: https://github.com/tensorflow/tensorflow/issues/5391.
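For reference, a minimal numpy sketch of that idea (function name and temperature parameter are just for illustration): sample a word id from the softmax distribution instead of taking the argmax, the way char-rnn samples the next character.

```python
import numpy as np

def sample_from_softmax(logits, temperature=1.0):
    """Sample one word id from the softmax distribution over the vocabulary.
    Lower temperatures sharpen the distribution; higher ones give more
    diverse (and noisier) picks."""
    logits = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(logits - np.max(logits))  # numerically stable softmax
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)
```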
@Conchylicultor I don't think this will work in our case. By sampling the output softmax you only sample the last output, so you will get a list of candidates for the last word only. The way I understand seq2seq, the output sequence comes from the sequence of decoder steps, not just the final decoder output. The case you refer to is not a sequence decoder but just an RNN in which they are only interested in the final output, which is the prediction of one character. There, softmax sampling makes sense. Does that make sense?
With the loop_function parameter, we would dynamically sample the next word at every time step. See this previous answer for more details: https://github.com/Conchylicultor/DeepQA/issues/17#issuecomment-258221849.
In the seq2seq model used here, the sequence decoder is just an RNN which takes the encoder output as its initial state, so it's really similar to what karpathy did.
Edit: the loop_function would need to look something like this: https://github.com/Conchylicultor/MusicGenerator/blob/master/deepmusic/modules/loopprocessing.py#L58
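Roughly, a sampling loop_function could look like the sketch below. This assumes the (prev, i) interface of TensorFlow's legacy seq2seq decoders, where prev is the previous cell output; the function name and the output_projection handling are illustrative, not the actual DeepQA code.

```python
import tensorflow as tf

def get_sampling_loop_function(embedding, output_projection=None):
    """Build a loop_function that samples the next word from the softmax
    distribution instead of taking the argmax."""
    def loop_function(prev, _):
        if output_projection is not None:
            # prev is the raw cell output; project it to vocabulary logits
            prev = tf.matmul(prev, output_projection[0]) + output_projection[1]
        # Draw one word id from the distribution defined by the logits
        prev_symbol = tf.squeeze(tf.multinomial(prev, 1), axis=[1])
        # The embedding of the sampled word becomes the next decoder input
        return tf.nn.embedding_lookup(embedding, prev_symbol)
    return loop_function
```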
Ha, OK, I was still focusing on generating n-best candidate solutions. With your approach, you effectively introduce variation but still only generate one output. That's already a good start for avoiding repetition, but unfortunately not sufficient to generate an n-best list and apply different scoring functions.
Yes, indeed, this softmax sampling won't be enough to generate the N-best list. That would require using beam search, as you proposed.
The current implementation always outputs the most probable answer, which can lead to boring, repetitive, safe answers such as "i don't know". Solving this requires generating not a single answer but a whole list of the N best candidates. We can then randomly pick one answer, or use different loss functions to choose the best candidate, for example by implementing the strategies described here.
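As an illustration (a hypothetical helper, assuming the candidates come paired with their log-probabilities), picking from such a list could be as simple as:

```python
import random

def choose_answer(candidates, strategy="sample"):
    """candidates: list of (tokens, log_prob) pairs from the N-best list.
    'sample' picks one at random; 'length_norm' rescores by log-probability
    per token, which counters the bias toward short, safe answers."""
    if strategy == "sample":
        return random.choice(candidates)[0]
    return max(candidates, key=lambda c: c[1] / len(c[0]))[0]
```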
Generating the N-best outputs in seq2seq requires performing a beam search over the decoder outputs. It seems this has already been implemented in this gist and extensively discussed in this tensorflow issue.
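For the record, here is a rough, framework-agnostic sketch of the idea (step_fn, the token ids, and all names are assumptions, not the gist's actual API): at each step every partial hypothesis is expanded with its most likely next words, and only the beam_width best overall are kept.

```python
import numpy as np

def beam_search(step_fn, initial_state, start_id, end_id,
                beam_width=5, max_len=20):
    """step_fn(word_id, state) -> (log_probs over the vocabulary, new_state).
    Returns up to beam_width (token_sequence, score) pairs, best first."""
    beams = [([start_id], 0.0, initial_state)]  # (tokens, log-prob, state)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score, state in beams:
            log_probs, new_state = step_fn(tokens[-1], state)
            # Expand this hypothesis with its beam_width most likely words
            for word in np.argsort(log_probs)[-beam_width:]:
                candidates.append(
                    (tokens + [int(word)], score + log_probs[word], new_state))
        # Keep only the beam_width best hypotheses overall
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for cand in candidates[:beam_width]:
            (finished if cand[0][-1] == end_id else beams).append(cand)
        if not beams:  # every surviving hypothesis has ended
            break
    finished.extend(beams)
    finished.sort(key=lambda c: c[1], reverse=True)
    return [(tokens, score) for tokens, score, _ in finished[:beam_width]]
```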