Open kuhanw opened 6 years ago
Perhaps training a separate language model on the dataset of target responses would be one way to calculate P(T)? Alternatively, using dummy encoder_inputs amounts to asking: given no input sentence at all, what is the probability of the target response T?
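For illustration, here is a rough sketch of what the "separate language model" option might look like, just a toy add-one-smoothed bigram model trained on the target side; this is my own placeholder code, not anything from the repo:

```python
from collections import Counter
import math

def train_bigram_lm(target_responses):
    """Toy add-one-smoothed bigram LM over the target-side corpus: one possible way
    to estimate P(T) with an external model instead of reusing the seq2seq decoder."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in target_responses:
        padded = ['<s>'] + tokens + ['</s>']
        unigrams.update(padded[:-1])                 # context counts
        bigrams.update(zip(padded, padded[1:]))      # (context, next-word) counts
    vocab = len(unigrams) + 1
    def log_p(tokens):
        padded = ['<s>'] + tokens + ['</s>']
        # add-one smoothing so unseen bigrams do not zero out the score
        return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
                   for a, b in zip(padded, padded[1:]))
    return log_p

# Example: score a candidate response T under the target-side LM.
lm_log_p = train_bigram_lm([['hello', 'there'], ['i', 'do', 'not', 'know']])
print(lm_log_p(['i', 'do', 'not', 'know']))
```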
@kuhanw I think it makes sense. Using PAD as the encoder input yields the same initial decoder state for every query, whereas real inputs would each correspond to a different p(T|S). Naturally, the first several output words of the decoder are influenced more by p(T|S) than by U(T), which is consistent with the original idea in Jiwei Li's paper. The decoder is commonly viewed as a language model, so feeding an empty input is a simple way to implement MMI-antiLM without an external model. I hope the author can clarify this.
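To make the combination concrete, here is a small sketch (my own pseudocode, not the repo's API) of the MMI-antiLM rescoring from the paper, where the language-model term comes from the dummy-PAD pass and the penalty g(k) is applied only to the first few tokens:

```python
import numpy as np

def antilm_score(log_p_t_given_s, log_p_t_empty, lam=0.5, gamma=5):
    """Sketch of MMI-antiLM rescoring: sum_k log p(t_k|S) - lambda * sum_k g(k) * log p(t_k),
    where log p(t_k) is taken from the pass with dummy PAD encoder inputs, and
    g(k) penalizes only the first `gamma` tokens (later tokens depend mostly on
    the already-generated prefix, so penalizing them would hurt fluency)."""
    log_p_t_given_s = np.asarray(log_p_t_given_s)
    log_p_t_empty = np.asarray(log_p_t_empty)
    g = (np.arange(len(log_p_t_empty)) < gamma).astype(float)
    return log_p_t_given_s.sum() - lam * (g * log_p_t_empty).sum()

# Example with made-up per-token log-probabilities for a 6-token candidate.
print(antilm_score([-1.2, -0.8, -2.0, -1.5, -0.9, -1.1],
                   [-0.5, -0.4, -1.8, -1.6, -2.2, -2.5]))
```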
Hi all,
I am trying to understand the implementation of the anti-LM model, in particular the meaning of this line:
line 128: all_prob_t = model_step(dummy_encoder_inputs, cand['dec_inp'], dptr, target_weights, bucket_id)
where dummy_encoder_inputs is defined as dummy_encoder_inputs = [np.array([data_utils.PAD_ID]) for _ in range(len(encoder_inputs))],
in tf_chatbot_seq2seq_antilm/lib/seq2seq_model_utils.py.
This is presumably computing the probability of the target, P(T), from the paper https://arxiv.org/pdf/1510.03055.pdf, but how does feeding in an encoder input sequence consisting only of PAD tokens give you the probability of T?
Anyone have any ideas?
Cheers,
Kuhan