facebookresearch / unlikelihood_training

Neural Text Generation with Unlikelihood Training

Need help in understanding how the negative candidates are chosen #7


wasiahmad commented 3 years ago

Hi, I am trying to understand the following code snippet.

https://github.com/facebookresearch/unlikelihood_training/blob/723747171a3fa909cda68df399e39f0a3e5067d9/custom/candidate_penalty_ce_loss.py#L50-L59

If my understanding is correct, after the following statement ctx_cands is a square matrix, with each dimension of size batch_size * sequence_len.

ctx_cands = target.unsqueeze(0).expand(target.size(0), target.size(0))

If I assume self.padding_idx = 0, what is the point of the following two statements?

ctx_cands_ = (ctx_cands.tril(-1) + self.padding_idx) 
ctx_cands_ = ctx_cands_ * ctx_cands_.triu() 

After the above two statements, ctx_cands_ will be a zero tensor, won't it?
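
For concreteness, a quick check with a toy flattened target (the token ids are made up) reproduces that:

import torch

target = torch.tensor([5, 7, 9, 4])            # toy flattened target tokens
padding_idx = 0

ctx_cands = target.unsqueeze(0).expand(target.size(0), target.size(0))
ctx_cands_ = ctx_cands.tril(-1) + padding_idx  # adding 0 changes nothing; strictly lower triangle
ctx_cands_ = ctx_cands_ * ctx_cands_.triu()    # triu() of a strictly lower-triangular matrix is all zeros
print(ctx_cands_)                              # -> 4x4 tensor of zeros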

Can you please explain how the lines of code pick the previous context tokens as negative candidates?

uralik commented 3 years ago

Hello, the code assumes that the padding index = 1; please see the related issue here as well: https://github.com/facebookresearch/unlikelihood_training/issues/3

The idea here is to make ctx_cands of size [num_target_tokens, num_target_tokens] such that row i contains the previous context token indices and the pad token index.
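
A rough sketch of that construction with toy values (the token ids are made up; padding_idx = 1 as in fairseq):

import torch

target = torch.tensor([5, 7, 9, 4])              # toy flattened target tokens
padding_idx = 1                                   # fairseq's pad index

ctx_cands = target.unsqueeze(0).expand(target.size(0), target.size(0))
ctx_cands_ = ctx_cands.tril(-1) + padding_idx     # below the diagonal: token id + 1; on/above: 1
ctx_cands_ = ctx_cands_ * ctx_cands_.triu()       # keep the pad value (1 * 1 = 1) on/above the diagonal only
ctx_cands = ctx_cands.tril(-1) + ctx_cands_       # row i: previous tokens target[:i], pad everywhere else
print(ctx_cands)
# tensor([[1, 1, 1, 1],
#         [5, 1, 1, 1],
#         [5, 7, 1, 1],
#         [5, 7, 9, 1]])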

Then the lprobs mask of size [num_predicted_tokens, vocab_size] is created using a scatter operation: negative_targets = torch.zeros_like(lprobs).scatter_(1, ctx_cands, 1)

Here, for every time-step position we have a full-vocabulary vector in which we assign 1 to every token index that needs to be penalized (i.e., a negative candidate).
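
Continuing the toy example above, with an arbitrary vocab_size of 12 and random stand-in log-probabilities:

vocab_size = 12
lprobs = torch.randn(target.size(0), vocab_size).log_softmax(dim=-1)   # stand-in for the model's log-probs

# one row per predicted position; 1 at every negative-candidate index (including the pad index)
negative_targets = torch.zeros_like(lprobs).scatter_(1, ctx_cands, 1)
print(negative_targets[3])   # 1 at indices 1 (pad), 5, 7, 9 and 0 elsewhere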

Later this mask is used to obtain the final UL loss here: custom_loss = -torch.log(one_minus_probs) * negative_targets
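
Again continuing the toy example (the clamp floor here is only to keep the log finite; the repo additionally scales this term by a weight before adding it to the usual NLL loss):

one_minus_probs = torch.clamp(1.0 - lprobs.exp(), min=1e-5)     # 1 - p(token), floored so the log stays finite
custom_loss = -torch.log(one_minus_probs) * negative_targets    # penalty only on the negative-candidate entries
print(custom_loss.sum())                                        # total unlikelihood penalty for this toy batch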

Please feel free to ask further questions if you have any!