dki-lab / GrailQA

Apache License 2.0

Some confusion for baseline #8

Closed BitcoinNLPer closed 3 years ago

BitcoinNLPer commented 3 years ago

Hi,

  1. In the inference stage of the ranking-based model, why do you use the average/sum of the token-level probabilities of each candidate s-expression to select the s-expression that best matches the question? (I don't really understand this scoring method.) Here is the relevant code:

    log_probs = F.log_softmax(logits, dim=-1)
    log_probs = log_probs.reshape(batch_size, num_of_candidates, num_decoding_steps, -1)
    targets.masked_fill_(targets == -1, 0)  # replace padding (-1) with a valid index
    log_probs_gather = log_probs.gather(dim=-1, index=targets.unsqueeze(-1))
    log_probs_gather = log_probs_gather.squeeze(-1)

  2. In addition, why do the generative model and the ranking model share the same architecture? And why not frame the matching between the question and the candidate s-expressions as a binary classification task with a BERT-based model?

  3. It makes sense to learn knowledge over a static vocabulary, but the baseline model uses a dynamically varying vocabulary. What I wonder is why this approach can still learn the relevant knowledge on GrailQA well.
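Regarding point 1, if I understand correctly, the full scoring after the snippet above looks roughly like this (the masking of padded steps and the final averaging are my guesses, and all shapes are made up for illustration):

```python
import torch
import torch.nn.functional as F

batch_size, num_of_candidates, num_decoding_steps, vocab_size = 2, 3, 4, 10

# Fake decoder logits for every candidate of every question, and the gold
# target token ids of each candidate s-expression (-1 marks padding).
logits = torch.randn(batch_size * num_of_candidates, num_decoding_steps, vocab_size)
targets = torch.randint(0, vocab_size, (batch_size, num_of_candidates, num_decoding_steps))
targets[:, :, -1] = -1  # pretend the last decoding step is padding

log_probs = F.log_softmax(logits, dim=-1)
log_probs = log_probs.reshape(batch_size, num_of_candidates, num_decoding_steps, -1)

mask = (targets != -1).float()                    # remember which steps are real tokens
targets = targets.masked_fill(targets == -1, 0)   # any valid index works under the mask

# Pick out the log-probability of each gold token, then average over real steps
# to get one likelihood-based score per candidate s-expression.
token_log_probs = log_probs.gather(dim=-1, index=targets.unsqueeze(-1)).squeeze(-1)
candidate_scores = (token_log_probs * mask).sum(dim=-1) / mask.sum(dim=-1)

best = candidate_scores.argmax(dim=-1)  # highest-scoring candidate per question
```

Is this roughly what happens, i.e. the candidate with the highest average token log-likelihood wins?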

PS: I have submitted the test set prediction file on CodaLab and sent an email to you~

Best.

entslscheia commented 3 years ago

Hi,

  1. Our model is trained with a cross-entropy loss to maximize the likelihood of gold queries, so we can use the model's confidence in a candidate query as a scoring function for how likely that candidate is correct.
  2. Good question! What you suggest is also a valid design choice; there are in fact some papers on KBQA that use BERT in a similar way (e.g., QGG [Lan et al.]). In our experiments, we mainly wanted the ranking model to serve as a reasonable baseline and to show the importance of effective search space pruning through the comparison of different baseline models.
  3. One way to understand it is that we still have a static vocabulary with a global representation for each item; the only difference is that for each input question we pass a dynamic mask that masks out the irrelevant vocabulary items, which does not change the learning process in essence.
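A toy sketch of this masking idea (purely illustrative, not our actual code; all names and sizes are made up):

```python
import torch
import torch.nn.functional as F

vocab_size, hidden = 8, 5

# One *global* embedding per vocabulary item (relations, functions, etc.),
# shared across all questions.
vocab_emb = torch.nn.Embedding(vocab_size, hidden)

decoder_state = torch.randn(1, hidden)           # hypothetical decoder hidden state
scores = decoder_state @ vocab_emb.weight.t()    # score every vocabulary item

# Question-specific dynamic mask: only items relevant to this question are allowed.
allowed = torch.tensor([1, 1, 0, 0, 1, 0, 0, 1], dtype=torch.bool)
scores = scores.masked_fill(~allowed, float('-inf'))

probs = F.softmax(scores, dim=-1)
# Disallowed items get exactly zero probability, while gradients still flow
# into the shared global embeddings of the allowed items, so the learning
# process over the static vocabulary is essentially unchanged.
```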