Closed · fmikaelian closed this 5 years ago
Here are my takeaways:
S = (1 - μ) * S_anserini + μ * S_bert

with μ = 0.5 and k = 10 paragraphs.

They modified BERT to compare predictions in a meaningful way (see #36):
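The linear interpolation above can be sketched in a few lines. All the numbers and names below are illustrative, not taken from the cdQA codebase:

```python
# Sketch of the BERTserini-style score combination:
# S = (1 - mu) * S_anserini + mu * S_bert
def combine_scores(s_anserini, s_bert, mu=0.5):
    """Interpolate the retriever (Anserini/BM25) score with the reader (BERT) score."""
    return (1 - mu) * s_anserini + mu * s_bert

# Toy scores for k=3 retrieved paragraphs (illustrative values)
paragraph_scores = [2.1, 1.4, 0.9]   # Anserini scores for each paragraph
span_scores = [8.3, 11.0, 5.2]       # BERT span logits for the best span in each

combined = [combine_scores(sa, sb) for sa, sb in zip(paragraph_scores, span_scores)]
best = max(range(len(combined)), key=lambda i: combined[i])  # index of the winning span
```

With μ = 0.5 this is just the average of the two scores; μ lets you weight the retriever against the reader.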
to allow comparison and aggregation of results from different segments, we remove the final softmax layer over different answer spans.
But this is to be clarified.
What I understand from the quote below is that they only use the logits (scores) for comparing answer spans, instead of the probabilities obtained after applying the softmax function.
to allow comparison and aggregation of results from different segments, we remove the final softmax layer over different answer spans.
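A toy example of why that matters: softmax normalizes within each segment, so a weak span in a weak paragraph can still receive a high probability, while raw logits preserve the absolute scale. The numbers here are made up for illustration:

```python
import math

def softmax(xs):
    # numerically stable softmax over one segment's span logits
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

seg_a_logits = [9.0, 1.0]   # strong candidate span in segment A
seg_b_logits = [3.0, 2.5]   # only weak candidates in segment B

# Per-segment softmax: A's best span gets ~0.9997, B's best gets ~0.62.
# But if B's logits had been [3.0, -5.0], B's best would also get ~0.9997,
# despite the same absolute evidence (logit 3.0). Probabilities are only
# comparable within a segment, not across segments.
best_by_logit = max(seg_a_logits + seg_b_logits)  # comparing raw logits is safe
```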
@fmikaelian what do you think?
Yes
It would be useful to cross-check with Danqi Chen's thesis, where she mentions something about this for their DrQA app: https://cs.stanford.edu/~danqi/papers/thesis.pdf
Also we should follow this thread: https://github.com/huggingface/pytorch-pretrained-BERT/issues/360
In section 5.2.3 of Danqi Chen's thesis:
We apply our trained DOCUMENT READER for each single paragraph that appears in the top 5 Wikipedia articles and it predicts an answer span with a confidence score. To make scores compatible across paragraphs in one or several retrieved documents, we use the unnormalized exponential and take argmax over all considered paragraph spans for our final prediction. This is just a very simple heuristic and there are better ways to aggregate evidence over different paragraphs.
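The heuristic quoted above is small enough to sketch directly. The candidate spans and logits below are made-up examples, not DrQA code:

```python
import math

# DrQA-style aggregation: score each candidate span with the
# unnormalized exponential of its logit (no per-paragraph softmax
# denominator), then take the argmax across all paragraphs.
# Each candidate is (paragraph_id, span_text, logit) -- illustrative only.
candidates = [
    (0, "in 1997", 4.2),
    (0, "1997", 3.9),
    (1, "in Paris", 5.1),
    (2, "Paris", 2.7),
]

best = max(candidates, key=lambda c: math.exp(c[2]))
# Note: exp is monotonic, so the argmax is the same as on the raw logits;
# the exponential only changes things if scores are later summed or
# otherwise aggregated across spans.
```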
This part seems to be implemented here:
And here:
Our predict() function does not currently return this confidence score. How can we get it in our setup and modify it for comparison?
See: https://export.arxiv.org/pdf/1902.01718