Natural Questions (NQ) contains real user questions issued to Google search, and answers found in Wikipedia by annotators. NQ is designed for training and evaluating automatic question answering systems.
Hello, 'nq_eval.py' states that "Each prediction should be provided with a long answer score, and a short answers score".
Could you clarify what these scores refer to? Are they meant to represent the model's confidence in its predictions, or is there a fixed method for computing them?
For example, could we define the 'score' simply as the sum of the start and end logits of the predicted span?
Lastly, are scores also required for null predictions?
Thank you very much!