Hi,
In the README, you mention that this is the recommended version for reproducing the results in the QuestEval paper.
I want to make sure that this code does exactly what the paper describes.
Does the `base_score` function evaluate as described in the paper (Equations 1 and 2: F1 for accuracy, answerability for reproducibility)?
This code seems to take the average of F1 and answerability instead.
https://github.com/ThomasScialom/QuestEval/blob/7c827804a8da82560e91cf8fe84124d37f7c0660/questeval/questeval_metric.py#L353