Hi,
In the README, you mention that this is the recommended version for reproducing the results in the QuestEval paper.
I want to make sure that this code does exactly what the paper describes.
Does the `base_score` function evaluate as described in the paper (Equations 1 and 2: F1 for accuracy, answerability for reproducibility)?
This code seems to take the average of F1 and answerability instead.
https://github.com/ThomasScialom/QuestEval/blob/7c827804a8da82560e91cf8fe84124d37f7c0660/questeval/questeval_metric.py#L353