m0baxter closed this issue 1 year ago
Hi @m0baxter, yes. What we compute is not the harmonic mean of P and R but their arithmetic mean, (P + R) / 2. In our experiments, the arithmetic mean performs better. Thanks for pointing this out! We will add a note in the updated paper.
The formula used here:
https://github.com/AIPHES/DiscoScore/blob/4f2c5934eea8f3ea443e4a133d54277d6e32e23a/disco_score/metrics/discourse.py#L143
for the F-score version of focus score is incorrect. It should be the harmonic mean of P and R: $F = \frac{2PR}{P + R}$
I know this isn't explicitly used here but it is referenced in the paper.
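To make the difference between the two formulas concrete, here is a minimal sketch. The function names `harmonic_f` and `arithmetic_mean` are hypothetical and are not taken from the DiscoScore code; the sketch only assumes that P and R are plain floats in [0, 1].

```python
def harmonic_f(p: float, r: float) -> float:
    """Standard F-score: the harmonic mean of precision and recall."""
    if p + r == 0:
        # Avoid division by zero when both inputs are zero.
        return 0.0
    return 2 * p * r / (p + r)


def arithmetic_mean(p: float, r: float) -> float:
    """Plain average of precision and recall."""
    return (p + r) / 2


if __name__ == "__main__":
    # The two agree when P == R and diverge most when one of them is near zero.
    for p, r in [(0.5, 0.5), (1.0, 0.0), (0.9, 0.1)]:
        print(p, r, harmonic_f(p, r), arithmetic_mean(p, r))
```

The harmonic mean is never larger than the arithmetic mean, so the two variants can rank systems differently when P and R are unbalanced.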