carriex / lfqa_eval

ACL 2023 paper "A Critical Evaluation of Evaluations for Long-form Question Answering"

Question on QAFactEval evaluation setup #1

Closed jihyukkim-nlp closed 1 year ago

jihyukkim-nlp commented 1 year ago

Hello, thank you for publishing this awesome work! I am interested in factuality evaluation, and I would appreciate it if you could share some details regarding the QAFactEval setup.

To my understanding, there are multiple evidence documents for a single long-form answer. However, QAFactEval (like other summarization-oriented factuality metrics) takes only a single document as the reference for a single summary (in our case, the long-form answer).

I am curious about the methodology employed to obtain a single factuality score for each answer.

  1. Was an aggregation function, such as max pooling or mean pooling, applied to the scores obtained from the individual documents?

  2. Were the documents concatenated to form a unified evidence document?

Best regards, Jihyuk Kim

carriex commented 1 year ago

Hi @jihyukkim-nlp , thanks for your interest in our work! For QAFactEval, we treat the long-form answer as the "summary" and the concatenation of all evidence documents as the "source documents", hence no aggregation is needed.

We are working on releasing the code, which includes the implementation details of the metrics. Sorry for the confusion!
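In the meantime, for anyone wanting to reproduce this setup, here is a minimal sketch using the off-the-shelf `qafacteval` package. The model paths follow that package's README and its `download_models.sh` script; the documents, answer, and variable names are placeholders. The only LFQA-specific step is concatenating the evidence documents into a single source string:

```python
from qafacteval import QAFactEval

# Model paths per the qafacteval README (weights fetched via its download_models.sh)
model_folder = "models"
metric = QAFactEval(
    lerc_quip_path=f"{model_folder}/quip-512-mocha",
    generation_model_path=f"{model_folder}/generation/model.tar.gz",
    answering_model_dir=f"{model_folder}/answering",
    lerc_model_path=f"{model_folder}/lerc/model.tar.gz",
    lerc_pretrained_model_path=f"{model_folder}/lerc/pretraining.tar.gz",
    cuda_device=0,
    use_lerc_quip=True,
)

# Placeholder data: several evidence documents supporting one long-form answer
evidence_docs = ["First evidence document ...", "Second evidence document ..."]
long_form_answer = "The long-form answer to be scored for factuality ..."

# Concatenate all evidence into one "source document"; the long-form answer
# plays the role of the "summary", so no score aggregation is needed.
source = " ".join(evidence_docs)
results = metric.score_batch_qafacteval(
    [source], [[long_form_answer]], return_qa_pairs=True
)
score = results[0][0]["qa-eval"]["lerc_quip"]
print(f"QAFactEval (LERC-QuIP) score: {score:.3f}")
```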

jihyukkim-nlp commented 1 year ago

I appreciate your prompt response! I'm looking forward to the code release :)

pribadihcr commented 1 year ago


+1