Closed: jihyukkim-nlp closed this issue 1 year ago
Hi @jihyukkim-nlp , thanks for your interest in our work! For QAFactEval, we treat the long-form answer as the "summary" and the concatenation of all evidence documents as the "source documents", hence no aggregation is needed.
We are working on releasing the code which includes implementation details of the metrics, sorry for the confusion!
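In the meantime, a minimal sketch of this setup with the public `qafacteval` package might look like the following (the model paths and keyword arguments here are placeholders; see the package README for the exact setup):

```python
from qafacteval import QAFactEval

# Placeholder paths: the models are downloaded separately (e.g., via the
# repo's download_models.sh) and live wherever you put them.
model_folder = "models"
metric = QAFactEval(
    lerc_quip_path=f"{model_folder}/quip-512-mocha",
    generation_model_path=f"{model_folder}/generation/model.tar.gz",
    answering_model_dir=f"{model_folder}/answering",
    lerc_model_path=f"{model_folder}/lerc/model.tar.gz",
    lerc_pretrained_model_path=f"{model_folder}/lerc/pretrained.tar.gz",
    use_lerc_quip=True,
)

# Several evidence documents support one long-form answer.
evidence_docs = ["Evidence document 1 ...", "Evidence document 2 ..."]
long_form_answer = "The long-form answer being evaluated ..."

# Concatenate all evidence into a single "source document"; the
# long-form answer plays the role of the "summary", so no score
# aggregation across documents is needed.
source = " ".join(evidence_docs)
results = metric.score_batch_qafacteval(
    [source], [[long_form_answer]], return_qa_pairs=True
)
score = results[0][0]["qa-eval"]["lerc_quip"]
print(score)
```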
I appreciate your prompt response! I look forward to the release of the code :)
+1
Hello, thank you for publishing this awesome work! I am interested in factuality evaluation, and I would appreciate it if you could share some details regarding QAFactEval.
To my understanding, there are multiple evidence documents for a single long-form answer. However, QAFactEval (like other summarization-oriented factuality metrics) scores a single summary (in our case, the long-form answer) against a single source document.
I am curious about the methodology you used to obtain a single factuality score for each answer. The sketch below illustrates the two options I have in mind:
- Did you apply an aggregation function, such as max pooling or mean pooling, to the scores obtained from the individual evidence documents?
- Or did you concatenate the documents to form a unified evidence document?
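For concreteness, here is a hypothetical sketch of the two strategies (`qafacteval_score` is a made-up stand-in for a single-document scoring call, not an actual API):

```python
# Hypothetical illustration only: qafacteval_score(document, summary)
# stands in for whatever single-document scoring call the metric exposes.

def pool_per_document_scores(answer, evidence_docs, qafacteval_score, mode="mean"):
    """Option 1: score the answer against each document, then pool."""
    scores = [qafacteval_score(doc, answer) for doc in evidence_docs]
    return max(scores) if mode == "max" else sum(scores) / len(scores)

def score_concatenated_evidence(answer, evidence_docs, qafacteval_score):
    """Option 2: merge all evidence into one document and score once."""
    return qafacteval_score(" ".join(evidence_docs), answer)
```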
Best regards, Jihyuk Kim