BunsenFeng / FactKB

Code for "FactKB: Generalizable Factuality Evaluation using Language Models Enhanced with Factual Knowledge". EMNLP 2023.

Recommended practices for long-context evaluation #1

Open Leonard907 opened 8 months ago

Leonard907 commented 8 months ago

Hi, I came across FactKB in a project on summarization evaluation and found it useful for my study. When I ran the code, I found that the pretrained model accepts at most 512 tokens, which appears to cover the summary and the article combined. In my case, the summaries are around 500 tokens and the inputs are around 4K tokens. Are there any recommendations for running FactKB in this setting? I saw some previous papers use sentence-to-sentence approaches for evaluation (e.g., section 3.2 of "Measuring Faithfulness of Abstractive Summaries"). I would appreciate any advice on whether a sentence-to-sentence approach makes sense for FactKB, or whether there are other methods you found helpful. Thanks!

BunsenFeng commented 8 months ago

Thank you for your interest in our work! Yes, FactKB only supports 512 tokens as it is based on RoBERTa. I guess a quick patch for long documents would be to apply the FactKB synthetic pretraining methodology on LongFormer, or other masked LMs that support longer context. Alternatively, you could check out LLM-based evaluation for summarization (https://arxiv.org/abs/2209.12356) as GPT-4 evaluation of machine-generated text seems to be quite common now.

The sentence-to-sentence evaluation protocol could work too, if we assume summarization locality. First, partition the document and the summary into chunks (sentences, paragraphs, or whatever granularity works for you). If we assume that 1) the chunks of the summary are independent of each other, and 2) each summary chunk is uniquely supported by one document chunk, then we could run FactKB on (doc chunk, summary chunk) pairs, take the maximum over doc chunks, and average over summary chunks for an overall metric. That's a quick idea I had.
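A minimal sketch of that aggregation scheme, assuming a `score_fn(doc_chunk, summary_chunk)` callable (a hypothetical wrapper around a FactKB forward pass, not part of this repo) that returns a factuality score for one pair:

```python
from typing import Callable, List


def chunk_level_score(
    doc_chunks: List[str],
    summary_chunks: List[str],
    score_fn: Callable[[str, str], float],
) -> float:
    """Aggregate per-pair factuality scores into one overall metric.

    Assumes (1) summary chunks are independent of each other and
    (2) each summary chunk is supported by one document chunk.
    """
    per_chunk_best = []
    for sum_chunk in summary_chunks:
        # Credit each summary chunk with its best-supporting document chunk.
        best = max(score_fn(doc_chunk, sum_chunk) for doc_chunk in doc_chunks)
        per_chunk_best.append(best)
    # Average across summary chunks for the document-level score.
    return sum(per_chunk_best) / len(per_chunk_best)
```

Plugging in the actual model would mean having `score_fn` tokenize each (doc chunk, summary chunk) pair, run the classifier, and return the factual-class probability; each pair then stays well under the 512-token limit.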

Leonard907 commented 8 months ago

Thank you for the advice! I will try this out.