UKPLab / acl2023-argscichat

Apache License 2.0

How is Rationale-F1 calculated? #2

Open desmond1687 opened 1 week ago

desmond1687 commented 1 week ago

The paper mentions, "we compute the F1 score over candidate sentences in a scientific paper against the reference rationales. We denote this metric as Rationale-F1." So, I believe the authors mean the sentence-level F1 score. However, I found that in the data from "argscichat_allennlp/argscichat_train_dev," the sentence boundaries in "content" and "facts" do not match.

For example, in fold_0_test.json, there is a fact: "We run a battery of supervised machine learning models for automatically detecting parody tweets," but the corresponding sentence in "content" is "We run a battery of supervised machine learning models for automatically detecting parody tweets with an emphasis on robustness by testing on tweets from accounts unseen in training, across different genders and across countries."

So, why is there this inconsistency? If that's the case, how are the TF-IDF and S-BERT baselines mentioned in the paper implemented? Are their inputs the sentences from "content"? If so, it would be impossible to correctly calculate Rationale-F1 with the ground truth sentences provided in "facts," because the sentence boundaries are different.

I would greatly appreciate it if you could answer my questions above! Thanks!

federicoruggeri commented 1 week ago

Hi Desmond! Apologies for the late reply and for the unclear code. Yes, we compute a sentence-level F1 score.
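As a rough illustration of what a sentence-level F1 looks like, here is a minimal sketch. It assumes predictions and references are given as indices of sentences in the paper; the function name `rationale_f1` and the handling of empty inputs are illustrative assumptions, not taken from the repository.

```python
def rationale_f1(predicted, reference):
    """Set-based F1 between predicted and reference sentence indices.

    Hypothetical helper for illustration; not the repository's code.
    """
    predicted, reference = set(predicted), set(reference)
    if not predicted and not reference:
        # Assumption: nothing to select and nothing selected counts as perfect.
        return 1.0
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Two of three predicted sentences are correct, two of three references are found.
score = rationale_f1([0, 2, 3], [2, 3, 5])
```

With this definition, the score only makes sense once predictions and references refer to the same sentence segmentation.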

The issue is that the selected facts are text snippets that may span one or more sentences. To handle this, we split each fact and the content into sentences and match the fact sentences against the content sentences. This matching is needed both for computing the baselines and for training models with fact supervision.
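A minimal sketch of this kind of sentence matching, under stated assumptions: the regex splitter, the `normalize` helper, and the substring-based matching rule are all illustrative choices and may differ from what `validate_facts` actually does. The first content sentence in the example is made up; the second one and the fact are the ones quoted in the question.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def normalize(sentence):
    # Lowercase, collapse whitespace, and drop leading/trailing punctuation.
    return re.sub(r"\s+", " ", sentence.lower()).strip(" .!?,;")


def match_fact_to_content(fact, content):
    """Return indices of content sentences that contain a fact sentence."""
    content_sentences = split_sentences(content)
    matched = set()
    for fact_sentence in split_sentences(fact):
        fact_norm = normalize(fact_sentence)
        if not fact_norm:
            continue
        for index, content_sentence in enumerate(content_sentences):
            if fact_norm in normalize(content_sentence):
                matched.add(index)
    return sorted(matched)


content = (
    "This first sentence is a made-up placeholder. "
    "We run a battery of supervised machine learning models for automatically "
    "detecting parody tweets with an emphasis on robustness by testing on tweets "
    "from accounts unseen in training, across different genders and across countries."
)
fact = (
    "We run a battery of supervised machine learning models for automatically "
    "detecting parody tweets,"
)
matched_indices = match_fact_to_content(fact, content)
```

Here the partial fact is mapped to the full content sentence that contains it, so the reference rationale becomes a content sentence index that candidate sentences can be scored against.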

You can check the validate_facts function in argscichat_allennlp/scripts/fact_selection_baselines.py and the _article_to_instances method in argscichat_allennlp/argscichat_baselines/dataset_reader.py.

I hope this answers your questions!