chaitanyamalaviya / ExpertQA

[Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers
MIT License

ExpertQA: Expert-Curated Questions and Attributed Answers

Paper

Find the paper at https://arxiv.org/abs/2309.07852

Dataset

ExpertQA contains 2177 examples, which are validated on various axes of factuality and attribution. The main data can be found at

It can be loaded with the data loaders in data_utils:

data = example_utils.read_examples("data/r2_compiled_anon.jsonl")

The file contains newline-separated JSON dictionaries with the following fields:

Each Answer object contains the following fields:

Each Claim object contains the following fields:
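Since the file stores one JSON dictionary per line, it can also be inspected without the repository's loaders. The following is a minimal sketch using only the Python standard library. The helper name read_jsonl is hypothetical and not part of this repository, and the sketch makes no assumption about which fields each record contains.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Read a newline-separated JSON file into a list of dictionaries.

    Hypothetical helper for illustration; it is not the repository's loader.
    """
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines between records
                records.append(json.loads(line))
    return records
```

Calling read_jsonl("data/r2_compiled_anon.jsonl") and then sorted(examples[0].keys()) on the result is one way to list the top-level fields of the first record.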

Additional Files

Long-form QA

The random and domain splits of the long-form QA dataset are in data/lfqa/. Files for the random split are prefixed with rand_lfqa_, and files for the domain split are prefixed with domain_lfqa_.
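The prefix convention makes it straightforward to collect the files of one split. The sketch below assumes only the directory and the two prefixes named above. The function list_split_files and the split keys "random" and "domain" are hypothetical names chosen for illustration, and nothing is assumed about the file extensions or how many files each split has.

```python
from pathlib import Path

# Prefixes as stated in this README; the dictionary keys are illustrative.
SPLIT_PREFIXES = {"random": "rand_lfqa_", "domain": "domain_lfqa_"}


def list_split_files(lfqa_dir, split):
    """Return the sorted files in lfqa_dir whose names start with the split's prefix."""
    prefix = SPLIT_PREFIXES[split]
    return sorted(
        p for p in Path(lfqa_dir).iterdir()
        if p.is_file() and p.name.startswith(prefix)
    )
```

For example, list_split_files("data/lfqa", "domain") would return the paths of the domain-split files when run from the repository root.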

Modeling

Response collection

Found at modeling/response_collection. The scripts for collecting responses from different systems are at:

Attribution estimation

Found at modeling/auto_attribution.

Factuality estimation

Found at modeling/fact_score. See sample usage at get_fact_score.sh.

Long-form QA

Found at modeling/lfqa. See example usage at bash_scripts/run_lfqa.sh.

Evaluation

Scripts and documentation for running evaluation are in the eval/ directory.

License

This project is licensed under the MIT License; see the LICENSE file for details.

Citation

@inproceedings{malaviya2024expertqa,
title={Expert{QA}: Expert-Curated Questions and Attributed Answers},
author={Chaitanya Malaviya and Subin Lee and Sihao Chen and Elizabeth Sieber and Mark Yatskar and Dan Roth},
booktitle={2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics},
year={2024},
url={https://openreview.net/forum?id=hhC3nTgfOv}
}