cylnlp / dialogsum

DialogSum: A Real-life Scenario Dialogue Summarization Dataset - Findings of ACL 2021

DialogSum: A Real-life Scenario Dialogue Summarization Dataset

DialogSum is a large-scale dialogue summarization dataset consisting of 13,460 dialogues, each with a manually labeled summary and topic. You can download the data directly from this folder or from Hugging Face Datasets.
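As a rough illustration of working with the downloaded data, the sketch below reads records in a JSON Lines layout and splits each dialogue into turns. The field names (`fname`, `dialogue`, `summary`, `topic`), the `#Person1#` speaker tags, and the two sample records are assumptions made up for this example; check them against the actual files before relying on them.

```python
import json
from typing import Dict, Iterator, List

# Hypothetical records in an ASSUMED JSON Lines layout. The field names and
# the "#PersonN#" speaker tags are illustrative assumptions, not taken from
# the released files.
SAMPLE_JSONL = """\
{"fname": "example_0", "dialogue": "#Person1#: Hi, how are you?\\n#Person2#: Fine, thanks.", "summary": "#Person1# greets #Person2#.", "topic": "greeting"}
{"fname": "example_1", "dialogue": "#Person1#: Where is the bank?\\n#Person2#: Next to the post office.", "summary": "#Person2# tells #Person1# where the bank is.", "topic": "asking directions"}
"""


def read_records(text: str) -> Iterator[Dict[str, str]]:
    """Yield one parsed record per non-empty line of JSON Lines text."""
    for line in text.splitlines():
        line = line.strip()
        if line:
            yield json.loads(line)


def split_turns(dialogue: str) -> List[str]:
    """Split a dialogue string into turns, assuming one turn per line."""
    return [turn for turn in dialogue.split("\n") if turn]


records = list(read_records(SAMPLE_JSONL))
for record in records:
    turns = split_turns(record["dialogue"])
    print(record["fname"], record["topic"], len(turns), "turns")
```

To use this on a real file, pass the file contents to `read_records`, for example `read_records(open(path, encoding="utf-8").read())`, where `path` points to one of the downloaded data files.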

This work was accepted to Findings of ACL 2021. You may find the paper here. If you use our dataset, please cite our paper:

@inproceedings{chen-etal-2021-dialogsum,
    title = "{D}ialog{S}um: {A} Real-Life Scenario Dialogue Summarization Dataset",
    author = "Chen, Yulong  and
      Liu, Yang  and
      Chen, Liang  and
      Zhang, Yue",
    booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-acl.449",
    doi = "10.18653/v1/2021.findings-acl.449",
    pages = "5062--5074",
}

DialogSum Challenge is a shared task at INLG 2022. Check the task website and shared task report.

This dataset is released under the CC BY-NC-SA 4.0 license. You may not use it for commercial purposes without permission.

Quick Start

We provide a BART baseline for dialogue summarization as a starting point for exploring this area. See Baseline.

Dialogue Data

We collect dialogue data for DialogSum from three public dialogue corpora, namely DailyDialog (Li et al., 2017), DREAM (Sun et al., 2019) and MuTual (Cui et al., 2019), as well as an English speaking practice website. These datasets contain face-to-face spoken dialogues that cover a wide range of daily-life topics, including schooling, work, medication, shopping, leisure, and travel. Most conversations take place between friends, between colleagues, or between service providers and customers.

Compared with previous datasets, dialogues from DialogSum have distinct characteristics:

Summaries

We ask annotators to summarize each dialogue based on the following criteria:

Topics

In addition to summaries, we also ask annotators to write a short topic for each dialogue. Topics may be useful for future work, e.g. generating summaries by leveraging topic information.
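One simple way topic information could be fed to a summarizer is to prepend the topic to the dialogue text before encoding. The sketch below shows only that formatting step. The function name, the `topic:` prefix, and the separator are hypothetical choices for illustration and are not the method used in the paper or the baseline.

```python
def build_topic_conditioned_input(dialogue: str, topic: str, sep: str = " | ") -> str:
    """Prepend a topic string to a dialogue to form a single model input.

    The "topic: " prefix and the separator are illustrative assumptions;
    any consistent format could be used instead.
    """
    topic = topic.strip()
    if not topic:
        # No topic available: fall back to the plain dialogue.
        return dialogue
    return f"topic: {topic}{sep}{dialogue}"


# Made-up example input, not taken from the dataset.
example = build_topic_conditioned_input(
    "#Person1#: Where is the bank?\n#Person2#: Next to the post office.",
    "asking directions",
)
print(example)
```

The resulting string can then be tokenized and passed to any sequence-to-sequence summarization model in place of the raw dialogue.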

Reference