amazon-science / context-situated-pun-generation

This repository provides the dataset used in "Context-situated pun generation" by Jiao Sun, Anjali Narayan-Chen, Shereen Oraby, Shuyang Gao, Tagyoung Chung, Jing Huang, Yang Liu, and Nanyun Peng.

Context-Situated Pun Generation

Overview

This repository contains the dataset collected for "Context-Situated Pun Generation", published at EMNLP 2022 (the paper is available on amazon.science and arXiv).

The original SemEval 2017 Task 7 dataset (Miller et al., 2017) contains puns that are either homographic (exploiting polysemy) or heterographic (exploiting phonological similarity to another word). We sample puns that contain both sense annotations and pun word annotations from SemEval Task 7. From this set, we sample from the 500 most frequent pun word/alter word pairs (pw, aw) and randomly sample 100 unique context words C. Combining the sampled pun pairs and context words, we collect 4,552 (C, pw, aw) instances for annotation. Full details on the data collection can be found in the paper (see Citation section).

Sample Instance

The excerpt below shows a sample data instance:

Field             Value
context           25 cent,profit
pun_word          charge
alter_word        charge
pun_word_sense    pay with a credit card; pay with plastic money; postpone payment by recording a purchase as a debt
alter_word_sense  energize a battery by passing a current through it in the direction opposite to discharge
new_pun           yes
user_pun          The cashier said there was no charge for my battery.
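
As a rough illustration of how one record might be handled in code, the sketch below wraps the sample instance in a small Python structure. The `PunInstance` class, the `parse_instance` helper, and the `pun_type` property are hypothetical names introduced here, not part of the released dataset. Two assumptions are made: the `context` field holds comma-separated context words, and an instance can be treated as homographic when `pun_word` and `alter_word` are the same string (as in the sample, where both are "charge" with different senses). Neither assumption is confirmed by the repository.

```python
from dataclasses import dataclass

# The sample instance shown above, keyed by the column names in the excerpt.
SAMPLE_ROW = {
    "context": "25 cent,profit",
    "pun_word": "charge",
    "alter_word": "charge",
    "pun_word_sense": (
        "pay with a credit card; pay with plastic money; "
        "postpone payment by recording a purchase as a debt"
    ),
    "alter_word_sense": (
        "energize a battery by passing a current through it "
        "in the direction opposite to discharge"
    ),
    "new_pun": "yes",
    "user_pun": "The cashier said there was no charge for my battery.",
}


@dataclass
class PunInstance:
    """Hypothetical container for one (C, pw, aw) record."""

    context_words: list
    pun_word: str
    alter_word: str
    pun_word_sense: str
    alter_word_sense: str
    new_pun: str  # kept as the raw string; its exact semantics are not assumed
    user_pun: str

    @property
    def pun_type(self) -> str:
        # Heuristic (assumption): identical pun and alter words indicate a
        # homographic pun; differing words indicate a heterographic pun.
        same = self.pun_word.strip().lower() == self.alter_word.strip().lower()
        return "homographic" if same else "heterographic"


def parse_instance(row: dict) -> PunInstance:
    """Build a PunInstance from a dict keyed by the excerpt's column names."""
    # Assumption: context words are separated by commas within the field.
    words = [w.strip() for w in row["context"].split(",") if w.strip()]
    return PunInstance(
        context_words=words,
        pun_word=row["pun_word"],
        alter_word=row["alter_word"],
        pun_word_sense=row["pun_word_sense"],
        alter_word_sense=row["alter_word_sense"],
        new_pun=row["new_pun"],
        user_pun=row["user_pun"],
    )


instance = parse_instance(SAMPLE_ROW)
print(instance.context_words, instance.pun_type)
```

Keeping `new_pun` as a plain string avoids guessing at the meaning of its values.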

Description of Fields

Data File

In this repository, we release the full dataset of 4,552 annotated instances in the Context-SitUated Pun (CUP) dataset.

├── data
   └── context_situated_pun.csv (full dataset)
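
A minimal sketch of loading the file with Python's standard `csv` module follows. It assumes the CSV is comma-delimited, UTF-8 encoded, and has a header row matching the column names in the sample instance above; the `load_instances` function and `EXPECTED_COLUMNS` list are hypothetical names used only for illustration. Because a `context` value such as "25 cent,profit" itself contains a comma, the sketch relies on such fields being quoted in the file.

```python
import csv
from pathlib import Path

# Path taken from the directory layout above, relative to the repository root.
DATA_PATH = Path("data") / "context_situated_pun.csv"

# Column names as they appear in the sample instance (assumed to be the header).
EXPECTED_COLUMNS = [
    "context",
    "pun_word",
    "alter_word",
    "pun_word_sense",
    "alter_word_sense",
    "new_pun",
    "user_pun",
]


def load_instances(path=DATA_PATH):
    """Read the CSV into a list of dicts, one per annotated instance."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # Fail early if the header does not match the assumed column names.
        missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing expected columns: {sorted(missing)}")
        return list(reader)


if __name__ == "__main__":
    if DATA_PATH.exists():
        rows = load_instances()
        print(f"loaded {len(rows)} instances")
    else:
        print(f"{DATA_PATH} not found; run from the repository root")
```

Checking the header up front makes a mismatch between these assumed column names and the actual file visible immediately rather than surfacing later as a `KeyError`.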

Security

See CONTRIBUTING for more information.

License

This library is licensed under the CC-BY-NC-4.0 License (see LICENSE).

Citation

If using this dataset in any relevant work, please cite the following papers: