This repository contains the dataset collected for the paper "Context-Situated Pun Generation", which appeared at EMNLP 2022 (paper available on amazon.science or arXiv).
The original SemEval 2017 Task 7 dataset (Miller et al., 2017) contains puns that are either homographic (exploiting polysemy) or heterographic (exploiting phonological similarity to another word). From SemEval 2017 Task 7, we sample puns that contain both sense annotations and pun word annotations. From this set, we sample from the 500 most frequent pun word/alter word pairs (pw, aw) and randomly sample 100 unique context words C. Combining the sampled pun pairs and context words, we collect 4,552 (C, pw, aw) instances for annotation. Full details of the data collection are given in the paper (see Citation section).
The excerpt below shows a sample data instance:
context: 25 cent,profit
pun_word: charge
alter_word: charge
pun_word_sense: pay with a credit card; pay with plastic money; postpone payment by recording a purchase as a debt
alter_word_sense: energize a battery by passing a current through it in the direction opposite to discharge
new_pun: yes
user_pun: The cashier said there was no charge for my battery.
In this repository, we release all 4,552 annotated instances as the Context-SitUated Pun (CUP) dataset.
├── data
│   └── context_situated_pun.csv (full dataset)
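As a rough illustration of how the file might be read, the sketch below parses the CSV with Python's standard csv module. It rests on several assumptions that are not confirmed by this README: that the file is comma-delimited with quoted fields, that its header row uses the column names from the sample instance (context, pun_word, alter_word, pun_word_sense, alter_word_sense, new_pun, user_pun), and that the context field holds a comma-separated list of context words. The function names are hypothetical helpers, not part of any released code.

```python
import csv
from typing import Dict, Iterable, List


def read_instances(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse CSV text (an open file or any iterable of lines) into dict rows.

    Assumes the first line is a header naming the columns.
    """
    return list(csv.DictReader(lines))


def load_instances(path: str) -> List[Dict[str, str]]:
    """Read the dataset from disk; the path is an assumption from the file tree."""
    with open(path, newline="", encoding="utf-8") as f:
        return read_instances(f)


def rows_with_new_pun(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep rows whose new_pun field is "yes" (case-insensitive)."""
    return [r for r in rows if (r.get("new_pun") or "").strip().lower() == "yes"]


def context_words(row: Dict[str, str]) -> List[str]:
    """Split the context field on commas (assumed separator) and trim spaces."""
    return [w.strip() for w in row["context"].split(",") if w.strip()]
```

Under these assumptions, calling load_instances("data/context_situated_pun.csv") from the repository root would return one dictionary per annotated instance, and rows_with_new_pun would narrow that list to the instances whose new_pun field is yes.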
See CONTRIBUTING for more information.
This library is licensed under the CC-BY-NC-4.0 License (see LICENSE).
If using this dataset in any relevant work, please cite the following papers:
@inproceedings{sun2022context,
title = {Context-Situated Pun Generation},
author = {Sun, Jiao and Narayan-Chen, Anjali and Oraby, Shereen and Gao, Shuyang and Chung, Tagyoung and Huang, Jing and Liu, Yang and Peng, Nanyun},
booktitle = {Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
year = {2022}
}
@inproceedings{miller-etal-2017-semeval,
title = "{S}em{E}val-2017 Task 7: Detection and Interpretation of {E}nglish Puns",
author = "Miller, Tristan and
Hempelmann, Christian and
Gurevych, Iryna",
booktitle = "Proceedings of the 11th International Workshop on Semantic Evaluation ({S}em{E}val-2017)",
month = aug,
year = "2017",
address = "Vancouver, Canada",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/S17-2005",
doi = "10.18653/v1/S17-2005",
pages = "58--68",
abstract = "A pun is a form of wordplay in which a word suggests two or more meanings by exploiting polysemy, homonymy, or phonological similarity to another word, for an intended humorous or rhetorical effect. Though a recurrent and expected feature in many discourse types, puns stymie traditional approaches to computational lexical semantics because they violate their one-sense-per-context assumption. This paper describes the first competitive evaluation for the automatic detection, location, and interpretation of puns. We describe the motivation for these tasks, the evaluation methods, and the manually annotated data set. Finally, we present an overview and discussion of the participating systems{'} methodologies, resources, and results.",
}