Bug in Data Splitting on FewRel Dataset

declare-lab / RelationPrompt

This repository implements our ACL Findings 2022 research paper RelationPrompt: Leveraging Prompts to Generate Synthetic Data for Zero-Shot Relation Triplet Extraction. The goal of Zero-Shot Relation Triplet Extraction (ZeroRTE) is to extract relation triplets of the format (head entity, tail entity, relation), despite not having annotated data for the test relation labels.

MIT License


Bug in Data Splitting on FewRel Dataset #9

Closed: SaeedNajafi closed this issue 1 year ago

SaeedNajafi commented 1 year ago

Hey, the FewRel dataset has 700 sentences per relation ID.

After splitting FewRel into train/dev/test, the test split should contain 10,500 sentences, since there are 15 unseen relation IDs (15 × 700 = 10,500).

Using your code, we get fewer sentences in the splits. I tested with seed 12321, and 200 sentences are missing from the test split.

Please fix this issue and re-evaluate the results for the main paper.

chiayewken commented 1 year ago

Hi, the sample count is lower because some samples share the same text, so they are merged into single multi-triplet sentences.
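To illustrate the merging described above, here is a minimal sketch, not the repository's actual splitting code. The function name `merge_by_text`, the dictionary keys, and the toy sentences are all hypothetical and chosen only for illustration. It shows how grouping samples by identical text lowers the sentence count while keeping every triplet.

```python
from collections import defaultdict


def merge_by_text(samples):
    """Group single-triplet samples that share exactly the same text.

    Hypothetical helper for illustration only; the repository's real
    data format and merging logic may differ.
    """
    grouped = defaultdict(list)
    for sample in samples:
        # Triplet order follows the (head entity, tail entity, relation) format.
        triplet = (sample["head"], sample["tail"], sample["relation"])
        grouped[sample["text"]].append(triplet)
    return [{"text": text, "triplets": trips} for text, trips in grouped.items()]


# Toy data (invented): the first two samples share the same sentence.
samples = [
    {"text": "Paris is the capital of France.",
     "head": "Paris", "tail": "France", "relation": "capital of"},
    {"text": "Paris is the capital of France.",
     "head": "France", "tail": "Paris", "relation": "capital"},
    {"text": "Bach was born in Eisenach.",
     "head": "Bach", "tail": "Eisenach", "relation": "place of birth"},
]

merged = merge_by_text(samples)
num_triplets = sum(len(item["triplets"]) for item in merged)

print(len(samples), len(merged), num_triplets)  # 3 2 3
```

In this toy case, three annotated samples become two sentences, but all three triplets are still present. Under this reading, a test split with fewer than 10,500 sentences does not by itself mean that annotations were dropped.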

SaeedNajafi commented 1 year ago

The multi-triplet sentences are in the data, but for prediction, it is important to use a multi-eval mode on sentences with multiple triplets.