declare-lab / RelationPrompt

This repository implements our ACL Findings 2022 research paper RelationPrompt: Leveraging Prompts to Generate Synthetic Data for Zero-Shot Relation Triplet Extraction. The goal of Zero-Shot Relation Triplet Extraction (ZeroRTE) is to extract relation triplets of the format (head entity, tail entity, relation), despite not having annotated data for the test relation labels.
MIT License

Creating own data splits #17

Closed jbrry closed 1 year ago

jbrry commented 1 year ago

Hi, thanks for the great resource!

I would like to try RelationPrompt on Wiki-ZSL and FewRel with different sizes of m.

In order to generate the new train/dev/test files, we can run the write_data_splits function, which calls the load_fewrel and load_wiki methods.

I have two questions. First, could you share what the contents of the file data/wiki_properties.csv should be, or how to generate it? Second, for the path_in parameter, is it safe to assume you used the FewRel and Wiki-ZSL files linked in the ZS-BERT README?

jbrry commented 1 year ago

I found the resource property_list.html in the ZS-BERT repo. You can convert it to data/wiki_properties.csv using the code below:

import pandas as pd

dataframes = pd.read_html('property_list.html') # download from ZS-BERT
df = dataframes[0]

# inspect the headers parsed from the HTML table before renaming them
print(list(df.columns))

# rename columns to column headers used in RelationPrompt
# p: str
# pType: str
# pLabel: str
# pDescription: str
# pAltLabel: str

dfr = df.rename(columns={
    "ID": "p",
    "label": "pLabel",
    "description": "pDescription",
    "aliases": "pAltLabel",
    "Data type": "pType",
})

# note: the data/ directory must already exist
dfr.to_csv("data/wiki_properties.csv", index=False)
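
Before pointing load_wiki at the new file, it may be worth checking that the rename actually produced the five headers listed above. The snippet below is only a sketch: the helper missing_columns and the one-row sample table are hypothetical stand-ins (not part of RelationPrompt or ZS-BERT), and it assumes the HTML table uses the headers "ID", "label", "description", "aliases" and "Data type", as in the rename above.

```python
import pandas as pd

# Headers listed in the comment above; assumed to be what RelationPrompt expects.
EXPECTED_COLUMNS = ["p", "pType", "pLabel", "pDescription", "pAltLabel"]


def missing_columns(df):
    """Return the expected headers that are absent from df, in listed order."""
    return [name for name in EXPECTED_COLUMNS if name not in df.columns]


# Made-up one-row stand-in for the table parsed from property_list.html.
sample = pd.DataFrame({
    "ID": ["P0"],
    "label": ["example label"],
    "description": ["example description"],
    "aliases": ["example alias"],
    "Data type": ["example type"],
})

renamed = sample.rename(columns={
    "ID": "p",
    "label": "pLabel",
    "description": "pDescription",
    "aliases": "pAltLabel",
    "Data type": "pType",
})

# Before the rename every expected header is missing; afterwards none are.
print(missing_columns(sample))
print(missing_columns(renamed))
```

If missing_columns returns a non-empty list for the real table, the headers in property_list.html probably differ from the ones assumed in the rename mapping, and the mapping would need adjusting.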