DFKI-NLP / sherlock

State-of-the-art Information Extraction

Distantly supervised labels for BW dataset #55

Open leonhardhennig opened 2 years ago

leonhardhennig commented 2 years ago

Find a way to distantly / heuristically label sentences in the Businesswire (BW) dataset with the following relation types (to be done after the NER validation by Crowdee, which is still pending):

Maybe train DISTRE on GIDS and then run predictions on the BW dataset to add GIDS relation types?

@harbecke suggested trying prompts to pre-label the BW docs.

harbecke commented 2 years ago

One possible approach is to write one or several prompts that capture the relation and pre-label the dataset with those.

Example: "How much revenue did $entity_company$ make? [input sentence]", where the answer span is $entity_revenue$ in the input sentence.

If the dataset is in TACRED format, I can do this without much effort. For examples where one entity is missing, we need a strategy (e.g., questions without an entity, or questions with one entity and a yes/no answer).
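To make the prompt idea concrete, here is a rough sketch of how a question could be built from a TACRED-style example. It assumes the usual TACRED JSON fields (`token`, `subj_start`, `subj_end`, `obj_start`, `obj_end`, with inclusive end indices) and treats the subject as the company and the object as the revenue mention. The function names, the template wording, and the sample sentence are made up for illustration; nothing here is existing sherlock code, and the missing-entity case is not handled.

```python
from string import Template

# Hypothetical prompt template, following the revenue example in this thread.
REVENUE_PROMPT = Template("How much revenue did $company make? $sentence")


def span_text(tokens, start, end):
    """Join tokens[start..end]; the end index is inclusive, as in TACRED."""
    return " ".join(tokens[start:end + 1])


def build_prompt(example, template=REVENUE_PROMPT):
    """Fill the template with the subject entity and the full input sentence."""
    tokens = example["token"]
    company = span_text(tokens, example["subj_start"], example["subj_end"])
    return template.substitute(company=company, sentence=" ".join(tokens))


def expected_answer(example):
    """The span a QA model should return: the object entity in the sentence."""
    return span_text(example["token"], example["obj_start"], example["obj_end"])


# Made-up example in (assumed) TACRED format, not taken from the BW dataset.
example = {
    "token": ["Acme", "Corp", "reported", "revenue", "of", "$", "5", "million", "."],
    "subj_start": 0, "subj_end": 1,
    "obj_start": 5, "obj_end": 7,
}

prompt = build_prompt(example)
answer = expected_answer(example)
```

For pre-labeling, the prompt would go to a QA or language model, and a sentence would get the relation label when the predicted span matches (or overlaps) a validated entity of the expected type.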