Open leonhardhennig opened 2 years ago
One possible approach is to write one or several prompts per relation type and pre-label the dataset with them. Example: "How much revenue did $entity_company$ make?" [input sentence] → answer span: $entity_revenue$ in the input sentence. If the dataset is in TacRED format, I can do this without much effort. For examples where one entity is missing we need a strategy (e.g., questions with no entity in the sentence, or questions with yes/no answers).
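A minimal sketch of what that pre-labeling could look like, assuming TacRED-style token/span fields. The relation name, the question template, and the overlap rule are illustrative assumptions, not the actual project setup; the QA model call itself is left out and only its predicted span is consumed.

```python
# One question template per relation type (hypothetical relation name).
TEMPLATES = {
    "org:revenue": "How much revenue did {head} make?",
}

def build_prompt(example, relation):
    """Fill the relation's question template with the head entity mention."""
    tokens = example["token"]
    head = " ".join(tokens[example["subj_start"]:example["subj_end"] + 1])
    return TEMPLATES[relation].format(head=head)

def label_from_answer(example, answer_start, answer_end, relation="org:revenue"):
    """Accept the relation if the QA model's predicted answer span overlaps
    the tail entity span; otherwise fall back to no_relation (one possible
    strategy for the missing-entity / yes-no cases mentioned above)."""
    ts, te = example["obj_start"], example["obj_end"]
    overlap = not (answer_end < ts or answer_start > te)
    return relation if overlap else "no_relation"

example = {
    "token": ["Acme", "Corp", "reported", "$", "2", "billion", "in", "revenue", "."],
    "subj_start": 0, "subj_end": 1,   # Acme Corp
    "obj_start": 3, "obj_end": 5,     # $ 2 billion
}
print(build_prompt(example, "org:revenue"))  # How much revenue did Acme Corp make?
# Suppose the QA model answered with token span 4..5:
print(label_from_answer(example, 4, 5))      # org:revenue
print(label_from_answer(example, 7, 7))      # no_relation
```

Running a reading-comprehension model on (prompt, sentence) pairs and thresholding on answer confidence would slot into `label_from_answer` without changing the surrounding logic.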
Think of a way to distantly / heuristically label sentences in the Businesswire dataset (after NER validation by Crowdee, not yet done) with the following relation types:
Maybe train DISTRE on GIDS and then run predictions on the BW dataset to add the GIDS relation types?
@harbecke suggested trying prompts to pre-label the BW docs.
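The distant / heuristic labeling idea could be sketched as simple triple matching: mark a sentence with a relation whenever both entities of a known (head, relation, tail) triple occur in it. The triples and relation name below are made-up placeholders; real use would match the Crowdee-validated NER spans instead of raw strings.

```python
# Hypothetical seed triples, e.g. extracted from a knowledge base.
TRIPLES = {
    ("Acme Corp", "org:acquired", "Widget Inc"),
}

def distant_label(sentence, triples=TRIPLES):
    """Return all triples whose head and tail both occur in the sentence.
    String containment is a crude proxy; matching validated NER spans
    would be more robust."""
    return [
        (head, rel, tail)
        for head, rel, tail in triples
        if head in sentence and tail in sentence
    ]

print(distant_label("Acme Corp said it acquired Widget Inc last year."))
# [('Acme Corp', 'org:acquired', 'Widget Inc')]
print(distant_label("Acme Corp reported strong earnings."))
# []
```

The usual distant-supervision caveat applies: a sentence can mention both entities without expressing the relation, so some noise filtering (or the prompt-based check above) would still be needed.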