Removing test set associations from abstracts prior to training

bio-ontology-research-group / multi-drug-embedding

Method for drug repurposing from knowledge graphs and literature

Issue #1, closed by sgfin 6 years ago

sgfin commented 6 years ago

Hi there,

Nice work! I have one question about your methods that the paper leaves unclear:

You describe how you removed known associations from your knowledge graph prior to generating the KG embeddings, presumably to avoid test set leakage. Did you make any attempt to remove sentences explicitly describing the known relationships from the abstracts before training the classic word embeddings? If so, how did you do this?

Thanks, Sam

monaalsh commented 6 years ago

Hi, we did not remove direct mentions of the known associations from the literature, so the model may use them for prediction. Some predictions will likely rely only on relations extracted from the text, while others are more indirect and rely on the background knowledge in the graph. Thank you for pointing this out; removing these mentions would be a great extension.
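For anyone attempting that extension, one simple starting point is a sentence-level filter: drop every sentence in an abstract that mentions both entities of a held-out association, then train the word embeddings on what remains. The snippet below is only a sketch under stated assumptions, not the method used in the paper. The function names, the example pair, and the naive regex sentence splitter are all hypothetical, and it matches surface strings only, so synonyms, abbreviations, and relations stated across two sentences would slip through.

```python
import re
from typing import Iterable, List, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def mentions(sentence: str, term: str) -> bool:
    """Case-insensitive whole-word match of a term inside a sentence."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, sentence, flags=re.IGNORECASE) is not None


def remove_leaky_sentences(
    abstract: str, test_pairs: Iterable[Tuple[str, str]]
) -> str:
    """Drop sentences that co-mention both entities of any held-out pair."""
    pairs = list(test_pairs)
    kept = []
    for sentence in split_sentences(abstract):
        leaky = any(mentions(sentence, a) and mentions(sentence, b) for a, b in pairs)
        if not leaky:
            kept.append(sentence)
    return " ".join(kept)


# Hypothetical held-out association and abstract, for illustration only.
test_pairs = [("metformin", "type 2 diabetes")]
abstract = (
    "Metformin is used to treat type 2 diabetes. "
    "Metformin is a biguanide. "
    "Diabetes is common."
)
cleaned = remove_leaky_sentences(abstract, test_pairs)
```

Sentences that mention only one of the two entities are kept, so the embeddings can still learn about each entity separately. A real pipeline would probably need entity normalization (for example, mapping synonyms to the same identifiers used in the graph) before a filter like this is reliable.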

sgfin commented 6 years ago

Thank you!