bio2bel / scai-mirna-corpra

SCAI miRNA text mining corpora to BEL
MIT License

Discuss strategy for dealing with the corpora #1

Open mali-git opened 6 years ago

mali-git commented 6 years ago

There are a training corpus and a test corpus. Each contains miRNA-disease and miRNA-gene interactions.

Current approach: Create one DB per corpus and include all pairs. As a consequence, the DB model must be more generic and the implementation of add_to_bel_graph() needs to be changed.

Alternative approach: Create two DBs, one for miRNA-disease pairs and one for miRNA-gene pairs, independent of which source (train or test) the pairs come from. add_to_bel_graph() must be adapted, since a pair doesn't necessarily describe a relation. Nevertheless, negative pairs are interesting for machine learning use cases.
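
To make the two options concrete, here is a minimal sketch (not code from this repository). The `Pair` record, its field names, the `partition` helper, and the example values are all assumptions made up for illustration; they are not taken from the corpora.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """One annotated miRNA pair (hypothetical record layout)."""

    corpus: str         # "train" or "test"
    kind: str           # "disease" or "gene"
    mirna: str
    target: str
    has_relation: bool  # False marks a negative pair


def partition(pairs, key):
    """Group pairs into separate collections by the given attribute name."""
    groups = defaultdict(list)
    for pair in pairs:
        groups[getattr(pair, key)].append(pair)
    return dict(groups)


# Placeholder records, invented for this example only.
pairs = [
    Pair("train", "disease", "miR-21", "glioblastoma", True),
    Pair("train", "gene", "miR-21", "PTEN", True),
    Pair("test", "disease", "miR-155", "asthma", False),
]

by_corpus = partition(pairs, "corpus")  # current approach: one DB per corpus
by_kind = partition(pairs, "kind")      # alternative: one DB per interaction type
```

Either way, the negative pair (`has_relation=False`) ends up stored next to the positive ones, which is why add_to_bel_graph() would need to handle it in both layouts.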

cthoyt commented 6 years ago

Make one database that's general enough to encompass all of the information. The part that converts it to BEL can have more complex logic than the HMDD one. It would be reasonable to include a tag that says what's a negative example, and those can be filtered out when serializing to BEL.
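
A rough sketch of what that could look like, using only the standard-library `sqlite3` module. The table name, column names, example rows, and the `serialize_to_bel` helper are assumptions for illustration, not the repository's actual model. The output is a list of BEL-style statement strings with placeholder namespaces (`MIRBASE`, `MESH`, `HGNC`), not a call into the real add_to_bel_graph().

```python
import sqlite3

# One general table: corpus and target type become columns, not separate DBs.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE interaction (
        id INTEGER PRIMARY KEY,
        corpus TEXT NOT NULL,       -- 'train' or 'test'
        target_type TEXT NOT NULL,  -- 'disease' or 'gene'
        mirna TEXT NOT NULL,
        target TEXT NOT NULL,
        is_negative INTEGER NOT NULL DEFAULT 0  -- 1 tags a negative example
    )
    """
)

# Placeholder rows, invented for this example only.
rows = [
    ("train", "disease", "miR-21", "glioblastoma", 0),
    ("train", "gene", "miR-21", "PTEN", 0),
    ("test", "disease", "miR-155", "asthma", 1),
]
conn.executemany(
    "INSERT INTO interaction (corpus, target_type, mirna, target, is_negative) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)


def serialize_to_bel(connection):
    """Return BEL-style statements, skipping rows tagged as negative."""
    query = (
        "SELECT mirna, target_type, target FROM interaction "
        "WHERE is_negative = 0 ORDER BY id"
    )
    statements = []
    for mirna, target_type, target in connection.execute(query):
        if target_type == "disease":
            target_term = f'path(MESH:"{target}")'
        else:
            target_term = f'g(HGNC:"{target}")'
        statements.append(f'm(MIRBASE:"{mirna}") association {target_term}')
    return statements


statements = serialize_to_bel(conn)
```

The negative rows stay in the database for machine learning use and are only dropped at serialization time, so nothing is lost by storing everything in one place.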