WormBase / ACKnowledge

Author Curation to Knowledgebases
MIT License

Create a black list of genes with a high probability of being false positives #140

Closed: valearna closed this issue 4 years ago

valearna commented 4 years ago

From an email conversation with @vanaukenk:

We could use the ratio of curated references in WB to TextpressoCentral hits (searching with the gene name as a keyword) as an indication of the probability that a gene is a false positive: the higher the ratio, the lower the probability.
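A minimal sketch of how that ratio could rank blacklist candidates. It assumes the per-gene counts (curated WB references, TextpressoCentral keyword hits) have already been fetched. The input format, the `min_ratio` cutoff, and the gene names and counts in the usage lines are all hypothetical and invented for illustration; the issue does not specify any of them.

```python
def blacklist_candidates(counts, min_ratio):
    """Return (gene, ratio) pairs whose curated/hit ratio is below min_ratio.

    counts: {gene_name: (curated_wb_references, textpresso_hits)}  (hypothetical format)
    A low ratio means the name matches much more text than curators ever
    attached to the gene, so text matches are more likely false positives.
    """
    candidates = []
    for gene, (curated, hits) in counts.items():
        if hits == 0:
            continue  # the name never matches text, so it cannot yield false positives
        ratio = curated / hits
        if ratio < min_ratio:
            candidates.append((gene, ratio))
    # Lowest ratio first: most likely false positives at the top
    return sorted(candidates, key=lambda item: item[1])


# Usage with invented counts (not real WormBase or TextpressoCentral data)
counts = {
    "gene-a": (120, 150),  # ratio 0.8: mostly curated, keep
    "gene-b": (2, 400),    # ratio 0.005: matches lots of unrelated text
    "gene-c": (0, 0),      # no hits at all, skipped
    "gene-d": (5, 50),     # ratio 0.1
}
candidates = blacklist_candidates(counts, min_ratio=0.2)
print([gene for gene, _ in candidates])
```

Sorting by ratio gives curators a review order rather than a hard cut; where to place `min_ratio` would still need to be tuned against manually checked genes.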

valearna commented 4 years ago

We also need a manual blacklist for other entities, such as the CB2 strain.

valearna commented 4 years ago

We decided to try implementing a TF-IDF-based threshold and to compare the results with the manual blacklists: https://docs.google.com/spreadsheets/d/1hpo3DCIcOX20mrLNOQk4KdU3Bal7bJJGIdiJSsRqbsw
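A minimal sketch of a TF-IDF threshold on extracted entities and a comparison against a manual blacklist. It uses the textbook formulation (relative term frequency times log(N / document frequency)); the issue does not say which TF-IDF variant or threshold ACKnowledge actually uses, so the formula choice, the `threshold` value, the input format, and the paper/entity data below are assumptions for illustration only. An entity mentioned in every paper gets an IDF of 0, which is how names that match too broadly end up below any positive threshold.

```python
import math
from collections import Counter


def tfidf_scores(papers):
    """papers: {paper_id: [entity mentions]} -> {paper_id: {entity: tfidf}}."""
    n_papers = len(papers)
    # Document frequency: number of papers mentioning each entity at least once
    doc_freq = Counter()
    for mentions in papers.values():
        doc_freq.update(set(mentions))
    scores = {}
    for paper_id, mentions in papers.items():
        counts = Counter(mentions)
        total = sum(counts.values())
        scores[paper_id] = {
            entity: (count / total) * math.log(n_papers / doc_freq[entity])
            for entity, count in counts.items()
        }
    return scores


def filter_entities(scores, threshold):
    """Keep, per paper, only entities whose TF-IDF is at least threshold."""
    return {
        paper_id: {e for e, s in entity_scores.items() if s >= threshold}
        for paper_id, entity_scores in scores.items()
    }


def compare_with_blacklist(scores, kept, blacklist):
    """Return (blacklisted entities that passed the threshold in some paper,
    non-blacklisted entities the threshold dropped in some paper)."""
    survived, dropped = set(), set()
    for paper_id, entity_scores in scores.items():
        for entity in entity_scores:
            if entity in kept[paper_id]:
                if entity in blacklist:
                    survived.add(entity)
            elif entity not in blacklist:
                dropped.add(entity)
    return survived, dropped


# Invented mentions: "CB2" appears in every paper, so its IDF is log(3/3) = 0
papers = {
    "p1": ["gene-a", "gene-a", "gene-a", "CB2", "gene-b"],
    "p2": ["CB2", "gene-c", "gene-c"],
    "p3": ["CB2", "gene-b", "gene-c"],
}
scores = tfidf_scores(papers)
kept = filter_entities(scores, threshold=0.1)
survived, dropped = compare_with_blacklist(scores, kept, blacklist={"CB2"})
print(kept, survived, dropped)
```

Here the threshold removes CB2 on its own, but it also drops `gene-b` from `p1` (one mention, moderate IDF), which is the kind of disagreement the comparison with the manual blacklists is meant to surface.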

draciti commented 4 years ago

Closing, as we are using TF-IDF. If we need to add manual blacklists, we will comment them in.