dhimmel / learn

Machine learning and feature extraction for the Rephetio project
https://doi.org/10.15363/thinklab.d210

Can't find the Symptomatic validation dataset #7

lingling93 closed this issue 4 years ago

lingling93 commented 5 years ago

Hi Daniel, I want to evaluate my model exactly as you did, using the four validation datasets. Three of them (Disease Modifying, Clinical Trial, DrugCentral) are easy to get from the validation-datasets in your repository, but I have not found the Symptomatic dataset yet. Can you help me with that? Thank you! Lingling

dhimmel commented 5 years ago

Hi @lingling93, it looks like we compute the relevant performance visualizations and measures in the prediction/6-vizr.ipynb R notebook. This notebook reads probabilities.tsv, which contains a category column. The positives for symptomatic indications are the pairs in that file whose category is SYM. I believe the negatives are the pairs whose category is blank (i.e. the disease modifying DM indications are excluded from both sets).
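As a rough illustration of the split described above, here is a minimal Python sketch using only the standard library. It assumes the file is tab-separated with a header row and that blank categories appear as empty fields. Only the category column and the SYM and DM values come from this thread. The function name symptomatic_labels and the other column names in the toy data (compound_id, disease_id, prediction) are hypothetical placeholders, and the negative definition follows the "I believe" above rather than a verified reading of the notebook.

```python
import csv
import io


def symptomatic_labels(tsv_file, category_column="category"):
    """Return (row, label) pairs: 1 for SYM rows, 0 for blank-category rows.

    Rows with any other category (for example DM) are dropped, following
    the description in this thread. This is a sketch, not the notebook's code.
    """
    labeled = []
    for row in csv.DictReader(tsv_file, delimiter="\t"):
        category = (row.get(category_column) or "").strip()
        if category == "SYM":
            labeled.append((row, 1))
        elif category == "":
            labeled.append((row, 0))
        # Any other category is excluded from the symptomatic evaluation set.
    return labeled


# Made-up rows standing in for probabilities.tsv; every column except
# "category" is a hypothetical placeholder.
example = io.StringIO(
    "compound_id\tdisease_id\tcategory\tprediction\n"
    "C1\tD1\tSYM\t0.9\n"
    "C2\tD1\tDM\t0.8\n"
    "C3\tD2\t\t0.1\n"
)
labels = [label for _, label in symptomatic_labels(example)]
print(labels)  # [1, 0]
```

For the real file, the same function could be called on a handle opened with open("probabilities.tsv", newline=""). The resulting labels can then be paired with whatever score column your own model produces.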

The symptomatic and disease modifying indications come from PharmacotherapyDB 1.0. More info on PharmacotherapyDB is available here.

dhimmel commented 4 years ago

See https://github.com/dhimmel/learn/issues/9#issuecomment-594785230 for more information on computing the "Symptomatic" indication set of positives and negatives.