RTXteam / RTX-KG2

Build system for the RTX-KG2 biomedical knowledge graph, part of the ARAX reasoning system (https://github.com/RTXTeam/RTX)
MIT License

PMIDs in KG2 #200

Open acevedol opened 2 years ago

acevedol commented 2 years ago

At the AHM on 4/27, @edeutsch asked for information on PMIDs: how many of our sources extract PMID data, and is that data making it into KG2 edges?

I am looking into it, but I also want to check with @saramsey in case he knows.

amykglen commented 2 years ago

I ran a query in KG2c to get an idea of how many non-semmeddb sources we get PMIDs from:

match (n)-[e]->(m)
where n.publications is not null
  and not "infores:semmeddb" in e.knowledge_source
return distinct e.knowledge_source, count(distinct e)
order by count(distinct e) desc
e.knowledge_source count(distinct e)
["infores:pathwhiz"] 3975444
["infores:drugbank"] 2253707
["infores:ensembl-gene"] 1331738
["infores:hmdb"] 713751
["infores:hmdb", "infores:pathwhiz"] 682382
["infores:diseases"] 594581
["infores:intact"] 267449
["infores:disgenet"] 252930
["infores:goa"] 215042
["infores:pr"] 143262
["infores:ncit"] 119654
["infores:umls-metathesaurus"] 119355
["infores:reactome"] 86940
["infores:mesh"] 85789
["infores:chebi"] 66488
["infores:omim"] 64684
["infores:kegg"] 49562
["infores:diseases", "infores:disgenet"] 44476
["infores:dgidb"] 42807
["infores:uniprot"] 33991
["infores:chembl"] 30207
["infores:drugcentral"] 26789
["infores:go", "infores:go-plus"] 23698
["infores:go"] 23439
["infores:loinc-umls"] 22087
["infores:ncbi-gene", "infores:ensembl-gene", "infores:pr", "infores:uniprot"] 19511
["infores:chebi", "infores:go-plus"] 12105
["infores:go-plus"] 11756
["infores:ordo"] 10864
["infores:fma-umls"] 9626
["infores:disease-ontology"] 9163
["infores:uberon"] 9093
["infores:mondo"] 9020
["infores:mondo", "infores:efo"] 8762
["infores:rxnorm"] 7302
["infores:drugbank", "infores:drugcentral"] 6603
["infores:efo"] 6195
["infores:hpo"] 5075
["infores:fma-obo", "infores:fma-umls"] 4192
["infores:atc-codes-umls"] 2913
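
Some rows above credit an edge to more than one source (for example ["infores:hmdb", "infores:pathwhiz"]), so the number of rows is not the same as the number of distinct individual sources. Below is a minimal Python sketch, assuming the query output has been exported as (knowledge_source list, count) pairs. The excerpt copies a few rows from the table, and the name `per_source_totals` is hypothetical. An edge credited to several sources is counted once for each of them, so the per-source totals can add up to more than the number of distinct edges.

```python
from collections import Counter

# A few rows copied from the table above: (knowledge_source list, edge count).
ROWS = [
    (["infores:pathwhiz"], 3975444),
    (["infores:hmdb"], 713751),
    (["infores:hmdb", "infores:pathwhiz"], 682382),
    (["infores:diseases"], 594581),
    (["infores:diseases", "infores:disgenet"], 44476),
    (["infores:disgenet"], 252930),
]


def per_source_totals(rows):
    """Credit each row's edge count to every source in its knowledge_source list."""
    totals = Counter()
    for sources, count in rows:
        for source in sources:
            totals[source] += count
    return totals


totals = per_source_totals(ROWS)
print(len(totals))  # number of distinct individual sources in the excerpt
for source, count in totals.most_common():
    print(source, count)
```

Running the same function over the full exported result would give, for each individual source, the number of edges it is credited on.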

acevedol commented 2 years ago

Thank you, Amy!

edeutsch commented 2 years ago

Great! I'm surprised! I just ran a query for pathways (https://arax.ncats.io/?r=40441) and can see publications attached to a non-SemMedDB edge, so this is great!

saramsey commented 2 years ago

Wow, that was more than I expected. Thank you for the analysis, @amykglen