Closed by saramsey 6 years ago
@DeqingQu come see me when you have time, and we can discuss implementation of this feature.
For some node types, the description field information is already available as a sub-field of the JSON object in "extended_info_json" (protein, microRNA). For the four ontology types that we have (anatomic_feature, biological_process, phenotypic_feature, and disease) we may be able to get a one-paragraph description for each node, from the EBI Ontology Lookup Service (OLS) REST API: https://www.ebi.ac.uk/ols/docs/api
Also please see the following for a code example of RESTfully querying the EBI OLS: https://github.com/RTXteam/RTX/blob/master/code/reasoningtool/kg-construction/QueryEBIOLS.py
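As a rough illustration of the kind of lookup described above, the sketch below builds an OLS term URL and pulls a description out of an already-parsed JSON response. It is not the code in QueryEBIOLS.py. The helper names (`ols_term_url`, `extract_description`) are hypothetical, the double URL-encoding of the term IRI mirrors the request URLs logged later in this thread, and the assumption that the response carries a `description` list should be checked against the OLS API docs.

```python
from urllib.parse import quote

OLS_API_BASE = "https://www.ebi.ac.uk/ols/api"
OBO_PURL_PREFIX = "http://purl.obolibrary.org/obo/"


def ols_term_url(ontology: str, curie: str) -> str:
    """Build an OLS term URL for a CURIE such as 'HP:0006963' (hypothetical helper)."""
    # Turn 'HP:0006963' into the OBO PURL 'http://purl.obolibrary.org/obo/HP_0006963'.
    iri = OBO_PURL_PREFIX + curie.replace(":", "_")
    # The term IRI is URL-encoded twice, as in the URLs logged in this thread.
    double_encoded = quote(quote(iri, safe=""), safe="")
    return f"{OLS_API_BASE}/ontologies/{ontology}/terms/{double_encoded}"


def extract_description(term_json: dict) -> str:
    """Return a one-paragraph description, or 'UNKNOWN' if none is present.

    Assumes the term record stores its description under the key
    'description', either as a list of strings or as a single string.
    """
    description = term_json.get("description")
    if isinstance(description, list) and description:
        return " ".join(description)
    if isinstance(description, str) and description:
        return description
    return "UNKNOWN"
```

Falling back to "UNKNOWN" rather than raising keeps a batch update running when a term has no description, and it makes the unknown rate easy to count afterwards.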
For microRNAs, it may be better to use the "comments" field from miRBase: http://www.mirbase.org/cgi-bin/mirna_entry.pl?acc=MI0000681
Anatomy (done) Source: EBIOLS https://www.ebi.ac.uk/ols/index Unknown rate: 22/916 = 2.4%
Phenotype (done) Source: EBIOLS https://www.ebi.ac.uk/ols/index Unknown rate: 2221/10713 = 20.7%
MicroRNA (done) Source: EBIOLS https://www.ebi.ac.uk/ols/index Unknown rate: 21/1695 = 1.2%
Pathway (done) Source: Reactome https://reactome.org/ContentService Unknown rate: 2/705 = 0.28%
Protein (done) Source: EBIOLS https://www.ebi.ac.uk/ols/index Unknown rate: 7008/19318 = 36.3%
Disease (done) Source (DOID:xxxx): EBIOLS https://www.ebi.ac.uk/ols/index Source (OMIM:xxxx): OMIM https://www.omim.org/ Unknown rate: 6472/12472 = 51.89%
Biological Process (done) Source: EBIOLS https://www.ebi.ac.uk/ols/index Unknown rate: 20/21139 = 0.09%
Chemical Substance (done) Source: MyChem http://mychem.info Unknown rate: 1108/2227 = 49.75%
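For anyone re-checking these figures, here is a small sketch that recomputes each unknown rate as a percentage from the counts listed above. The function name `unknown_rate_percent` and the dictionary layout are illustrative only; the counts are copied from this thread.

```python
# (unknown, total) counts per node type, copied from the status list above.
counts = {
    "Anatomy": (22, 916),
    "Phenotype": (2221, 10713),
    "MicroRNA": (21, 1695),
    "Pathway": (2, 705),
    "Protein": (7008, 19318),
    "Disease": (6472, 12472),
    "Biological Process": (20, 21139),
    "Chemical Substance": (1108, 2227),
}


def unknown_rate_percent(unknown: int, total: int) -> float:
    """Share of nodes whose description is unknown, as a percentage."""
    return 100.0 * unknown / total


for node_type, (unknown, total) in counts.items():
    rate = unknown_rate_percent(unknown, total)
    print(f"{node_type}: {unknown}/{total} = {rate:.2f}%")
```

Multiplying by 100 before formatting avoids reporting a raw fraction with a percent sign attached, which is easy to do by accident for the very small rates.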
@DeqingQu can you please look into this? Can we catch the 502 error in QueryOMIM.py? See line 67 here for example: https://github.com/RTXteam/RTX/blob/master/code/reasoningtool/kg-construction/QueryReactome.py
https://www.ebi.ac.uk/ols/api/ontologies/hp/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FHP_0006963
Status code 404 for url: https://www.ebi.ac.uk/ols/api/ontologies/hp/terms/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FHP_0006963
Status code 502 for URL: https://api.omim.org/api/entry?mimNumber=614747&include=text:description&format=json
Traceback (most recent call last):
File "UpdateNodesInfo.py", line 376, in
502 Error is fixed.
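For context, a minimal sketch of how non-200 responses like the 404 and 502 logged above could be caught so that a long update run keeps going. This is not the actual change made in the repository. It uses only the standard library (`urllib`) instead of whatever HTTP client the query classes use, and `fetch_json` and its `opener` parameter are hypothetical names introduced for illustration.

```python
import json
import sys
import urllib.error
import urllib.request


def fetch_json(url, opener=urllib.request.urlopen, timeout=20):
    """Fetch a URL and return the parsed JSON body, or None on any failure.

    `opener` is injectable so the function can be exercised without
    network access (hypothetical parameter for this sketch).
    """
    try:
        with opener(url, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses such as 404 or 502.
        print(f"Status code {err.code} for URL: {url}", file=sys.stderr)
    except (OSError, ValueError) as err:
        # Covers connection problems, timeouts, and malformed JSON.
        print(f"Request failed for URL: {url} ({err})", file=sys.stderr)
    return None
```

A caller can then treat `None` as "no description available" and record the node as "UNKNOWN" instead of letting the exception end the script.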
Hi Steve,
I fixed the 400 bug in QueryOMIMExtended.py. I don’t want to mess up the original QueryOMIM.py, so I copied it to a new file called QueryOMIMExtended.py. There are only two differences between QueryOMIMExtended.py and QueryOMIM.py. 1. 400 bug is fixed in QueryOMIMExtended.py. 2. The requests cache is used in QueryOMIMExtended.py and lru_cache is used in QueryOMIM.py.
The description field updating for chemical substance is done, but about 50% of the results are “UNKNOWN”.
I think it is fine to run UpdateNodesInfo.py now.
Best Regards, Deqing Qu
From the latest KG (dated Friday 4/27), here are the statistics on completeness of the description fields for different node types (100% means every node has a description field that is not "unknown"):
node type | percent with description |
---|---|
anatomical_entity | 97.6% |
biological_process | 99.9% |
chemical_substance | 50.2% |
disease (OMIM:) | 42.3% |
disease (DOID) | 60.5% |
microRNA | 98.8% |
protein | 63.7% |
phenotypic_feature | 79.3% |
pathway | TBD |
A dump of the latest version of the KG (with the above descriptions) has been pushed to rtxkgdump.saramsey.org:
Great! Why did it get a lot smaller?
This work will be done as a feature request for UpdateNodesInfo.py