biothings / biothings_explorer_archived

BioThings Explorer: a schema-based client for API interoperability
Apache License 2.0

API issues discussed 8/24 #117

Open: colleenXu opened this issue 4 years ago


All of the following issues are demonstrated (with verbose logs) in this notebook: https://github.com/colleenXu/biothings_explorer/blob/CX_WIPs_Here/jupyter%20notebooks/CX_WIPs/CX_MapleSyrupUrineDisease.ipynb, and in the matching HTML file.

Notes:

1. SEMMED Disease API, wrong data

While querying BioThings Explorer (Disease → Gene), I found erroneous results returned by the SEMMED Disease API. In the examples below, the cited paper mentions either a different gene or no gene at all:

- BCKD Deficiency ‘caused_by’ SNW1 (id: NCBIGene:22938) with pubmed 8312380
- BCKD Deficiency ‘treated_by’ ARID4B (id: NCBIGene:51742) with pubmed 27334242
- BCKD Deficiency ‘related_to’ ARID4B (id: NCBIGene:51742) with pubmed 10472531, 16628687
- BCKD Deficiency ‘treated_by’ S100A9 (id: NCBIGene:6280) with pubmed 26205848
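To make the manual check of these SEMMED claims repeatable, one option is to record each reported triple with its PubMed IDs and generate links to the cited abstracts. This is a minimal sketch: the `Claim` record and the `pubmed_links` helper are hypothetical names, not part of BioThings Explorer; the data are the four results listed above, and the links use the public `https://pubmed.ncbi.nlm.nih.gov/<PMID>/` URL pattern.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Claim:
    """One SEMMED Disease API result as reported above (hypothetical record type)."""
    subject: str
    predicate: str
    gene: str
    gene_id: str
    pmids: Tuple[str, ...]


# The four suspicious results listed above.
CLAIMS = [
    Claim("BCKD Deficiency", "caused_by", "SNW1", "NCBIGene:22938", ("8312380",)),
    Claim("BCKD Deficiency", "treated_by", "ARID4B", "NCBIGene:51742", ("27334242",)),
    Claim("BCKD Deficiency", "related_to", "ARID4B", "NCBIGene:51742", ("10472531", "16628687")),
    Claim("BCKD Deficiency", "treated_by", "S100A9", "NCBIGene:6280", ("26205848",)),
]


def pubmed_links(claim: Claim) -> List[str]:
    """Return one PubMed abstract URL per cited PMID, for manual review."""
    return [f"https://pubmed.ncbi.nlm.nih.gov/{pmid}/" for pmid in claim.pmids]


# Print a review checklist: one line per (claim, cited paper) pair.
for claim in CLAIMS:
    for url in pubmed_links(claim):
        print(f"{claim.subject} '{claim.predicate}' {claim.gene} ({claim.gene_id}): {url}")
```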

2. SEMMED Disease API, unclear entity type

While querying BioThings Explorer (Disease → Gene), I found some results with UMLS IDs returned by the SEMMED Disease API. They appeared to be 'Gene'-type objects, but the Hint module resolves them as ChemicalSubstance. In addition, these UMLS terms read more like descriptions of enzymes than like specific Protein or Gene entities. Example results:

- BCKD Deficiency ‘disrupted_by’ C0026741 with pubmed 20646061
- BCKD Deficiency ‘affected_by’ C0026741 with pubmed 9546032
- BCKD Deficiency ‘related_to’ C0026741 with pubmed 10745006
- BCKD Deficiency ‘caused_by’ C0026741 with pubmed 25333063

Result from the hint module when querying for ‘C0026741’: `'display': 'MESH(D009097) UMLS(C0026741) name(Complexes, Multienzyme)', 'type': 'ChemicalSubstance'`
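A check like the one below could flag this kind of mismatch automatically, by comparing the type the hint module reports with the type implied by the query edge (here, the Gene end of Disease → Gene). It is a sketch under stated assumptions: the hint result is taken to be a dict with the `display` and `type` keys shown above, and `type_mismatch` is a hypothetical helper, not an existing BioThings Explorer function.

```python
from typing import Dict, Optional


def type_mismatch(hint_result: Dict[str, str], expected_type: str) -> Optional[str]:
    """Return a warning if the hint module's type differs from the expected type, else None."""
    actual = hint_result.get("type")
    if actual == expected_type:
        return None
    return (
        f"{hint_result.get('display', '?')}: expected {expected_type}, "
        f"hint module resolved it as {actual}"
    )


# Hint module result for 'C0026741', as reported above.
hint_c0026741 = {
    "display": "MESH(D009097) UMLS(C0026741) name(Complexes, Multienzyme)",
    "type": "ChemicalSubstance",
}

# The SEMMED edge was Disease -> Gene, so 'Gene' is the expected type.
print(type_mismatch(hint_c0026741, "Gene"))
```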

3. SEMMED Disease API, unable to resolve UMLS IDs for ChemicalSubstance entities

The hint module doesn’t find results when querying for some UMLS terms. I got these UMLS terms from a Disease → ChemicalSubstance query with BioThings Explorer.

Example result: BCKD Deficiency ‘caused_by’ C0626053 (id: UMLS:C0626053) with pubmed 16365091. However, the hint module returns no results for the ID ‘C0626053’.
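To collect such unresolvable IDs across a whole query result, one could normalize each ID to a bare CUI, check its format, and record the ones a resolver cannot find. This is a minimal sketch assuming the hint module is wrapped as a callable that returns an empty dict when nothing is found; that wrapper, `to_bare_cui`, and `unresolved_cuis` are hypothetical. The format check relies only on UMLS CUIs being "C" followed by seven digits.

```python
import re
from typing import Callable, Iterable, List, Optional

# UMLS concept identifiers (CUIs) are "C" followed by seven digits.
CUI_PATTERN = re.compile(r"^C\d{7}$")


def to_bare_cui(curie: str) -> Optional[str]:
    """Strip an optional 'UMLS:' prefix and return the CUI, or None if malformed."""
    value = curie.split(":", 1)[1] if curie.upper().startswith("UMLS:") else curie
    return value if CUI_PATTERN.match(value) else None


def unresolved_cuis(curies: Iterable[str], resolve: Callable[[str], dict]) -> List[str]:
    """Return the IDs that are malformed or for which `resolve` finds nothing.

    `resolve` is an assumed wrapper around the hint module (hypothetical).
    """
    missing = []
    for curie in curies:
        cui = to_bare_cui(curie)
        if cui is None or not resolve(cui):
            missing.append(curie)
    return missing


# Stand-in resolver that knows nothing, mimicking the empty hint result for C0626053.
print(unresolved_cuis(["UMLS:C0626053"], resolve=lambda cui: {}))  # ['UMLS:C0626053']
```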

4. API calls failing, likely due to many/concurrent calls
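If the diagnosis in item 4 holds (failures caused by too many simultaneous calls), a common mitigation is to cap the number of in-flight requests and retry transient failures with exponential backoff. The sketch below illustrates that pattern with `asyncio.Semaphore`; it is not BioThings Explorer's actual dispatch logic. `call_with_limit` and `demo` are hypothetical names, each entry in `calls` is assumed to be a zero-argument function returning an awaitable (in practice it would wrap one HTTP request), and the default limit and retry values are arbitrary placeholders, not values tuned against these APIs.

```python
import asyncio
from typing import Awaitable, Callable, List, Tuple, TypeVar

T = TypeVar("T")


async def call_with_limit(
    calls: List[Callable[[], Awaitable[T]]],
    max_concurrent: int = 5,
    retries: int = 3,
    base_delay: float = 0.5,
) -> List[T]:
    """Run coroutine factories with at most `max_concurrent` in flight,
    retrying each failing call with exponential backoff."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def run_one(make_call: Callable[[], Awaitable[T]]) -> T:
        for attempt in range(retries + 1):
            async with semaphore:  # caps simultaneous requests to the API
                try:
                    return await make_call()
                except Exception:
                    if attempt == retries:
                        raise  # give up after the last retry
            # Sleep outside the semaphore so other calls can use the slot.
            await asyncio.sleep(base_delay * 2 ** attempt)
        raise AssertionError("unreachable")

    # gather preserves input order in its results.
    return list(await asyncio.gather(*(run_one(c) for c in calls)))


async def demo(n: int = 10, limit: int = 3) -> Tuple[List[int], int]:
    """Run n fake requests; return their results and the peak concurrency seen."""
    in_flight = 0
    peak = 0

    async def fake_request(i: int) -> int:
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # stands in for network latency
        in_flight -= 1
        return i

    results = await call_with_limit(
        [lambda i=i: fake_request(i) for i in range(n)], max_concurrent=limit
    )
    return results, peak


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Sleeping outside the semaphore is a deliberate choice: a call that is backing off does not hold a slot that another request could use.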

5. API calls failing for unknown reason

6. TypeError thrown when trying to do PhenotypicFeature → Pathway

Querying from PhenotypicFeature → Pathway (as part of the path Disease → PhenotypicFeature → Pathway) did not work. The BioLink API was identified, but then a TypeError was thrown. The last part of the error message is:

```
~/miniconda3/envs/BTE/lib/python3.7/site-packages/biothings_explorer/apicall.py in call_one_arbitrary_api(self, _input, session, verbose)
     58         # if(path_value_template):
     59         base_url = base_url.replace(
---> 60             "{" + path_param + "}", path_value_template
     61         ).replace("{inputs[0]}", _input["value"])
     62         parameters.pop(path_param)

TypeError: replace() argument 2 must be str, not None
```
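The traceback shows that `path_value_template` was `None` for this BioLink operation, and `str.replace()` rejects `None` as its replacement argument. The sketch below reproduces that failure and shows one possible guard that reports which path parameter lacks a template, so the underlying configuration problem becomes easier to diagnose. It is an illustration, not a patch to `apicall.py`: `fill_path_param` is a hypothetical helper mirroring the two chained `replace()` calls above, the example URL is a placeholder, and whether the real fix is a default template or corrected API metadata is not settled here.

```python
from typing import Optional


def fill_path_param(base_url: str, path_param: str,
                    path_value_template: Optional[str], input_value: str) -> str:
    """Substitute one path parameter into a URL template (hypothetical helper).

    Mirrors the chained replace() calls in the traceback, but checks for a
    missing template first instead of letting str.replace() fail on None.
    """
    if path_value_template is None:
        raise ValueError(
            f"no value template configured for path parameter {path_param!r} "
            f"in {base_url!r}"
        )
    return base_url.replace("{" + path_param + "}", path_value_template).replace(
        "{inputs[0]}", input_value
    )


# The original failure: str.replace() does not accept None as its replacement.
try:
    "https://example.org/bioentity/{id}".replace("{id}", None)
except TypeError as err:
    print("TypeError:", err)

# With a template present, the input value is substituted into the URL.
print(fill_path_param("https://example.org/bioentity/{id}", "id", "{inputs[0]}", "HP:0031796"))
```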

7. Myvariant.info variant in wrong gene

Querying SequenceVariant → Gene, BioThings Explorer used the myvariant.info API and returned: SequenceVariant DBSNP:rs398123494 ‘located_in’ Gene EXOSC5 (NCBIGene:56915). However, this SNP is actually located in BCKDHA (NCBIGene:593), upstream of EXOSC5; any genome browser will confirm this. It does seem to be associated with both genes (https://www.snpedia.com/index.php/Rs398123494).
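Whether a variant is ‘located_in’ a gene or merely upstream of it depends on the gene's coordinates and strand: for a minus-strand gene, "upstream" means higher genomic coordinates. The sketch below shows that classification. It assumes 1-based inclusive gene intervals, and `Gene` and `relation_to_gene` are hypothetical names. The coordinates are toy values for illustration only, not the real BCKDHA or EXOSC5 positions; applying the check would require coordinates from a genome annotation on the same assembly as the variant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    symbol: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def relation_to_gene(chrom: str, pos: int, gene: Gene) -> str:
    """Classify a variant position as within, upstream of, or downstream of a gene."""
    if chrom != gene.chrom:
        return "other_chromosome"
    if gene.start <= pos <= gene.end:
        return "within"
    before_start = pos < gene.start
    # Plus strand: transcription runs toward higher coordinates, so upstream is lower.
    if gene.strand == "+":
        return "upstream" if before_start else "downstream"
    # Minus strand: transcription runs toward lower coordinates, so upstream is higher.
    return "downstream" if before_start else "upstream"


# Toy coordinates for illustration only.
gene_a = Gene("GENE_A", "chr1", 1_000, 2_000, "+")
gene_b = Gene("GENE_B", "chr1", 200, 800, "-")
print(relation_to_gene("chr1", 1_500, gene_a), relation_to_gene("chr1", 1_500, gene_b))
```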

8. Monarch/BioLink API returns phenotypes for child diseases

In a Disease → PhenotypicFeature query, the Monarch/BioLink API returns phenotypes both for the queried disease (MONDO:0009563) and for one of its children (MONDO:0009529). Perhaps the default behavior could be changed so that only directly annotated phenotypes are returned? Note that the Monarch website also lists the phenotypes of both diseases under MONDO:0009563 (https://monarchinitiative.org/disease/MONDO:0009563#phenotype).

API call: https://api.monarchinitiative.org/api/bioentity/disease/MONDO:0009563/phenotypes?rows=200
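Until the default changes, one client-side workaround is to keep only associations whose subject is the queried disease itself. This is a sketch under two assumptions: that the response JSON contains an `associations` list whose entries carry `subject` and `object` objects with `id` fields, and that `subject.id` names the disease the phenotype is actually annotated to (the child disease for inherited annotations). `direct_phenotypes` is a hypothetical helper, and the trimmed response uses placeholder phenotype IDs rather than real data.

```python
from typing import Any, Dict, List


def direct_phenotypes(response: Dict[str, Any], disease_id: str) -> List[Dict[str, Any]]:
    """Keep only associations whose subject is the queried disease itself."""
    return [
        assoc
        for assoc in response.get("associations", [])
        if assoc.get("subject", {}).get("id") == disease_id
    ]


# Hypothetical, heavily trimmed response in the assumed shape; phenotype IDs are placeholders.
response = {
    "associations": [
        {"subject": {"id": "MONDO:0009563"}, "object": {"id": "PHENOTYPE_A"}},
        {"subject": {"id": "MONDO:0009529"}, "object": {"id": "PHENOTYPE_B"}},
    ]
}

print([a["object"]["id"] for a in direct_phenotypes(response, "MONDO:0009563")])
```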

9. mydisease.info and "qualifier" HP terms

Mydisease.info returns a phenotype term that seems more like a qualifier for other phenotypes: INTERMITTENT (HP:0031796) https://hpo.jax.org/app/browse/term/HP:0031796. Is this the desired behavior, or should this qualifier be handled differently?

Likely API call: https://mydisease.info/v1/query?fields=hpo.phenotype_related_to_disease (POST -d q=248600&scopes=hpo.omim)
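The likely call above can be written out as a concrete request, and qualifier-like terms can be separated from phenotypes on the client side. In this sketch, `build_hpo_query` builds the POST request with Python's standard `urllib` but does not send it; it follows the call shown above, with `fields` in the query string and `q`/`scopes` in the form body. `split_qualifiers` and `QUALIFIER_TERMS` are hypothetical: only HP:0031796 comes from this report, and a more robust version would consult the HPO hierarchy rather than a hand-maintained set.

```python
from typing import Iterable, List, Set, Tuple
from urllib.parse import urlencode
from urllib.request import Request

MYDISEASE_QUERY_URL = "https://mydisease.info/v1/query"

# Hypothetical set of HPO terms to treat as qualifiers rather than phenotypes;
# only HP:0031796 (INTERMITTENT) comes from the report above.
QUALIFIER_TERMS: Set[str] = {"HP:0031796"}


def build_hpo_query(omim_ids: Iterable[str]) -> Request:
    """Build (but do not send) the POST query: fields in the URL, q/scopes in the body."""
    url = MYDISEASE_QUERY_URL + "?" + urlencode({"fields": "hpo.phenotype_related_to_disease"})
    body = urlencode({"q": ",".join(omim_ids), "scopes": "hpo.omim"}).encode()
    return Request(url, data=body, method="POST")


def split_qualifiers(
    hpo_ids: Iterable[str], qualifiers: Set[str] = QUALIFIER_TERMS
) -> Tuple[List[str], List[str]]:
    """Separate phenotype terms from qualifier terms, preserving input order."""
    phenotypes: List[str] = []
    found: List[str] = []
    for hpo_id in hpo_ids:
        (found if hpo_id in qualifiers else phenotypes).append(hpo_id)
    return phenotypes, found


req = build_hpo_query(["248600"])
print(req.get_method(), req.full_url, req.data)
print(split_qualifiers(["HP:0031796", "PHENOTYPE_A"]))
```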

10. Comment on phenotypic features annotated to the input disease

Given the name of this disease (maple syrup urine disease), it’s odd that the urine-odor phenotype (sweet-smelling urine) is not returned by either knowledge source (BioLink API / Monarch or mydisease.info) in a Disease → PhenotypicFeature query. The relevant term may be https://hpo.jax.org/app/browse/term/HP:0012088 .