Open lucas-ubm opened 3 years ago
I am confused. I would expect that for the Homo sapiens gene product database there is only one source for the mapping, and that it is indeed Ensembl (although provenance should contain data and version). What other data sources does the webservice show then?
It returns a list with information regarding all data sources for the given organism (since the input for the call is only the organism). In the case of Homo sapiens:
DATASOURCENAME Ensembl BUILDDATE 20180509 SERIES Homo sapiens genes and proteins DATATYPE GeneProduct DATASOURCEVERSION 91 SCHEMAVERSION 3
DATASOURCENAME HMDB-CHEBI-WIKIDATA BUILDDATE 20201104 DATATYPE Metabolite SERIES standard_metabolite DATASOURCEVERSION HMDB4.0.20190116-CHEBI193-WIKIDATA20201104 SCHEMAVERSION 3
DATASOURCENAME EBI-RHEA BUILDDATE 20190522 SERIES standard-interaction DATATYPE Interaction DATASOURCEVERSION 1.0.0 SCHEMAVERSION 3
DATASOURCENAME Wikidata BUILDDATE 20200527 SERIES humancorona DATATYPE GeneProduct DATASOURCEVERSION 1.0.0 SCHEMAVERSION 3
DATASOURCENAME Wikidata BUILDDATE 20200510 SERIES complexes DATATYPE Complex DATASOURCEVERSION 1.0.0 SCHEMAVERSION 3
DATASOURCENAME Wikidata BUILDDATE 20200510 SERIES publications DATATYPE Article DATASOURCEVERSION 1.0.0 SCHEMAVERSION 3
That is actually a bit weird, right? It seems to return information from other loaded databases, at least for reactions and metabolites. I think:
1) You should be able to query which databases are loaded (can you?).
2) For each of these, you should be able to ask for the provenance.
3) When a new database is loaded, the service should somehow get the relevant provenance from that database and make it available for 2) (I could imagine that future databases will have different types of provenance).
This is what the webservice returns if you ask for the properties of Homo sapiens (i.e. by calling https://webservice.bridgedb.org/Human/properties), which is the closest thing we currently have to provenance.
Yes, that is what I meant. It seems to also return information about the metabolite database and the reaction database when you ask about the human geneproduct database. That must be confusing.
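For anyone who wants to untangle the combined output on the client side in the meantime, here is a minimal Python sketch. It assumes the properties response is plain text with one tab-separated key/value pair per line, and that each data source's block starts with a DATASOURCENAME line (as in the listing above). The helper name parse_properties and the inlined sample text are only for illustration and are not part of BridgeDb.

```python
def parse_properties(text):
    """Group key/value lines into one dict per data source.

    Assumption: every record begins with a DATASOURCENAME line,
    and key and value are separated by a tab.
    """
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, _, value = line.partition("\t")
        key, value = key.strip(), value.strip()
        # Start a new record whenever a new data source name appears.
        if key == "DATASOURCENAME" or not records:
            records.append({})
        records[-1][key] = value
    return records


# Sample built from three of the records quoted in this thread.
SAMPLE = "\n".join("\t".join(pair) for pair in [
    ("DATASOURCENAME", "Ensembl"),
    ("BUILDDATE", "20180509"),
    ("SERIES", "Homo sapiens genes and proteins"),
    ("DATATYPE", "GeneProduct"),
    ("DATASOURCEVERSION", "91"),
    ("SCHEMAVERSION", "3"),
    ("DATASOURCENAME", "HMDB-CHEBI-WIKIDATA"),
    ("BUILDDATE", "20201104"),
    ("DATATYPE", "Metabolite"),
    ("SERIES", "standard_metabolite"),
    ("DATASOURCENAME", "Wikidata"),
    ("BUILDDATE", "20200527"),
    ("SERIES", "humancorona"),
    ("DATATYPE", "GeneProduct"),
])

records = parse_properties(SAMPLE)

# Keep only the gene product sources, dropping metabolites, interactions, etc.
gene_products = [r for r in records if r.get("DATATYPE") == "GeneProduct"]
for r in gene_products:
    print(r["DATASOURCENAME"], r.get("SERIES"), r.get("BUILDDATE"))
```

Note that filtering on DATATYPE alone still leaves two gene product sources for human (Ensembl and the Wikidata humancorona series), so a client that wants only the Ensembl provenance would also need to check DATASOURCENAME or SERIES.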
When using the getProperties() method to obtain provenance information about a given mapper object, provenance information is provided for only one data source. For example, for a 'Homo sapiens' mapper I obtain provenance information on the 'Ensembl' data source. However, when using BridgeDb's webservices we obtain a list containing the provenance information of all data sources for a given organism. I wonder whether this is the intended behavior or whether the R package should display the information for all data sources.
Code to replicate: