API endpoint for protein query

Rothamsted / knetminer

KnetMiner - webapp to search and visualize genome-scale knowledge graphs

https://knetminer.com

MIT License


API endpoint for protein query #438

Open KeywanHP opened 4 years ago

KeywanHP commented 4 years ago

From Clay Birkett:

I have a request for the next release of KnetMiner. Is there an API to query by protein? I have been using the BioMart tool to download protein annotations because many of the wheat genes have not been annotated or located. For example, it would be helpful to search by UniProt TrEMBL or SwissProt ID. One limitation is that protein identifiers change frequently, and not at the same time as the gene identifiers.

marco-brandizi commented 4 years ago

This is too specific to be a good fit for KnetMiner (KNM). We would need an API to perform generic queries like: conceptClass=X, attribute A=V, or: node1 -> rel -> node2 -> ..., RETURN nodeX, relY
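To make the two generic query shapes concrete, here is a minimal in-memory sketch. The `Node`/`Graph` classes, the `enc` relation name and the toy data are hypothetical illustrations, not KnetMiner's data model or API; a real service would translate these shapes into Cypher or SPARQL against the knowledge graph.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    concept_class: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # node id -> Node
    edges: list = field(default_factory=list)  # (source id, relation, target id)

    def add_node(self, node):
        self.nodes[node.id] = node

    def add_edge(self, src, rel, dst):
        self.edges.append((src, rel, dst))

    def find(self, concept_class, **attrs):
        """Shape 1: conceptClass=X, attribute A=V (all given attributes must match)."""
        return [
            n for n in self.nodes.values()
            if n.concept_class == concept_class
            and all(n.attributes.get(k) == v for k, v in attrs.items())
        ]

    def follow(self, start_ids, *relations):
        """Shape 2: node1 -> rel -> node2 -> ...; returns the ids of the end nodes only."""
        current = set(start_ids)
        for rel in relations:
            current = {dst for (src, r, dst) in self.edges if src in current and r == rel}
        return sorted(current)


# Hypothetical toy data: one gene node linked to one protein node.
g = Graph()
g.add_node(Node("G1", "Gene", {"name": "WRKY43"}))
g.add_node(Node("P1", "Protein", {"accession": "PROT_1"}))
g.add_edge("G1", "enc", "P1")

genes = g.find("Gene", name="WRKY43")
proteins = g.follow([n.id for n in genes], "enc")
```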

Of course, we already have that in the form of Cypher/SPARQL queries, and we're finalising them. Possibly, we might additionally want more tailored APIs, like:

Find a node or relation by type/attribute (as outlined above). Possibly, support this via GraphQL (I'm changing my mind on GraphQL after having read this)
Supporting Cypher via REST wrapper (if we can't manage to publish the BOLT endpoint in read-only mode)
Supporting Gremlin via REST wrapper (OpenCypher could run on top of it, Neo4j could be the underlying database)
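A REST wrapper for Cypher could be as thin as forwarding a parameterised query to Neo4j's transactional HTTP endpoint (the `/db/<name>/tx/commit` path used by Neo4j 4.x and later). The sketch below only builds the request. The server address, the database name and the `:Gene {prefName: ...}` pattern are assumptions about a hypothetical deployment and schema, and the keyword check is a naive stand-in for proper read-only database permissions.

```python
import json
import re
import urllib.request

# Hypothetical server; Neo4j's HTTP API listens on port 7474 by default.
NEO4J_HTTP = "http://localhost:7474"
DATABASE = "neo4j"

# Naive guard against write clauses; a real read-only deployment should rely on
# database permissions (e.g. a read-only user), not on string matching.
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP|LOAD\s+CSV)\b", re.IGNORECASE
)


def cypher_request(statement, parameters=None):
    """Build (but do not send) a POST to Neo4j's transactional HTTP endpoint."""
    if WRITE_CLAUSES.search(statement):
        raise ValueError("only read queries are allowed")
    body = {"statements": [{"statement": statement, "parameters": parameters or {}}]}
    return urllib.request.Request(
        f"{NEO4J_HTTP}/db/{DATABASE}/tx/commit",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


# Hypothetical schema: gene nodes labelled :Gene with a prefName property.
req = cypher_request(
    "MATCH (g:Gene {prefName: $name}) RETURN g LIMIT 10", {"name": "WRKY43"}
)
```

To execute it, `req` would be passed to `urllib.request.urlopen` against a live server, with an `Authorization` header added if the server requires credentials.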

This should be interesting for @AjitPS and @josephhearnshaw too.

ClayBirkett commented 4 years ago

I see how searching for a protein ID is too specific. Is there a way to search by gene name? I am looking for something more specific than a keyword search, but not a gene ID, because that is too specific. I am using the gene names from UniProt, even though in many cases these are not mapped to a gene ID.

KeywanHP commented 4 years ago

The wheat KG genes contain names from Ensembl and inferred names from Arabidopsis orthologs. You can search by gene name instead of accessions, for example:

https://knetminer.rothamsted.ac.uk/wheatknet/genepage?list=WRKY43&keyword=dormancy%20OR%20germination

https://knetminer.rothamsted.ac.uk/wheatknet/genepage?list=WRKY43

Is that helpful?
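The gene-page links above follow a simple pattern, so they can be generated programmatically. This sketch assumes `list` and `keyword` are the only parameters needed (as in the two examples) and that the service still accepts them, which may have changed since this thread; how several genes would be combined in `list` is not covered here.

```python
from urllib.parse import quote, urlencode

# Base URL and parameter names taken from the two example links.
GENEPAGE = "https://knetminer.rothamsted.ac.uk/wheatknet/genepage"


def genepage_url(gene_list, keyword=None):
    """Build a gene-page URL for a gene name (or list value) and optional keyword query."""
    params = {"list": gene_list}
    if keyword:
        params["keyword"] = keyword
    # quote_via=quote encodes spaces as %20, matching the examples;
    # the default (quote_plus) would encode them as "+".
    return f"{GENEPAGE}?{urlencode(params, quote_via=quote)}"


url = genepage_url("WRKY43", "dormancy OR germination")
```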

marco-brandizi commented 4 years ago

Hi @ClayBirkett,

adding simple node/relation search can be done in the short/medium term. For more complex queries, we already have SPARQL and Cypher; I'd suggest having a look to see whether they might help you.

SPARQL is provisionally here (we still have to finalise a better DNS name and add documentation, but this works).
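For programmatic access, the standard SPARQL 1.1 Protocol lets a client send a query as the `query` parameter of an HTTP GET. The endpoint URL below is a placeholder, since the provisional address is not preserved here, and the `bk:` namespace IRI and the `bk:Gene`/`bk:prefName` terms are assumptions about the application schema rather than confirmed names.

```python
import urllib.request
from urllib.parse import urlencode

# Placeholder endpoint: the provisional SPARQL address is not given in this thread.
SPARQL_ENDPOINT = "https://example.org/sparql"

# Assumed vocabulary; check the application schema for the real terms.
QUERY = """
PREFIX bk: <http://knetminer.org/data/rdf/terms/biokno/>
SELECT ?gene WHERE {
  ?gene a bk:Gene ;
        bk:prefName "WRKY43" .
}
LIMIT 10
"""


def sparql_get(endpoint, query):
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Ask for results in the standard SPARQL JSON results format.
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


req = sparql_get(SPARQL_ENDPOINT, QUERY)
```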

Cypher/Neo4j have yet to be put online (problems with the firewall), but we can give you dumps.

Both are based on our application schema.

ClayBirkett commented 4 years ago

That works sometimes. As you mentioned, the problem is that you are using Arabidopsis orthologs and we are using UniProt wheat, so the gene names are not always going to match. At some point, the wheat genome annotation will be good enough that we can stop relying on Arabidopsis.