Closed jonrkarr closed 4 years ago
Example:
https://datanator.info/search/trehalose/Saccharibacillus%20sacchari/
- Protein-Npi-phosphohistidine---trehalose phosphotransferase
Currently, this leads to a 404 error. We should either (a) create another way to encode these reactions into URLs or (b) exclude them from the search results.
I think we can solve this by taking advantage of the bound parameter of the endpoint: ignore null entries in the products or substrates array and set bound to loose. The endpoint thus becomes:
http://api.datanator.info/reactions/kinlaw_by_rxn/?substrates=HDTRYLNUVZCQOY-MFAKQEFJSA-N&products=LABSPYBHMPDTEL-LIZSDCNHSA-N&_from=0&size=10&bound=loose&dof=0&species=homo%20sapiens&taxon_distance=false&projection=%7B%27kegg_meta.gene_ortholog%27%3A%200%2C%20%27kegg_meta._id%27%3A%200%2C%20%27_id%27%3A%200%7D
Please let me know if you think this is a reasonable workaround.
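As a rough illustration of the workaround, the sketch below builds a query URL of the same shape while dropping null (None) participants. The function name build_kinlaw_url is hypothetical, the projection parameter is omitted for brevity, and encoding several substrates or products as repeated query parameters is an assumption; the thread only shows one of each.

```python
from urllib.parse import urlencode, quote

API_BASE = "http://api.datanator.info/reactions/kinlaw_by_rxn/"


def build_kinlaw_url(substrates, products, species="homo sapiens", size=10):
    """Build a kinlaw_by_rxn query URL (hypothetical helper, not the frontend's code).

    Participants without an InChI key are passed in as None and skipped,
    and bound is set to loose so the remaining keys can still match.
    """
    query = {
        # Assumption: multiple keys are sent as repeated parameters.
        "substrates": [key for key in substrates if key is not None],
        "products": [key for key in products if key is not None],
        "_from": 0,
        "size": size,
        "bound": "loose",
        "dof": 0,
        "species": species,
        "taxon_distance": "false",
    }
    # quote (rather than the default quote_plus) encodes spaces as %20.
    return API_BASE + "?" + urlencode(query, doseq=True, quote_via=quote)


url = build_kinlaw_url(
    ["HDTRYLNUVZCQOY-MFAKQEFJSA-N", None],
    ["LABSPYBHMPDTEL-LIZSDCNHSA-N"],
)
print(url)
```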
I integrated this into the frontend. This seems to work well.
What does bound do? It sounds like this enables a fuzzy search / partial match. When all of the reactants and products have InChI keys, will this always select the correct reaction?
bound decides whether the search also returns reactions that have additional participants beyond the user input. For instance, suppose we have a reaction A + B + C -> D + E, and the user input is A and B for the substrates and D for the product.
When bound is set to tight, this reaction won't be returned. When bound is set to loose, the reaction will be returned.
When all the reactants and products have InChI keys and bound is set to tight, it will always return the correct reaction, and only the correct reaction.
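The tight and loose behaviour described above can be sketched as a set comparison. This is only an illustration of the semantics as stated in this thread, not the API's actual implementation; the matches function and the dictionary layout are assumptions.

```python
def matches(reaction, query, bound="tight"):
    """Return True if a stored reaction satisfies the query under the given bound.

    tight: the query must list exactly the reaction's substrates and products.
    loose: the reaction may contain participants beyond those in the query.
    """
    r_substrates, r_products = set(reaction["substrates"]), set(reaction["products"])
    q_substrates, q_products = set(query["substrates"]), set(query["products"])
    if bound == "tight":
        return q_substrates == r_substrates and q_products == r_products
    if bound == "loose":
        # Subset test: every queried participant must appear in the reaction.
        return q_substrates <= r_substrates and q_products <= r_products
    raise ValueError(f"unknown bound: {bound!r}")


# The example from the comment: A + B + C -> D + E, queried with A, B -> D.
reaction = {"substrates": ["A", "B", "C"], "products": ["D", "E"]}
query = {"substrates": ["A", "B"], "products": ["D"]}
print(matches(reaction, query, bound="tight"))
print(matches(reaction, query, bound="loose"))
```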
It sounds like this creates the opportunity to select a reaction different from the intended one.
E.g., if there were two reactions
- A -> B
- A + C -> B + D
when bound=loose, can A -> B select A + C -> B + D?
When bound=loose, the API should
- Try to find an exact match
- If an exact match is not found, find a reaction which has a superset of the queried reaction participants.
Even this has a problem. Consider two reactions that both have participants without InChI keys:
- A + B-noinchi -> C + D-noinchi
- A + E-noinchi -> C + F-noinchi
The URL for both reactions would be the same. As a result, there would be no way of always selecting the correct reaction.
Resolution
I think we need a different scheme. One option is to hash (e.g., md5) the names of metabolites that don't have InChI keys. You can add a prefix to indicate that a string is not an InChI key, or detect that a string is an md5 hash on the basis of its using lowercase letters and numbers (unlike InChI keys, which use uppercase letters and dashes).
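A minimal sketch of the hashing idea, assuming md5 as named in the suggestion (the thread later reports that ripemd160 was used instead). The helper names are hypothetical, and hashing the raw UTF-8 name without any normalization is an assumption.

```python
import hashlib
import re

# Standard InChI keys: 14 uppercase letters, 10 uppercase letters, 1 uppercase letter.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")
# An md5 hex digest is 32 lowercase hexadecimal characters.
MD5_RE = re.compile(r"[0-9a-f]{32}")


def participant_id(name, inchikey=None):
    """Return the InChI key if there is one, otherwise an md5 hash of the name."""
    if inchikey:
        return inchikey
    return hashlib.md5(name.encode("utf-8")).hexdigest()


def is_name_hash(identifier):
    """Tell a name hash apart from an InChI key by its character set."""
    return MD5_RE.fullmatch(identifier) is not None


# The two participants that previously collapsed to the same URL now differ.
id_b = participant_id("B-noinchi")
id_e = participant_id("E-noinchi")
print(id_b != id_e, is_name_hash(id_b))
```

With identifiers like these, the two reactions from the example no longer produce the same URL, because each participant without an InChI key contributes a distinct identifier.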
At the moment, bound=loose does exactly what was suggested here. You are absolutely right that imprecision exists when reactants with no InChI keys are involved. I was just hoping such cases were rare so that no conflict like the last case described here would arise.
Hashing is a good idea. I'll do that.
The issue has been fixed. Protein-Npi-phosphohistidine---trehalose phosphotransferase does not lead to 404 anymore.
There's just one issue left. The frontend also needs to generate reaction URLs for the reactions related to genes. This requires one of the following:
Either is fine. I tried generating the hashes. What algorithm are you using?
Hash function is ripemd160. I'll add the additional info to the related_reactions endpoint.
Thanks. I think it's safer to pass the information to the frontend, and this will make it easier for you to change this if needed.
- The endpoint used to get the reactions related to an ortholog group (https://api.datanator.info/proteins/related/related_reactions) needs to return the hashes of metabolites that don't have InChI keys.
I have included the reaction_participant object in the data returned.
I integrated this into the frontend.