KarrLab / datanator_rest_api

An OAS3-compliant REST API for the Datanator integrated database
MIT License

URLs for some reactions can't be formed because InChI keys are missing for some participants #91

Closed jonrkarr closed 4 years ago

jonrkarr commented 4 years ago

Example:

Currently, this leads to a 404 error. We should either (a) create another way to encode these reactions into URLs or (b) exclude them from the search results.

lzy7071 commented 4 years ago

I think we can solve this by taking advantage of the bound parameter of the endpoint: omit null entries from the products or substrates array and set bound to loose. The endpoint thus becomes: http://api.datanator.info/reactions/kinlaw_by_rxn/?substrates=HDTRYLNUVZCQOY-MFAKQEFJSA-N&products=LABSPYBHMPDTEL-LIZSDCNHSA-N&_from=0&size=10&bound=loose&dof=0&species=homo%20sapiens&taxon_distance=false&projection=%7B%27kegg_meta.gene_ortholog%27%3A%200%2C%20%27kegg_meta._id%27%3A%200%2C%20%27_id%27%3A%200%7D. Please let me know if you think this is a reasonable workaround.
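A minimal sketch of how a client might assemble such a URL, for illustration only. The helper name `build_kinlaw_url` is hypothetical, the `projection` parameter is left out for brevity, and encoding several participants as repeated query keys is an assumption that the thread does not confirm.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://api.datanator.info/reactions/kinlaw_by_rxn/"


def build_kinlaw_url(substrates, products, species="homo sapiens"):
    """Build a kinlaw_by_rxn URL, skipping participants without InChIKeys."""
    # Drop participants whose InChIKey is missing (represented here as None).
    known_substrates = [key for key in substrates if key is not None]
    known_products = [key for key in products if key is not None]
    params = {
        "substrates": known_substrates,
        "products": known_products,
        "_from": 0,
        "size": 10,
        "bound": "loose",  # tolerate reactions with extra participants
        "dof": 0,
        "species": species,
        "taxon_distance": "false",
    }
    # doseq=True repeats a key once per list element (an assumption about
    # how the API accepts several participants); quote encodes spaces as %20.
    return BASE_URL + "?" + urlencode(params, doseq=True, quote_via=quote)


url = build_kinlaw_url(
    ["HDTRYLNUVZCQOY-MFAKQEFJSA-N", None],
    ["LABSPYBHMPDTEL-LIZSDCNHSA-N"],
)
```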

jonrkarr commented 4 years ago

I integrated this into the frontend. This seems to work well.

What does bound do? It sounds like this enables a fuzzy search / partial match. When all of the reactants and products have InChI keys, will this always select the correct reaction?

lzy7071 commented 4 years ago

bound determines whether the search also returns reactions that have additional reactants or products beyond those the user supplied. For instance, suppose we have a reaction A + B + C -> D + E, and the user's input is A and B for the substrates and D for the product. When bound is set to tight, this reaction won't be returned. When bound is set to loose, the reaction will be returned. When all the reactants and products have InChIKeys and bound is set to tight, the search will always return the correct reaction, and only the correct reaction.
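The tight and loose behaviors described above can be sketched as set comparisons. This is only an illustration of the semantics as explained in this comment, not the API's actual implementation; the function name `matches` is hypothetical.

```python
def matches(query_substrates, query_products,
            rxn_substrates, rxn_products, bound="tight"):
    """Return True if a stored reaction satisfies the query under `bound`."""
    qs, qp = set(query_substrates), set(query_products)
    rs, rp = set(rxn_substrates), set(rxn_products)
    if bound == "tight":
        # Participants must be exactly the ones in the query.
        return qs == rs and qp == rp
    if bound == "loose":
        # The reaction may have participants beyond those in the query.
        return qs <= rs and qp <= rp
    raise ValueError(f"unknown bound: {bound!r}")


# The reaction A + B + C -> D + E, queried with substrates A, B and product D.
tight_hit = matches(["A", "B"], ["D"], ["A", "B", "C"], ["D", "E"], "tight")
loose_hit = matches(["A", "B"], ["D"], ["A", "B", "C"], ["D", "E"], "loose")
```

With these inputs `tight_hit` is False and `loose_hit` is True, matching the example in the comment.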

jonrkarr commented 4 years ago

It sounds like this creates the opportunity to select a reaction different than the intended one.

E.g., if there were two reactions

  • A -> B
  • A + C -> B + D

When bound=loose, can A -> B select A + C -> B + D?

When bound=loose, the API should

  1. Try to find an exact match
  2. If an exact match is not found, find a reaction which has a superset of the queried reaction participants.
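The two steps above can be sketched as follows. This is a hypothetical in-memory version for illustration: the function name `find_reaction` and the dictionary layout of a reaction are assumptions, and the real API queries a database.

```python
def find_reaction(query_substrates, query_products, reactions):
    """Prefer an exact match; otherwise fall back to a superset match."""
    qs, qp = frozenset(query_substrates), frozenset(query_products)

    # Step 1: look for a reaction with exactly the queried participants.
    for rxn in reactions:
        if qs == frozenset(rxn["substrates"]) and qp == frozenset(rxn["products"]):
            return rxn

    # Step 2: accept a reaction whose participants include the queried ones.
    for rxn in reactions:
        if qs <= frozenset(rxn["substrates"]) and qp <= frozenset(rxn["products"]):
            return rxn

    return None


reactions = [
    {"id": "r2", "substrates": ["A", "C"], "products": ["B", "D"]},
    {"id": "r1", "substrates": ["A"], "products": ["B"]},
]
exact = find_reaction(["A"], ["B"], reactions)
```

Here `exact` is the reaction A -> B, even though A + C -> B + D comes first in the list and would also satisfy a superset match.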

Even this approach has a problem. Consider two reactions that both have participants without InChIKeys:

  • A + B-noinchi -> C + D-noinchi
  • A + E-noinchi -> C + F-noinchi

The URL for both reactions would be the same. As a result, there would be no way of always selecting the correct reaction.
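The collision can be shown with a small sketch. The InChIKey strings below are placeholders rather than real keys, and `url_tokens` is a hypothetical helper that mimics dropping participants without InChIKeys.

```python
def url_tokens(participants):
    """Keep only the InChIKeys, dropping participants that have none."""
    return sorted(key for _name, key in participants if key is not None)


# Each participant is a (name, inchikey) pair; None marks a missing key.
rxn_1 = {
    "substrates": [("A", "KEY-A"), ("B-noinchi", None)],
    "products": [("C", "KEY-C"), ("D-noinchi", None)],
}
rxn_2 = {
    "substrates": [("A", "KEY-A"), ("E-noinchi", None)],
    "products": [("C", "KEY-C"), ("F-noinchi", None)],
}

same_substrates = url_tokens(rxn_1["substrates"]) == url_tokens(rxn_2["substrates"])
same_products = url_tokens(rxn_1["products"]) == url_tokens(rxn_2["products"])
```

Both comparisons are True, so the two distinct reactions produce identical query parameters.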

Resolution

I think we need a different scheme. One option is to hash (e.g., md5) the names of metabolites that don't have InChIKeys. You can either add a prefix to indicate that the string is not an InChI key, or detect that a string is an md5 hash because it consists of lowercase letters and digits (unlike InChI keys, which consist of uppercase letters and dashes).
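A minimal sketch of this idea, assuming md5 over the UTF-8 encoded name and a hypothetical `md5-` prefix. The function names are illustrative, and the regular expression encodes the standard InChIKey layout of 14, 10, and 1 uppercase letters separated by dashes.

```python
import hashlib
import re

# Standard InChIKey layout: 14 letters, dash, 10 letters, dash, 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")
HASH_PREFIX = "md5-"  # hypothetical marker for hashed names


def participant_token(name, inchikey=None):
    """Return the InChIKey if known, otherwise a prefixed hash of the name."""
    if inchikey:
        return inchikey
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    return HASH_PREFIX + digest


def is_inchikey(token):
    """Distinguish real InChIKeys from hashed names."""
    return bool(INCHIKEY_RE.match(token))


token_b = participant_token("B-noinchi")
token_e = participant_token("E-noinchi")
token_known = participant_token("A", "HDTRYLNUVZCQOY-MFAKQEFJSA-N")
```

Because `token_b` and `token_e` differ, the two reactions from the earlier example would no longer share a URL.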

lzy7071 commented 4 years ago

At the moment bound = loose does exactly what was suggested here. You are absolutely right that imprecision exists when reactants with no inchikeys are involved. I was just hoping such cases were rare so that no conflict like the last case described here would arise.

Hashing is a good idea. I'll do that.

lzy7071 commented 4 years ago

The issue has been fixed. Protein-Npi-phosphohistidine---trehalose phosphotransferase does not lead to 404 anymore.

jonrkarr commented 4 years ago

There's just one issue left. The frontend also needs to generate reaction URLs for the reactions related to genes. This requires one of the following:

Either is fine. I tried generating the hashes. What algorithm are you using?

lzy7071 commented 4 years ago

Hash function is ripemd160. I'll add the additional info to the related_reactions endpoint.

jonrkarr commented 4 years ago

Thanks. I think it's safer to pass the information to the frontend, and this will make it easier for you to change the scheme if needed.

lzy7071 commented 4 years ago

I have included the reaction_participant object in the returned data.

jonrkarr commented 4 years ago

I integrated this into the frontend.