ModelSEED / ModelSEEDDatabase

This repository contains the definitive copy of the biochemistry and metadata used to construct models using the ModelSEED/ProbAnno approach.

inconsistent orders in equation/definition #5

Closed nconrad closed 9 years ago

nconrad commented 9 years ago

I'm trying to make compounds in equations linkable as in the following screenshot, but there's no easy way to do this since the order of ids is different from the order of names.

Here's an example from https://www.patricbrc.org/api/model_reaction/?http_accept=application/solr+json&eq(id,rxn00017):

(2) cpd00003[c] + (2) cpd00165[c] <=> (2) cpd00004[c] + (2) cpd00067[c] + (1) cpd01252[c]

vs

(2) Hydroxylamine[c] + (2) NAD[c] <=> (2) H+[c] + (1) Hyponitrite[c] + (2) NADH[c]

Any way we can make these match? @mmundy42, this may be related to the get_model method.

[Image: screen shot 2015-07-15 at 8 46 05 am]

samseaver commented 9 years ago

The code that creates the equations, whether you print compounds or print names, gathers the strings first, then sorts them, so in its current state, you can never get them to match.

I can have a look at that, because it shouldn't be hard to sort by compound identifiers in every case before collecting the strings that go into the output. It does mean that, for most equations where you're printing compound names, the output will look different than it did before (e.g. in KBase or in published supplementary material).
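To illustrate the mismatch, here is a minimal sketch in Python. It is not the actual ModelSEED code: the function names and the tuple layout are hypothetical, and the id-to-name pairing for the right-hand side of rxn00017 is inferred from the coefficients in the example above. Sorting on whichever label is printed gives a different order for ids and names, while sorting on the compound id before rendering keeps both in step.

```python
# Hypothetical terms for the right-hand side of rxn00017:
# (coefficient, compound id, compound name).
terms = [
    (2, "cpd00004", "NADH"),
    (2, "cpd00067", "H+"),
    (1, "cpd01252", "Hyponitrite"),
]

FIELDS = {"id": 1, "name": 2}


def render_by_label(terms, field):
    """Sort on whichever label is printed, so ids and names can disagree."""
    col = FIELDS[field]
    ordered = sorted(terms, key=lambda t: t[col])
    return " + ".join(f"({t[0]}) {t[col]}[c]" for t in ordered)


def render_by_id(terms, field):
    """Always sort on the compound id, then print the requested label."""
    col = FIELDS[field]
    ordered = sorted(terms, key=lambda t: t[1])
    return " + ".join(f"({t[0]}) {t[col]}[c]" for t in ordered)


print(render_by_label(terms, "id"))
print(render_by_label(terms, "name"))  # order differs from the id version
print(render_by_id(terms, "name"))     # order matches the id version
```

With `render_by_id`, the n-th compound in the name version is always the n-th compound in the id version, which is what a client needs in order to link them.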

nconrad commented 9 years ago

Actually, an alternative option (and a better one if we care about the order changing) would be to list the ids in compound_ids in the same order as the definition.

@mmundy42, this may work better for get_model as well, which I think you suggested once. I don't know if it's useful to have the compound id version of the equation at, say, the command line. If not, let's go with ordered compound ids?

samseaver commented 9 years ago

This PR should help once deployed: https://github.com/ModelSEED/ProbModelSEED/pull/45

You can then expect compound order to be consistently the same regardless of output format.

mmundy42 commented 9 years ago

I merged the PR. @nconrad, I will check the behavior of get_model() and we can adjust as needed.

mmundy42 commented 9 years ago

@nconrad, get_model() returns an array of model_reaction structures which are defined as:

typedef structure {
    reaction_id id;
    string name;
    string definition;
    string gpr;
    list<gene_id> genes;
} model_reaction;

Currently there are no compound IDs returned for a reaction. I think I had suggested just returning the compound IDs, but I think we determined that would cause too much client-side processing.

Do you also want the compound IDs in the model_reaction structure? If so, what format? Maybe a list that is the same order as the compounds in the definition? Not sure what format makes it easiest for the client-side code.

nconrad commented 9 years ago

@mmundy42, it wasn't so much that it was too much processing; I was originally just favoring 'readability' there over 'more data and less computation'.

Chris suggested to me: how about just returning the stoichiometry format?

"-3:cpd00067:c:0:"H+";2:cpd00013:c:0:"NH3";-1:cpd00742:c:0:"Allophanate";-1:cpd00001:c:0:"H2O";2:cpd00011:c:0:"CO2""

Preferably as a list of lists instead of separated by : and ;, and without the extra quotes.
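For reference, converting the delimited string into that list-of-lists shape could look like the following Python sketch. The function name `parse_stoichiometry` is hypothetical, and the sketch assumes that compound names never contain `;` (a `:` inside a name is tolerated because only the first four colons are split on).

```python
def parse_stoichiometry(raw):
    """Turn 'coeff:id:compartment:index:"name";...' into a list of lists."""
    rows = []
    for term in raw.split(";"):
        # Split on the first four colons only; the remainder is the quoted name.
        coeff, cpd_id, compartment, index, name = term.split(":", 4)
        rows.append([float(coeff), cpd_id, compartment, int(index), name.strip('"')])
    return rows


raw = (
    '-3:cpd00067:c:0:"H+";2:cpd00013:c:0:"NH3";-1:cpd00742:c:0:"Allophanate";'
    '-1:cpd00001:c:0:"H2O";2:cpd00011:c:0:"CO2"'
)
rows = parse_stoichiometry(raw)
for row in rows:
    print(row)
```

Each row holds the coefficient, compound id, compartment, compartment index, and name, with the extra quotes removed.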

mmundy42 commented 9 years ago

@nconrad, does this updated model_reaction match what you are looking for?

typedef structure {
    reaction_id id;
    string name;
    list<tuple<float coefficient, compound_id id, compartment_id compartment, int compartment_index, string name>> stoichiometry;
    string gpr;
    list<gene_id> genes;
} model_reaction;

nconrad commented 9 years ago

Cool. Do we still need the reaction directionality?

Has get_model been deployed? I'll have to test it out.

mmundy42 commented 9 years ago

Good point on directionality. Here's the latest spec:

typedef structure {
    reaction_id id;
    string name;
    list<tuple<float coefficient, compound_id id, compartment_id compartment, int compartment_index, string name>> stoichiometry;
    string direction;
    string gpr;
    list<gene_id> genes;
} model_reaction;

And here is an example of the output:

    {
        "direction": ">",
        "name": "4-Carboxymuconolactone carboxy-lyase_c0",
        "genes": [
            "PATRICSOLR:224308.49||/features/id/fig|224308.49.peg.2706"
        ],
        "stoichiometry": [
            [
                -1,
                "cpd00067",
                "c",
                0,
                "H+_c0"
            ],
            [
                -1,
                "cpd00938",
                "c",
                0,
                "4-Carboxymuconolactone_c0"
            ],
            [
                1,
                "cpd00011",
                "c",
                0,
                "CO2_c0"
            ],
            [
                1,
                "cpd02255",
                "c",
                0,
                "3-oxoadipate-enol-lactone_c0"
            ]
        ],
        "id": "rxn02483_c0",
        "gpr": "fig|224308.49.peg.2706"
    },
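On the client side, a structure like this lets the id and name versions of an equation be built from the same list, so their order always matches. The Python sketch below is only an illustration: the `render` function and the `ARROWS` table are hypothetical, and the mapping of `"<"` and `"="` to arrows is an assumption, since only `">"` appears in the example output.

```python
# Subset of the example output above (genes and gpr omitted).
reaction = {
    "id": "rxn02483_c0",
    "direction": ">",
    "stoichiometry": [
        [-1, "cpd00067", "c", 0, "H+_c0"],
        [-1, "cpd00938", "c", 0, "4-Carboxymuconolactone_c0"],
        [1, "cpd00011", "c", 0, "CO2_c0"],
        [1, "cpd02255", "c", 0, "3-oxoadipate-enol-lactone_c0"],
    ],
}

# Assumed arrow for each direction code; only ">" is confirmed by the example.
ARROWS = {">": "=>", "<": "<=", "=": "<=>"}
COLUMNS = {"id": 1, "name": 4}


def render(reaction, field):
    """Build an equation string, keeping the order of the stoichiometry list."""
    col = COLUMNS[field]

    def side(sign):
        # Negative coefficients are reactants, positive ones are products.
        return " + ".join(
            f"({abs(t[0]):g}) {t[col]}"
            for t in reaction["stoichiometry"]
            if t[0] * sign > 0
        )

    return f"{side(-1)} {ARROWS[reaction['direction']]} {side(1)}"


print(render(reaction, "id"))
print(render(reaction, "name"))
```

Because both strings are produced by walking the same list, the n-th compound name always corresponds to the n-th compound id.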

I'm not sure how or when the latest code is getting deployed.

mmundy42 commented 9 years ago

Does anybody know if the latest code has been deployed so this can be verified and closed?

cshenry commented 9 years ago

Yup. Should be deployed.
