Interoperability improvements for `yeast-GEM.mat`

exaexa commented 2 years ago

Description of the issue:

It would be extremely useful to include a vector of compartment IDs for each metabolite in the .mat model. As far as I can see, in the currently published yeast GEM the compartments can only be accessed by manually parsing the compartment suffix out of the metabolite ID, which is doable but rather inconvenient for users, and usually fragile. Moreover, an explicit vector avoids ambiguity about how the compartment is encoded in the ID. (Just for the record, so far I've seen all of `metaboliteID[compID]`, `metaboliteID_compID` and `compID_metaboliteName`.)
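
To illustrate why parsing the compartment out of the ID is fragile, here is a minimal Python sketch. The function name, the regular expressions and the example IDs are made up for illustration; they are not taken from yeast-GEM or from any existing loader.

```python
import re

# Hypothetical patterns for two of the ID conventions listed above.
PATTERNS = [
    re.compile(r"^(?P<met>.+)\[(?P<comp>[^\[\]]+)\]$"),  # metaboliteID[compID]
    re.compile(r"^(?P<met>.+)_(?P<comp>[^_]+)$"),        # metaboliteID_compID
]

def guess_compartment(met_id):
    """Guess (metabolite, compartment) from an ID string, or return None."""
    for pattern in PATTERNS:
        match = pattern.match(met_id)
        if match:
            return match.group("met"), match.group("comp")
    return None

# Made-up example IDs, one per convention.
print(guess_compartment("met1[c]"))  # bracket suffix
print(guess_compartment("met1_c"))   # underscore suffix
print(guess_compartment("c_met1"))   # prefix convention: silently misparsed
```

The last call shows the problem: an ID in the `compID_metaboliteName` style still matches the underscore pattern, so the guess comes back with metabolite and compartment swapped and nothing signals the error. An explicit per-metabolite compartment vector removes the need to guess.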

Generally, FAIR advises against this kind of metadata storage (in FP we similarly call this "stringly-typed" data) and advocates explicitly materializing and naming the metadata source.

It would be great to include an extra entry named metCompartment or metCompartments (as used in some other models) that explicitly specifies the compartment IDs for each metabolite.

Similarly, it would be nice to avoid index-based gene-product-reaction rules. I assume these are used because they can be evaluated with MATLAB's `eval()`, which unfortunately doesn't port well (or at all) to any other ecosystem. For example, the canonical e_coli_core.mat uses `grRules` with semantic content, containing relatively approachable strings like `geneID1 and geneID2 or geneID3`.
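
As a rough sketch of what the symbolic form buys, the following Python snippet rewrites an index-based rule into a gene-name-based one. It assumes the index-based rules look like `x(1) & x(2) | x(3)` with 1-based indices into a gene list; that syntax, the function name and the gene names are assumptions for illustration, not something confirmed for the yeast-GEM file.

```python
import re

def rule_to_gr_rule(rule, genes):
    """Rewrite an index-based rule into a symbolic, gene-name-based rule.

    Assumes (hypothetically) that genes are referenced as x(N), with N a
    1-based index into `genes`, and that & / | stand for and / or.
    """
    # Translate the operators first so gene names are never touched by it.
    symbolic = rule.replace("&", "and").replace("|", "or")

    def gene_name(match):
        return genes[int(match.group(1)) - 1]  # 1-based to 0-based index

    return re.sub(r"x\((\d+)\)", gene_name, symbolic)

genes = ["geneID1", "geneID2", "geneID3"]
print(rule_to_gr_rule("(x(1) & x(2)) | x(3)", genes))
```

The index-based form is only meaningful together with the exact gene ordering of the file, whereas the symbolic string can be read and parsed on its own in any ecosystem.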

I think the changes in #301 might be related, but the .mat file in the PR doesn't seem to have these features either.

Expected feature/value/output:

yeast-GEM.mat has a machine-readable mapping of metabolite compartments and symbolic (index-less) references to genes in rules.

Reproducing these results:

Not really applicable here -- we're opening the models with the Julia MAT package, and no per-metabolite compartment information is visible there.

I hereby confirm that I have:

[ ] (not applicable:) Tested my code with all requirements for running the model
[x] Done this analysis in the main branch of the repository
[x] Checked that a similar issue does not exist already
[x] If needed, asked first in the Gitter chat room about the issue

exaexa commented 2 years ago

(cc @stelmo )

edkerk commented 2 years ago

I could not agree more! And you're right, this will be tackled via PR #301, as RAVEN's model format has metComps and grRules fields. However, be aware that the .mat file is a binary that is only included in the main branch, not in the develop branch. This explains why #301 does not yet contain the binary, but once merged to main it will appear in the next release. I'll make sure that this release happens ASAP (hopefully this week).

exaexa commented 2 years ago

Aaah okay, thanks for the explanation of main vs. develop.

We preliminarily added the metComps identifier to COBREXA's list of compartment-containing identifiers, and we already support loading the rules from grRules. So I guess this will basically solve itself once #301 is merged, right?
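
For readers wondering what such a list of identifiers amounts to, here is a minimal Python sketch of the lookup idea. It is not COBREXA's actual code; the function name and the dict-based model representation are assumptions, and only the three field names come from this thread.

```python
# Candidate field names mentioned in this thread; the order is arbitrary.
COMPARTMENT_FIELDS = ("metCompartment", "metCompartments", "metComps")

def find_compartment_field(model):
    """Return (field_name, values) for the first candidate present, else None.

    `model` is assumed to be a dict mapping field names to their contents,
    roughly as a .mat loader might return it.
    """
    for name in COMPARTMENT_FIELDS:
        if name in model:
            return name, model[name]
    return None

# Made-up toy model for illustration.
toy_model = {"mets": ["met1", "met2"], "metComps": [1, 2]}
print(find_compartment_field(toy_model))
```

Whether the stored values are compartment names or indices into a separate compartment list depends on the model format, so a real loader would need to handle that as well.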

exaexa commented 2 years ago

Great, thanks!

SysBioChalmers / yeast-GEM