Closed exaexa closed 2 years ago
(cc @stelmo)
I could not agree more! And you're right, this will be tackled via PR #301, as RAVEN's model format has `metComps` and `grRules` fields. However, be aware that the `.mat` is a binary file that is only included in the `main` branch, and not in the `develop` branch. This explains why #301 does not yet contain the binary, but once merged to `main` it will appear in the next release. I'll make sure that this release happens asap (hopefully this week).
Aaah okay, thanks for the explanation of `main` vs `develop`.
We preliminarily added the `metComps` identifier to the list of compartment-containing identifiers in COBREXA, and we already support loading of the rules from `grRules`. So I guess this will basically solve itself once #301 is merged, right?
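For anyone following along, here is a rough sketch of why symbolic `grRules` strings are easy to consume outside MATLAB. It is a minimal Python example that evaluates a rule such as `geneID1 and geneID2 or geneID3` against a set of active genes, without any `eval()`. This is not how COBREXA does it; the function names are made up for illustration, and the precedence (`and` binds tighter than `or`) and the handling of empty rules are assumptions.

```python
import re

# Tokens are parentheses or any run of non-space, non-parenthesis characters.
TOKEN = re.compile(r"\s*(\(|\)|[^\s()]+)")


def tokenize(rule):
    """Split a grRules-style string into parentheses, operators and gene IDs."""
    return TOKEN.findall(rule)


def evaluate_gr_rule(rule, active_genes):
    """Return True if the rule is satisfied by the given set of active gene IDs.

    Hypothetical helper: assumes 'and' binds tighter than 'or', and that an
    empty rule means "no gene requirement".
    """
    tokens = tokenize(rule)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1

    def parse_or():
        value = parse_and()
        while peek() is not None and peek().lower() == "or":
            advance()
            rhs = parse_and()  # parse first so short-circuiting cannot skip tokens
            value = value or rhs
        return value

    def parse_and():
        value = parse_atom()
        while peek() is not None and peek().lower() == "and":
            advance()
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom():
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of rule")
        advance()
        if tok == "(":
            value = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            advance()
            return value
        if tok == ")" or tok.lower() in ("and", "or"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in active_genes  # plain gene ID: look it up by name

    if not tokens:
        return True  # assumption: an empty rule imposes no gene requirement
    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result
```

Because the genes are referenced by name, nothing here depends on the order of a separate gene list, which is exactly what breaks with index-based rules.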
Great, thanks!
Description of the issue:
It is extremely useful to include a vector of compartment IDs for each metabolite in the `.mat` model. As far as I can see, in the currently published yeast GEM the compartments can only be accessed by manually parsing the compartment suffix out of the metabolite ID, which is doable but rather inconvenient for users, and usually fragile. Moreover, an explicit vector avoids ambiguity about how the compartment is formatted into the ID. (Just for the record, so far I've seen all of `metaboliteID[compID]`, `metaboliteID_compID` and `compID_metaboliteName`.)
Generally, FAIR advises against this kind of metadata storage (similarly, in FP we call this "stringly-typed" data), and advocates explicitly materializing and naming the metadata source.
It would be great to include an extra entry named `metCompartment` or `metCompartments` (as used in some other models) that explicitly specifies the compartment IDs for each metabolite.
Quite similarly, it would be nice to avoid index-based gene-product-reaction rules -- I assume these are used because they can be evaluated with MATLAB `eval()`, which unfortunately doesn't port well (at all) to any other ecosystem. For example, the canonical `e_coli_core.mat` uses `grRules` with semantic content, containing relatively approachable strings like `geneID1 and geneID2 or geneID3`.
I think the changes in #301 might be related, but the `.mat` file in the PR doesn't seem to have these features either.
Expected feature/value/output:
`yeast-GEM.mat` has a machine-readable mapping of metabolite compartments and symbolic (index-less) references to genes in `rules`.
Reproducing these results:
Not really applicable here -- we're opening the models with the Julia `MAT` package, and no per-metabolite compartment information can be seen in there.
I hereby confirm that I have:
`main` branch of the repository