draeger-lab / ModelPolisher

ModelPolisher accesses the BiGG Models knowledgebase to annotate SBML models.
23 stars 7 forks source link

incorrect id - geneProduct correspondence #28

Closed silviamorins closed 4 years ago

silviamorins commented 6 years ago

When I let Modelpolisher annotate an SBML model, it adds before every geneProduct id "G" (<genename> becomes `G`, so that in the end I don't have any more correspondence between the geneProduct id, and the geneProduct indicated in the reactions. (When validating the SBML model into the SBML validator, I get for every geenproduct in my model an error of the type: "The attribute ‘fbc:geneProduct’ on a if set, must refer to ‘id’ of a in the .")

matthiaskoenig commented 5 years ago

Yes, this is a real issue. I will provide some background information why this is an issue and currently happens:

ModelPolisher was originally developed in close cooperation with the BIGG database. Within SBML models all main objects like species, reactions or genes must have identifiers, but how these identifiers can look is restricted by the SBML specification. Among others the ids (also the gene ids) can not start with numbers. As a consequence not all strings are valid SBML identifiers. The BIGG database has decided to allow identifiers which start with numbers, so these cannot be used as SBML identifiers, but some transformation must be made. This id transformations are

when writing the SBML model from the Bigg database.

If you have already valid SBML identifiers in your model this transformations are unnecessary. There should be an option to perform the id prefix additions or not (this is what we basically did in cobrapy to allow users which already have valid ids to not be bothered by the prefixes, which create a ton of other issues).

So in summary: These prefixing is there for a reason, but it is not necessary if you already have valid identifiers in your model. There should be an option to not add the id prefixes.

silviamorins commented 5 years ago

Hi @matthiaskoenig ,

thank you for the feedback, and I see what you mean and why it makes sense that this happens. I also think the best way to deal with this it to add an option to let the user decide whether or not to add prefixes.

mephenor commented 4 years ago

I'll have a look at adding an option for this. I remember adding some code to prepend the `G_' prefix to the corresponding GeneProductReference, which should be present in the current release and hopefully remedy this situation for now.

Schmoho commented 2 years ago

Apparently this was resolved via #85 I have removed this issue from the Release 2.1 project and moved #85 to needs testing.