Cannot export iMM904, iND750, RECON1 models with SBML level 3 version 1 & FBC version 2

zakandrewking commented 9 years ago

INFO:root:Dumping iMM904
failed on '( YER056C  or  YER060W  or  YER060W-A  or  YGL186C )' in <Reaction ADEt2 at 0x1167371d0>
ERROR:root:Could not load model iMM904.xml.
ERROR:root:unsupported operation  <_ast.BinOp object at 0x11557b0d0>
Traceback (most recent call last):
  File "bin/load_db.py", line 170, in <module>
    bioproject_id, timestamp, pub_ref, session)
  File "/Users/zaking/sharedrepos/ome/ome/__init__.py", line 19, in wrapper
    res = function(*args, **kwargs)
  File "/Users/zaking/sharedrepos/ome/ome/loading/model_loading/load.py", line 153, in load_model
    write_sbml_model3(cobra_model, join(unpolished_dir, model_bigg_id + '.xml'))
  File "/Users/zaking/sharedrepos/cobrapy-sbml3/cobra/io/sbml3.py", line 390, in write_sbml_model
    xml = model_to_xml(cobra_model, **kwargs)
  File "/Users/zaking/sharedrepos/cobrapy-sbml3/cobra/io/sbml3.py", line 374, in model_to_xml
    raise e
Exception: unsupported operation  <_ast.BinOp object at 0x11557b0d0>

INFO:root:Dumping iND750
failed on '(YER060W-A) or (YER056C) or (YGL186C) or (YER060W)' in <Reaction ADEt2 at 0x117168510>
ERROR:root:Could not load model iND750.xml.
ERROR:root:unsupported operation  <_ast.BinOp object at 0x11821ebd0>
Traceback (most recent call last):
  File "bin/load_db.py", line 170, in <module>
    bioproject_id, timestamp, pub_ref, session)
  File "/Users/zaking/sharedrepos/ome/ome/__init__.py", line 19, in wrapper
    res = function(*args, **kwargs)
  File "/Users/zaking/sharedrepos/ome/ome/loading/model_loading/load.py", line 153, in load_model
    write_sbml_model3(cobra_model, join(unpolished_dir, model_bigg_id + '.xml'))
  File "/Users/zaking/sharedrepos/cobrapy-sbml3/cobra/io/sbml3.py", line 390, in write_sbml_model
    xml = model_to_xml(cobra_model, **kwargs)
  File "/Users/zaking/sharedrepos/cobrapy-sbml3/cobra/io/sbml3.py", line 374, in model_to_xml
    raise e
Exception: unsupported operation  <_ast.BinOp object at 0x11821ebd0>

zakandrewking commented 9 years ago

Another, slightly different error:

INFO:root:Dumping RECON1
failed on '(8639.1) or (26.1) or (314.2) or (314.1)' in <Reaction 13DAMPPOX at 0x117891d50>
ERROR:root:Could not load model RECON1.xml.
ERROR:root:invalid syntax (<string>, line 1)
Traceback (most recent call last):
  File "bin/load_db.py", line 170, in <module>
    bioproject_id, timestamp, pub_ref, session)
  File "/Users/zaking/sharedrepos/ome/ome/__init__.py", line 19, in wrapper
    res = function(*args, **kwargs)
  File "/Users/zaking/sharedrepos/ome/ome/loading/model_loading/load.py", line 153, in load_model
    write_sbml_model3(cobra_model, join(unpolished_dir, model_bigg_id + '.xml'))
  File "/Users/zaking/sharedrepos/cobrapy-sbml3/cobra/io/sbml3.py", line 390, in write_sbml_model
    xml = model_to_xml(cobra_model, **kwargs)
  File "/Users/zaking/sharedrepos/cobrapy-sbml3/cobra/io/sbml3.py", line 374, in model_to_xml
    raise e
  File "<string>", line 1
    (8639__SBML_DOT__1) or (26__SBML_DOT__1) or (314__SBML_DOT__2) or (314__SBML_DOT__1)
                     ^
SyntaxError: invalid syntax

zakandrewking commented 9 years ago

These problems boil down to the characters in the gene IDs. They can be fixed with some modifications to cobra.io.sbml3.
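To see why these particular IDs break, here is a minimal sketch. It assumes, as the _ast.BinOp in the first two tracebacks and the "<string>" SyntaxError in the third suggest, that the GPR string ends up being parsed as a Python expression. The helper name inspect_gpr is hypothetical and this is not the actual cobra.io.sbml3 code. It reports what each gene operand of the boolean rule turns into: a clean locus tag becomes an ast.Name, YER060W-A is read as the subtraction YER060W - A (a BinOp), and the escaped Recon IDs are not valid Python identifiers at all because they start with a digit.

```python
import ast


def inspect_gpr(gpr: str) -> list:
    """Return the AST node type of every gene operand in a GPR string.

    Hypothetical helper: parsing the rule as a Python expression is an
    assumption about how the SBML writer handles GPRs. Raises SyntaxError
    when the string is not a valid Python expression.
    """
    tree = ast.parse(gpr, mode="eval")
    operand_types = []

    def visit(node):
        # "and"/"or" become BoolOp nodes; recurse into their operands.
        if isinstance(node, ast.BoolOp):
            for value in node.values:
                visit(value)
        else:
            # A usable gene ID should come out as a plain ast.Name.
            operand_types.append(type(node).__name__)

    visit(tree.body)
    return operand_types


# The iMM904 rule: YER060W-A is parsed as a subtraction.
print(inspect_gpr("( YER056C  or  YER060W  or  YER060W-A  or  YGL186C )"))

# The escaped RECON1 rule cannot be parsed at all.
try:
    inspect_gpr("(8639__SBML_DOT__1) or (26__SBML_DOT__1)")
except SyntaxError as err:
    print("SyntaxError:", err.msg)
```

Any writer that takes this route will reject every ID that is not a valid Python identifier, which is why the fix has to happen at the level of the gene IDs.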

However, this raises a more general issue: should we require that gene IDs contain only the characters a-zA-Z0-9_, as reaction and metabolite IDs do? Should we prepend G_ to gene IDs in SBML files, in the same way we prepend R_ and M_ to reaction and metabolite IDs?

If we go with these changes, then alternative transcripts cannot use . to delimit the alternative transcript. We would need to switch to something like 8639_AT1 (for the gene that is now called 8639.1).
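The proposal can be checked against the SBML identifier (SId) syntax: an SId must start with a letter or underscore and may contain only letters, digits and underscores. The sketch below is illustrative only. is_valid_sid and proposed_gene_id are hypothetical names, and the _AT suffix simply follows the 8639_AT1 example above. It also shows that the G_ prefix does more than mirror R_ and M_: Recon IDs such as 8639_AT1 start with a digit, so they only become valid SIds once prefixed.

```python
import re

# SBML SId syntax: a letter or underscore, then letters, digits or underscores.
SID_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_sid(identifier: str) -> bool:
    """True if identifier is a syntactically valid SBML SId."""
    return SID_PATTERN.fullmatch(identifier) is not None


def proposed_gene_id(locus_id: str) -> str:
    """Hypothetical rename following the proposal: '8639.1' -> 'G_8639_AT1'.

    Only the '.' transcript delimiter is handled here; other characters
    (such as the '-' in YER060W-A) are left untouched.
    """
    base, dot, transcript = locus_id.partition(".")
    if dot:
        base = f"{base}_AT{transcript}"
    return "G_" + base


for gene in ["YER056C", "YER060W-A", "8639.1", "8639_AT1", proposed_gene_id("8639.1")]:
    print(gene, is_valid_sid(gene))
```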

nel3 commented 9 years ago

This is a tricky one. Some functions in the COBRA Toolbox expect to see the .# suffixes on genes, and if we update the GPRs in Recon3 to include the RefSeq IDs, those still have the .# (though there it does not refer to isoforms; it refers to versions of a particular isoform). We could, however, drop them from the gene IDs. The - in the yeast models is an actual part of the gene ID, so we should keep that.

I think it's best to keep the . and - in the human and yeast models.

Prepending G_ to the gene IDs is a good idea, but we would essentially be stating a new standard. Is there a precedent for the G_ prefix in the modeling field?

Nathan E. Lewis

Assistant Professor, Department of Pediatrics, University of California, San Diego
Tel: (858) 997-5844
http://lewislab.ucsd.edu/

(In reply to Zachary A. King's comment of Tue, Jun 2, 2015 at 2:48 PM.)

draeger commented 9 years ago

Hi guys,

FBC 2 has introduced a label attribute on geneProduct. While the id of a geneProduct must follow the rules that @zakandrewking describes, this does not apply to the label attribute. It would therefore be best to keep the original gene ID as the label of the geneProduct and to set a suitably modified id on the geneProduct so that the output is valid.
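A minimal sketch of that suggestion, built with Python's standard ElementTree rather than with cobrapy or JSBML (so it does not show how either library actually writes the element). The function name gene_product is hypothetical, and the SId-safe ID is passed in directly. In a complete SBML document the element would sit inside fbc:listOfGeneProducts within the model.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the SBML Level 3 FBC package, version 2.
FBC_NS = "http://www.sbml.org/sbml/level3/version1/fbc/version2"
ET.register_namespace("fbc", FBC_NS)


def gene_product(sbml_id: str, original_id: str) -> ET.Element:
    """Build an fbc:geneProduct whose id is SId-safe and whose label
    keeps the original gene ID unchanged."""
    element = ET.Element(f"{{{FBC_NS}}}geneProduct")
    element.set(f"{{{FBC_NS}}}id", sbml_id)          # must be a valid SId
    element.set(f"{{{FBC_NS}}}label", original_id)   # any string is allowed
    return element


print(ET.tostring(gene_product("G_YER060W_A", "YER060W-A"), encoding="unicode"))
```

Keeping the untouched ID in the label means no information is lost when the id has to be rewritten.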

Cheers Andreas

zakandrewking commented 9 years ago

I am forking COBRApy to deal with this in the short term:

https://github.com/zakandrewking/cobrapy/tree/sbml3-for-bigg

zakandrewking commented 9 years ago

Never mind, that won't solve this.

The current plan is:

@draeger will add the G_ prefix to gene IDs in the SBML models. To answer your question, Nate: this is a new idea, but it will be completely invisible to COBRA users because the G_ gets stripped off when you load an SBML model (just like the R_ and M_ for reactions and metabolites).
I will take all special characters out of the BiGG IDs for genes, and store the original locus IDs as synonyms. That way we always have them, we can show them on the website, and we can even give users an option to download the models with those locus IDs instead of the scrubbed ones.
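The BiGG-side half of that plan could look roughly like the sketch below. The thread does not specify the exact scrubbing rule, so replacing every character outside a-zA-Z0-9_ with "_" is an assumption, as are the function names scrub_gene_ids and strip_sbml_prefix. Each original locus ID is kept as a synonym of its scrubbed ID, and the G_ prefix is removed again on load.

```python
import re

# Assumption: every character outside [A-Za-z0-9_] is replaced with "_".
_SPECIAL = re.compile(r"[^A-Za-z0-9_]")


def scrub_gene_ids(locus_ids):
    """Map each original locus ID to a scrubbed BiGG ID.

    Returns {bigg_id: original_locus_id}, so the original ID is kept as a
    synonym. Raises ValueError if two different locus IDs scrub to the
    same BiGG ID (for example "a-b" and "a.b").
    """
    synonyms = {}
    for locus_id in locus_ids:
        bigg_id = _SPECIAL.sub("_", locus_id)
        if bigg_id in synonyms and synonyms[bigg_id] != locus_id:
            raise ValueError(
                f"{locus_id!r} and {synonyms[bigg_id]!r} both scrub to {bigg_id!r}"
            )
        synonyms[bigg_id] = locus_id
    return synonyms


def strip_sbml_prefix(sbml_id: str, prefix: str = "G_") -> str:
    """Remove the SBML prefix on load, as is done for R_ and M_."""
    return sbml_id[len(prefix):] if sbml_id.startswith(prefix) else sbml_id


print(scrub_gene_ids(["YER060W-A", "YER056C", "8639.1"]))
print(strip_sbml_prefix("G_YER060W_A"))
```

The collision check matters because scrubbing is lossy: without it, two distinct genes could silently merge into one BiGG ID.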

draeger commented 9 years ago

Task completed. Where necessary, the following operations are applied to gene IDs, and also to the gene associations that reference those gene identifiers:

- if the ID doesn't start with "G_", the prefix is added;
- "-" is replaced with "_".
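The two rules, applied in the order listed, can be sketched in Python as follows. This is an illustration only, not the code that was actually used for the export, and to_sbml_gene_id and rewrite_gpr are hypothetical names. The association string is split on whitespace and parentheses, keeping the delimiters, so only gene tokens are renamed and the and/or keywords and spacing are preserved.

```python
import re

KEYWORDS = {"and", "or"}


def to_sbml_gene_id(gene_id: str) -> str:
    """Add "G_" if it is missing, then replace "-" with "_"."""
    if not gene_id.startswith("G_"):
        gene_id = "G_" + gene_id
    return gene_id.replace("-", "_")


def rewrite_gpr(gpr: str) -> str:
    """Rename every gene ID inside a gene association string."""
    parts = re.split(r"(\s+|[()])", gpr)  # capturing group keeps delimiters
    out = []
    for part in parts:
        if not part or part.isspace() or part in ("(", ")") or part.lower() in KEYWORDS:
            out.append(part)  # structure and keywords are left as they are
        else:
            out.append(to_sbml_gene_id(part))
    return "".join(out)


print(rewrite_gpr("(YER060W-A) or (YER056C) or (YGL186C) or (YER060W)"))
```

These two rules alone leave the "." in Recon IDs such as 8639.1 in place. Under the plan above, such characters are instead removed from the BiGG gene IDs themselves.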

Anything else?

zakandrewking commented 9 years ago

That's it, I think.

(In reply to Andreas Dräger's comment of Sun, Jun 14, 2015 at 11:02 AM.)

SBRG / bigg_models #82