Closed zakandrewking closed 9 years ago
Another, slightly different error:
INFO:root:Dumping RECON1
failed on '(8639.1) or (26.1) or (314.2) or (314.1)' in <Reaction 13DAMPPOX at 0x117891d50>
ERROR:root:Could not load model RECON1.xml.
ERROR:root:invalid syntax (<string>, line 1)
Traceback (most recent call last):
File "bin/load_db.py", line 170, in <module>
bioproject_id, timestamp, pub_ref, session)
File "/Users/zaking/sharedrepos/ome/ome/__init__.py", line 19, in wrapper
res = function(*args, **kwargs)
File "/Users/zaking/sharedrepos/ome/ome/loading/model_loading/load.py", line 153, in load_model
write_sbml_model3(cobra_model, join(unpolished_dir, model_bigg_id + '.xml'))
File "/Users/zaking/sharedrepos/cobrapy-sbml3/cobra/io/sbml3.py", line 390, in write_sbml_model
xml = model_to_xml(cobra_model, **kwargs)
File "/Users/zaking/sharedrepos/cobrapy-sbml3/cobra/io/sbml3.py", line 374, in model_to_xml
raise e
File "<string>", line 1
(8639__SBML_DOT__1) or (26__SBML_DOT__1) or (314__SBML_DOT__2) or (314__SBML_DOT__1)
^
SyntaxError: invalid syntax
These problems boil down to the characters in the gene IDs. They can be fixed with some modifications to cobra.io.sbml3.
However, it brings in a more general issue: Should we require that genes only have the characters a-zA-Z0-9_, as in reaction and metabolite IDs? Should we append G_ before gene IDs in SBML files in the same way we append R_ and M_ before reactions and metabolites?
If we go with these changes, then alternative transcripts cannot use . to delimit the alternative transcript. We would need to switch to something like 8639_AT1 (for the gene that is now called 8639.1).
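To illustrate why the escaped IDs in the traceback still fail, here is a minimal sketch. It assumes the gene association is parsed as a Python expression, which the `File "<string>", line 1` frame suggests but which is not confirmed here. The helper `parses_as_expression` is hypothetical and is not part of cobra.io.sbml3.

```python
import ast


def parses_as_expression(gpr: str) -> bool:
    """Return True if the string is a syntactically valid Python expression."""
    try:
        ast.parse(gpr, mode="eval")
    except SyntaxError:
        return False
    return True


# The "." has been escaped, but each ID still starts with a digit,
# so it cannot be read as a Python name.
escaped = "(8639__SBML_DOT__1) or (26__SBML_DOT__1)"

# With a G_ prefix (the convention proposed above), each ID is a valid name.
prefixed = "(G_8639__SBML_DOT__1) or (G_26__SBML_DOT__1)"

print(parses_as_expression(escaped))
print(parses_as_expression(prefixed))
```

Under that assumption, escaping the dot alone is not enough: the leading digit is what breaks parsing, and a prefix such as G_ avoids it.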
this is a tricky one.. some functions in the COBRA toolbox expect seeing the .# genes, and if we update the GPRs in Recon3 to include the refseq ids, those still have the .# (though they don't refer to isoforms.. rather they refer to versions of a particular isoform), but we could drop them from the gene ids.. the - in the yeast models is an actual portion of the gene id, so we should keep that.
I think it's best to keep the . and - in the human and yeast models.
Appending G_ to the genes is a good idea, but we would be essentially stating a new standard... is there a precedent for the G_ in the modeling field?
Nathan E. Lewis
Assistant Professor Department of Pediatrics University of California, San Diego Tel: (858) 997 - 5844 http://lewislab.ucsd.edu/
Hi guys,
FBC 2 has introduced a label attribute on geneProduct. While the id of a geneProduct must follow the rules that @zakandrewking describes, this doesn't hold for the label attribute. It would therefore be best to keep the original gene id as the label on geneProduct and set a somehow modified id on geneProduct in order to generate valid output.
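A rough sketch of that idea follows: the original gene ID goes into the label, and a sanitized ID is generated for the id attribute. The sanitizing rule (replace every character outside a-zA-Z0-9_ with "_" and add a G_ prefix) and the helper name `gene_product_element` are assumptions for illustration only; they are not what cobrapy or any other tool in this thread actually does.

```python
import re
import xml.etree.ElementTree as ET

# Namespace of the SBML Level 3 FBC package, version 2.
FBC_NS = "http://www.sbml.org/sbml/level3/version1/fbc/version2"
ET.register_namespace("fbc", FBC_NS)


def gene_product_element(original_id: str) -> ET.Element:
    """Build a geneProduct element with a sanitized id and the original label.

    Hypothetical sanitizing rule: characters outside [A-Za-z0-9_] become "_",
    and "G_" is added in front if it is not already there.
    """
    safe_id = re.sub(r"[^A-Za-z0-9_]", "_", original_id)
    if not safe_id.startswith("G_"):
        safe_id = "G_" + safe_id
    return ET.Element(
        f"{{{FBC_NS}}}geneProduct",
        {
            f"{{{FBC_NS}}}id": safe_id,
            f"{{{FBC_NS}}}label": original_id,  # original ID kept verbatim
        },
    )


elem = gene_product_element("8639.1")
print(ET.tostring(elem, encoding="unicode"))
```

Note that a rule like this can map two different original IDs (for example 8639.1 and 8639-1) to the same id, so a real implementation would need to check for collisions. The label keeps the original ID recoverable either way.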
Cheers Andreas
I am forking COBRApy to deal with this in the short term:
https://github.com/zakandrewking/cobrapy/tree/sbml3-for-bigg
Nevermind, that won't solve this.
The current plan is:
Task completed. The following operations are performed with GeneIDs when necessary and also in the Gene Associations that link to those gene identifiers:
- If the id doesn't start with "G_", the prefix is added.
- "-" is replaced with "_".
Anything else?
That's it, I think.
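For reference, the two operations listed above could look roughly like this in Python. This is only a sketch, not the code that was actually deployed: the function names are hypothetical, and the tokenizer assumes gene associations contain only gene IDs, parentheses, whitespace and the operators and/or. Handling of "." in gene IDs is not covered, since it is not part of the list above.

```python
import re


def normalize_gene_id(gene_id: str) -> str:
    """Replace "-" with "_" and add the "G_" prefix if it is missing."""
    gene_id = gene_id.replace("-", "_")
    if not gene_id.startswith("G_"):
        gene_id = "G_" + gene_id
    return gene_id


# A token is any run of characters that is not whitespace or a parenthesis.
_TOKEN = re.compile(r"[^\s()]+")
_OPERATORS = {"and", "or"}


def normalize_gene_association(gpr: str) -> str:
    """Apply normalize_gene_id to every gene ID inside a gene association."""

    def repl(match: re.Match) -> str:
        token = match.group(0)
        # Leave the boolean operators untouched; everything else is a gene ID.
        if token.lower() in _OPERATORS:
            return token
        return normalize_gene_id(token)

    return _TOKEN.sub(repl, gpr)


print(normalize_gene_id("YAL038W-A"))
print(normalize_gene_association("(YAL038W-A) or (8639 and G_26)"))
```

Applying the same function to both the gene list and the gene associations keeps the two consistent, which is the point of the "also in the Gene Associations" part of the comment.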