SBRG / bigg_models

The BiGG Models website server
http://bigg.ucsd.edu

Call for feedback: New lipid naming convention #354

Open zakandrewking opened 4 years ago

zakandrewking commented 4 years ago

There is a nice proposal from @michaelwitting at WormJam for a systematic naming convention for lipids. It's described here:

https://github.com/JakeHattwell/wormjam/issues/11

This convention would generate nice-looking BiGG IDs. A couple of examples:

Phosphatidylcholines:
1,2-diacylglycerophosphocholine --> 1ac2acg3pc (old pchol)
1-acylglycerophosphocholine --> 1acg3pc
2-acylglycerophosphocholine --> 2acg3pc

1-alkyl-2-acylglycerophosphocholine --> 1alk2acg3pc
1-alkylglycerophosphocholine --> 1alkg3pc

1-alkenyl-2-acylglycerophosphocholine --> 1alken2acg3pc
1-alkenylglycerophosphocholine --> 1alkeng3pc

If we adopt this, we will probably do it through the standard BiGG process: we will not remove old IDs (so pchol will stay), and new IDs will come in as we add models to BiGG. However, we can provide an extra level of checking to help users adopt these new IDs.

Tagging some BiGG watchers who might have feedback on this: @nel3 @neemajamshidi @matthiaskoenig @draeger @rmtfleming @phantomas1234 @willigott @cdanielmachado @smoretti @djinnome @cnorsig @jmcconn @jtyurkovich

cdanielmachado commented 4 years ago

Sounds like a good plan to me.

draeger commented 4 years ago

It seems reasonable. A question remains: Will old and new IDs coexist in the future? Would BiGG list the old IDs then as legacy identifiers and store them in a separate table?

zakandrewking commented 4 years ago

@draeger In this case, we will not replace any old IDs. Just add new ones.

matthiaskoenig commented 4 years ago

I don't like that these are not valid SBML identifiers. It should also be made clear that these are all triglycerols, i.e. they use a glycerol backbone with three possible connections. There are other backbones that allow a wide range of connections; by being too specific here, these other things cannot be encoded in a uniform manner.
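The SBML point can be checked mechanically. SBML's `SId` type requires an identifier to start with a letter or underscore, followed only by letters, digits, or underscores. A minimal sketch, where the function name `is_valid_sid` is a hypothetical helper and not part of BiGG or any SBML library:

```python
import re

# SBML SId syntax: (letter | '_') followed by letters, digits, or '_'
SID_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_sid(identifier):
    """Return True if the string satisfies the SBML SId syntax."""
    return SID_PATTERN.fullmatch(identifier) is not None


# IDs from the first proposal in this thread, plus the old pchol
proposed = ["1ac2acg3pc", "1acg3pc", "2acg3pc", "1alk2acg3pc", "1alken2acg3pc"]
for bigg_id in proposed + ["pchol"]:
    print(bigg_id, is_valid_sid(bigg_id))
```

Every proposed ID fails only because of its leading digit, so any letter or underscore prefix would make it pass.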

I would highly recommend prefixing these to make the IDs clearer, valid (SBML), and usable:

1,2-diacylglycerophosphocholine --> tg1ac2acg3pc (old pchol)
1-acylglycerophosphocholine --> tg1acg3pc
2-acylglycerophosphocholine --> tg2acg3pc

1-alkyl-2-acylglycerophosphocholine --> tg1alk2acg3pc
1-alkylglycerophosphocholine --> tg1alkg3pc

1-alkenyl-2-acylglycerophosphocholine --> tg1alken2acg3pc
1-alkenylglycerophosphocholine --> tg1alkeng3pc

michaelwitting commented 4 years ago

I think adding the triglycerol prefix here causes confusion with triacylglycerols. The name already contains g3pc, which is the ID for glycero-3-phosphocholine; the 1ac and 2ac state that it has additional acyl groups at positions 1 and 2.

matthiaskoenig commented 4 years ago

It will be much easier to work with all triacylglycerols if they have a common prefix, because you can just filter the subset based on the prefix. With the prefix followed by a list of the remaining 3 chains, I even understand the rules for creating all possibilities:

1-alkyl-2-acylglycerophosphocholine --> tg1alk2ac3pc
1-alkylglycerophosphocholine --> tg1alk3pc

1-alkenyl-2-acylglycerophosphocholine --> tg1alken2ac3pc
1-alkenylglycerophosphocholine --> tg1alken3pc

matthiaskoenig commented 4 years ago

By the way, this is also super easy to parse, and it does not depend on whether the pc is at position 1 or 3.

michaelwitting commented 4 years ago

I see your points, but not all of them are tri-acyl-glycerols. Based on lipid biochemistry the glycerol-backbone is fixed to be sn-glycero-3-phosphate (coming from the synthesis). We could do it the other way round, having the lipid class in front and then the chain configuration, e.g. pc1ac2ac. Ideally the nomenclature should be consistent (at least in part) with the nomenclature used in the lipidomics field.

matthiaskoenig commented 4 years ago

Only listing the chains without a prefix will create problems if there is only one modification, which for instance would then be pc1 or 1pc, which is probably already used as an ID for other things. With a clear prefix the namespace becomes unique. There are also other backbones that would require a prefix, e.g. the sphingolipids, which would then have to be something like sp2ac3pc to distinguish them from 2ac3pc. So why not name it something like g2ac3pc or tg2ac3pc? Then it is clear from the ID what the backbone is.
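To illustrate the filtering argument, here is a rough sketch that groups IDs by backbone prefix. The prefixes `g`, `tg`, and `sp` are only the candidates floated in this thread, not an agreed convention, and `group_by_backbone` is a hypothetical helper:

```python
import re
from collections import defaultdict

# Candidate backbone prefixes from this discussion (assumption, not a standard).
# The lookahead requires a position digit right after the prefix.
BACKBONE = re.compile(r"^(tg|sp|g)(?=\d)")


def group_by_backbone(ids):
    """Group IDs by backbone prefix; IDs without one go under 'other'."""
    groups = defaultdict(list)
    for bigg_id in ids:
        match = BACKBONE.match(bigg_id)
        groups[match.group(1) if match else "other"].append(bigg_id)
    return dict(groups)


print(group_by_backbone(["g2ac3pc", "sp2ac3pc", "tg2ac3pc", "2ac3pc", "pchol"]))
```

A prefix filter like this is only a first pass: an existing ID that happens to start with the same letters and a digit would land in the same group.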

michaelwitting commented 4 years ago

Then I would go for the version with g instead of tg, which avoids confusion with real triacylglycerols. Sphingolipids will be a bit more tricky in that regard, because several backbones are possible. Rules for encoding the backbone would be needed. For example, C. elegans uses C17iso sphingoid bases, which are not found in mammals. I will think about different ways for the phospholipids.

michaelwitting commented 4 years ago

Made up my mind. We should go for the prefix. It also works well with some IDs that are already in BiGG. An example for glycero- and glycerophospholipids: g3pc and g3pe are already in BiGG for sn-glycero-phosphocholine and sn-glycero-phosphoethanolamine. g1ac3pc would then represent a 1-acyl-sn-glycero-phosphocholine etc. I will prepare a table with the "old" IDs and the new systematic ones. I would need this anyway for our WormJam model.

michaelwitting commented 4 years ago

Here is a table with examples from the WormJam model.

| Class | Metabolite | Old / Wrong / Duplicated ID (WormJam) | Correct / New ID |
| --- | --- | --- | --- |
| MG | 1-acyl-sn-glycerol | 1magol | g1ac |
| MG | 2-acyl-sn-glycerol | mag | g2ac |
| MG-O | 1-alkyl-sn-glycerol | --- | g1alk |
| MG-P | 1-(Z)-alk-1-enyl-sn-glycerol | alkenglyc | g1alken |
| DG | 1,2-diacyl-sn-glycerol | 12dag | g1ac2ac |
| DG-O | 1-alkyl-2-acyl-sn-glycerol | akac2g | g1alk2ac |
| DG-P | 1-(Z)-alk-1-enyl-2-acyl-glycerol | alkenac2g | g1alken2ac |
| TG | Triacyl-glycerol | tag | g1ac2ac3ac |
| TG-O | 1-alkyl-2,3-diacylglycerol | --- | g1alk2ac3ac |
| TG-P | 1-(Z)-alk-1-enyl-2,3-diacylglycerol | --- | g1alken2ac3ac |
| DHAP | 1-acylglycerone 3-phosphate | Adhap | dhap1ac |
| DHAP-O | 1-alkylglycerone 3-phosphate | akdhap | dhap1alk |
| PA | 1,2-diacyl-sn-glycero-3-phosphate | pa_pl | g1ac2ac3p |
| PA | 1,2-diacyl-sn-glycero-3-phosphate | 12dag3p | g1ac2ac3p |
| LPA | 1-acyl-sn-glycero-3-phosphate | alpa | g1ac3p |
| LPA | 1-acyl-sn-glycero-3-phosphate | alpa_tag | g1ac3p |
| LPA | 1-acyl-sn-glycero-3-phosphate | 1ag3p_SC | g1ac3p |
| LPA | 2-acyl-sn-glycero-3-phosphate | --- | g2ac3p |
| LPA-O | 1-alkyl-sn-glycero-3-phosphate | alkgp | g1alk3p |
| PA-O | 1-alkyl-2-acyl-sn-glycero-3-phosphate | akac2gp | g1alk2ac3p |
| LPA-P | 1-(Z)-alk-1-enyl-sn-glycero-3-phosphate | --- | g1alken3p |
| PA-P | 1-(Z)-alk-1-enyl-2-acyl-sn-glycero-3-phosphate | --- | g1alken2ac3p |
| PC | 1,2-diacyl-sn-glycero-3-phosphocholine | pchol | g1ac2ac3pc |
| LPC | 1-acyl-sn-glycero-3-phosphocholine | ag3pc | g1ac3pc |
| LPC | 2-acyl-sn-glycero-3-phosphocholine | 2agpc | g2ac3pc |
| PC-O | 1-alkyl-2-acyl-sn-glycero-3-phosphocholine | akac2gchol | g1alk2ac3pc |
| LPC-O | 1-alkyl-sn-glycero-3-phosphocholine | ak2lgchol | g1alk3pc |
| PC-P | 1-(Z)-alk-1-enyl-2-acyl-sn-glycero-3-phosphocholine | --- | g1alken2ac3pc |
| LPC-P | 1-(Z)-alk-1-enyl-sn-glycero-3-phosphocholine | --- | g1alken3pc |
| PE | 1,2-diacyl-sn-glycero-3-phosphoethanolamine | pe | g1ac2ac3pe |
| PE | 1,2-diacyl-sn-glycero-3-phosphoethanolamine | pe_BAC | g1ac2ac3pe |
| LPE | 1-acyl-sn-glycero-3-phosphoethanolamine | acg3pe | g1ac3pe |
| LPE | 2-acyl-sn-glycero-3-phosphoethanolamine | --- | g2ac3pe |
| PE-O | 1-alkyl-2-acyl-sn-glycero-3-phosphoethanolamine | akac2gpe | g1alk2ac3pe |
| LPE-O | 1-alkyl-sn-glycero-3-phosphoethanolamine | --- | g1alk3pe |
| PE-P | 1-(Z)-alk-1-enyl-2-acyl-sn-glycero-3-phosphoethanolamine | alkenac2gpe | g1alken2ac3pe |
| LPE-P | 1-(Z)-alk-1-enyl-sn-glycero-3-phosphoethanolamine | alken2gpe | g1alken3pe |
| PS | 1,2-diacyl-sn-glycero-3-phospho-L-serine | ps | g1ac2ac3ps |
| LPS | 1-acyl-sn-glycero-3-phospho-L-serine | acg3ps | g1ac3ps |
| LPS | 2-acyl-sn-glycero-3-phospho-L-serine | --- | g2ac3ps |
| PI | 1,2-diacyl-sn-glycero-3-phospho(1)-D-myo-inositol | pail | g1ac2ac3pi |
| LPI | 1-acyl-sn-glycero-3-phospho(1)-D-myo-inositol | --- | g1ac3pi |
| LPI | 2-acyl-sn-glycero-3-phospho(1)-D-myo-inositol | --- | g2ac3pi |
| PIP | 1,2-diacyl-sn-glycero-3-phospho(1)-D-myo-inositol-3-phosphate | pail3p | g1ac2ac3pi3p |
| PIP | 1,2-diacyl-sn-glycero-3-phospho(1)-D-myo-inositol-4-phosphate | pail4p | g1ac2ac3pi4p |
| PIP | 1,2-diacyl-sn-glycero-3-phospho(1)-D-myo-inositol-5-phosphate | pail5p | g1ac2ac3pi5p |
| PIP2 | 1,2-diacyl-sn-glycero-3-phospho(1)-D-myo-inositol-3,4-bisphosphate | pail34p | g1ac2ac3pi3p4p |
| PIP2 | 1,2-diacyl-sn-glycero-3-phospho(1)-D-myo-inositol-3,5-bisphosphate | pail35p | g1ac2ac3pi3p5p |
| PIP2 | 1,2-diacyl-sn-glycero-3-phospho(1)-D-myo-inositol-4,5-bisphosphate | pail45p | g1ac2ac3pi4p5p |
| PIP3 | 1,2-diacyl-sn-glycero-3-phospho(1)-D-myo-inositol-3,4,5-trisphosphate | pail345p | g1ac2ac3pi3p4p5p |
| PGP | 1,2-diacyl-sn-glycero-3-phospho-(1'-sn-glycero-3'-phosphate) | pgp | g1ac2ac3pg3p |
| PG | 1,2-diacyl-sn-glycero-3-phospho-(1'-sn-glycerol) | pg | g1ac2ac3pg |
| PG | 1,2-diacyl-sn-glycero-3-phospho-(1'-sn-glycerol) | pg_BAC | g1ac2ac3pg |
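Several old WormJam IDs in the table collapse onto a single systematic ID. As a rough sketch of how such a mapping could be used to suggest new IDs and flag duplicates, the snippet below copies a handful of rows from the table; `suggest_new_id` and `find_duplicates` are hypothetical helpers, not existing BiGG functionality:

```python
from collections import defaultdict

# A few (old ID -> new ID) pairs copied from the table above
OLD_TO_NEW = {
    "pchol": "g1ac2ac3pc",
    "pa_pl": "g1ac2ac3p",
    "12dag3p": "g1ac2ac3p",
    "alpa": "g1ac3p",
    "alpa_tag": "g1ac3p",
    "1ag3p_SC": "g1ac3p",
    "pe": "g1ac2ac3pe",
    "pe_BAC": "g1ac2ac3pe",
}


def suggest_new_id(old_id):
    """Return the systematic ID for an old ID, or None if it is unknown."""
    return OLD_TO_NEW.get(old_id)


def find_duplicates(mapping):
    """Return new IDs that are reached from more than one old ID."""
    by_new = defaultdict(list)
    for old_id, new_id in mapping.items():
        by_new[new_id].append(old_id)
    return {new: olds for new, olds in by_new.items() if len(olds) > 1}


print(suggest_new_id("pchol"))
print(find_duplicates(OLD_TO_NEW))
```

Because old IDs are kept rather than replaced, a lookup like this would only advise; it would not rewrite anything in an existing model.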

With multiply phosphorylated PI headgroups (PIP, PIP2 and PIP3) the back part of the ID gets a bit complicated, but I guess it is still fine. An alternative is to separate this with a _, e.g. g1ac2ac3pi_3p4p. CDP-DGs are still open. They would be g1ac2ac3cdp.
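To check that the g-prefixed IDs stay parseable even with phosphorylated headgroups, here is a minimal sketch of a parser. It assumes that glycerol positions appear in increasing order (1, 2, 3) and that any tokens after them modify the headgroup. That rule is my reading of the table, not a documented specification, and `parse_glycero_id` is a hypothetical helper:

```python
import re

# One token = a position digit followed by a lowercase substituent code
TOKEN = re.compile(r"(\d)([a-z]+)")


def parse_glycero_id(bigg_id):
    """Split a g-prefixed ID into glycerol positions and headgroup modifications."""
    if not bigg_id.startswith("g"):
        raise ValueError(f"{bigg_id!r} has no 'g' backbone prefix")
    body = bigg_id[1:].replace("_", "")  # tolerate the optional '_' separator
    tokens = TOKEN.findall(body)
    if not tokens or "".join(pos + sub for pos, sub in tokens) != body:
        raise ValueError(f"{bigg_id!r} is not a sequence of position tokens")
    positions, head_mods, last = {}, [], 0
    for pos, sub in tokens:
        number = int(pos)
        if not head_mods and last < number <= 3:
            positions[number] = sub  # substituent on the glycerol backbone
            last = number
        else:
            head_mods.append(pos + sub)  # modification on the headgroup
    if not positions:
        raise ValueError(f"{bigg_id!r} has no glycerol position 1-3")
    return positions, head_mods


print(parse_glycero_id("g1ac2ac3pi3p4p"))
print(parse_glycero_id("g1alken3pc"))
```

Under these assumptions the underscore variant g1ac2ac3pi_3p4p parses to the same result as g1ac2ac3pi3p4p, and an existing ID such as g6p (glucose 6-phosphate) is rejected because it carries no glycerol position.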

matthiaskoenig commented 4 years ago

This looks great. Some comments below, not sure we have to solve all of this.

The following often occur: ac = ac16 (palmitate by default; what does ac mean exactly?), ac18 (stearate), ac20, ac22, ac24. What about unsaturated variants (these need the position of the double bond and cis/trans)?

michaelwitting commented 4 years ago

For the moment, this covers only the generic versions. I have already thought about ways to encode specific acyl, alkyl or alkenyl chains. The position and stereochemistry of the double bond should be encoded. I developed something that is suitable for fatty acids, acyl-CoAs etc. (everything that has a single acyl chain). I wrote my thoughts down in a manuscript-style document, which I think I will put on a preprint server soonish. @matthiaskoenig, if you want, I can send the current rough and preliminary version via email for you to check.

michaelwitting commented 4 years ago

I will think about the cardiolipins and sphingolipids, which might be a bit tricky.

matthiaskoenig commented 4 years ago

@michaelwitting Yes, please send the preprint. I will give you feedback on it (konigmatt[AT]googlemail.com).

michaelwitting commented 4 years ago

Hi all. I would like to revive the discussion here. I was thinking about how to encode the side chains and the sphingoid bases etc. The main question is which level of detail is required. It would be good to have enough details to be able to reconstruct the chemical structure. In lipidomics shorthand notations like PC(16:0/16:1(9Z)) are used. Maybe this can be adapted?
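As a starting point for that question, the shorthand can be split into a lipid class and per-chain details with a small parser. This is a rough sketch that handles only the simple `Class(C:DB(positions)/...)` form shown above, not the full lipidomics shorthand; `parse_shorthand` is a hypothetical helper:

```python
import re

# Lipid class followed by a parenthesised chain list, e.g. PC(16:0/16:1(9Z))
SPECIES = re.compile(r"^([A-Za-z-]+)\((.*)\)$")
# One chain: carbons:double_bonds, optionally followed by (positions)
CHAIN = re.compile(r"^(\d+):(\d+)(?:\(([^)]*)\))?$")


def parse_shorthand(name):
    """Parse a shorthand name into (lipid class, list of chain dicts)."""
    species = SPECIES.match(name)
    if species is None:
        raise ValueError(f"{name!r} is not in Class(chains) form")
    lipid_class, inner = species.groups()
    chains = []
    for part in inner.split("/"):
        chain = CHAIN.match(part)
        if chain is None:
            raise ValueError(f"cannot parse chain {part!r}")
        carbons, double_bonds, positions = chain.groups()
        chains.append({
            "carbons": int(carbons),
            "double_bonds": int(double_bonds),
            # e.g. ["9Z"]: double-bond position plus Z/E geometry
            "positions": positions.split(",") if positions else [],
        })
    return lipid_class, chains


print(parse_shorthand("PC(16:0/16:1(9Z))"))
```

The parsed fields (chain length, number of double bonds, position and geometry) are the level of detail a BiGG ID would have to carry to make the structure reconstructable.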