Our approach to labelling residues and linkages doesn't work for ano-ano linkages

gitoliver commented 11 months ago

For non-ano-ano linkages our current labeling system works fine:

IndexOrdered: DNeu5Aca2-6DGalpb1-4DGlcpNAc[3S]b1-2DManpa1-3[DGlcpNAcb1-4][DManp[2S,3Me]a1-6DManpa1-6]DManpb1-4DGlcpNAc[6Me]b1-4DGlcpNAcb1-OH

IndexOrderedLabeled: DNeu5Ac&Label=residue-14;a2-6&Label=link-12;DGalp&Label=residue-13;b1-4&Label=link-11;DGlcpNAc&Label=residue-12;[3&Label=link-13;S&Label=residue-15;]b1-2&Label=link-10;DManp&Label=residue-11;a1-3&Label=link-9;[DGlcpNAc&Label=residue-10;b1-4&Label=link-8;][DManp&Label=residue-7;[2&Label=link-7;S&Label=residue-9;,3&Label=link-6;Me&Label=residue-8;]a1-6&Label=link-5;DManp&Label=residue-6;a1-6&Label=link-4;]DManp&Label=residue-5;b1-4&Label=link-3;DGlcpNAc&Label=residue-3;[6&Label=link-2;Me&Label=residue-4;]b1-4&Label=link-1;DGlcpNAc&Label=residue-2;b1-&Label=link-0;OH&Label=residue-1;

However, for ano-ano linkages the trailing "a" gets lost:

IndexOrdered: DGlcpa1-1DGlcpa

IndexOrderedLabeled: DGlcp&Label=residue-2;a1-1&Label=link-0;DGlcp&Label=residue-1;
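To make the loss concrete, here is a small Python sketch (not gmml code) that splits the labeled string from the ano-ano example into its `text&Label=label;` pieces and joins the text parts back together. The helper name `split_labeled` and the regular expression are assumptions made for illustration; only the two input strings come from the example above.

```python
import re

def split_labeled(labeled: str) -> list[tuple[str, str]]:
    """Split 'text&Label=label;' chunks into (text, label) pairs."""
    return re.findall(r"(.*?)&Label=([^;]*);", labeled)

index_ordered = "DGlcpa1-1DGlcpa"
index_ordered_labeled = "DGlcp&Label=residue-2;a1-1&Label=link-0;DGlcp&Label=residue-1;"

pairs = split_labeled(index_ordered_labeled)
# Rebuild the unlabeled sequence from the text parts only.
rebuilt = "".join(text for text, _label in pairs)

print(pairs)
print(rebuilt)                   # DGlcpa1-1DGlcp
print(rebuilt == index_ordered)  # False: no chunk holds the final "a"
```

The first "a" survives because it travels with the `a1-1` linkage chunk; the second one has no linkage chunk of its own to travel with.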

Essentially it's because we put the alpha of DGlcpa into the linkage, and not the sugar residue. This follows the convention scientists use, but what's cleaner to program is for the linkage to be e.g. "1-2" and the sugar to be "DGlcpa".

With the current approach I'm losing the "a" info for Glc. I'm not even sure where we would want the "a" to be in this representation.
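A minimal Python sketch of the "cleaner to program" option, where the anomeric configuration sits on the residue and the linkage holds only the positions. The `Residue` and `Linkage` classes, the `tag` and `render_pair` helpers, and the resulting label layout are all hypothetical; this is one possible shape for the output, not what gmml produces today.

```python
from dataclasses import dataclass

@dataclass
class Residue:
    name: str    # e.g. "DGlcp"
    anomer: str  # "a", "b", or "" if unspecified (assumed convention)
    label: str

@dataclass
class Linkage:
    positions: str  # e.g. "1-1"; no anomeric letter stored here
    label: str

def tag(text: str, label: str) -> str:
    """Attach a label using the same '&Label=...;' pattern as above."""
    return f"{text}&Label={label};"

def render_pair(child: Residue, link: Linkage, parent: Residue) -> str:
    """Render child-linkage-parent with the anomer kept on each residue."""
    return (
        tag(child.name + child.anomer, child.label)
        + tag(link.positions, link.label)
        + tag(parent.name + parent.anomer, parent.label)
    )

ano_ano = render_pair(
    Residue("DGlcp", "a", "residue-2"),
    Linkage("1-1", "link-0"),
    Residue("DGlcp", "a", "residue-1"),
)
print(ano_ano)
# DGlcpa&Label=residue-2;1-1&Label=link-0;DGlcpa&Label=residue-1;
```

With this layout both anomers survive, at the cost of departing from the usual "a1-1" way of writing the linkage.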

Lachele commented 11 months ago

There is a legitimate scientific reason for putting the anomeric configuration into the linkage. The monosaccharides generally interconvert between alpha, beta and linear when in solution. It is only at the stage of making a linkage that they are fixed into one of them. So, the creation of the linkage is important. But, I also see your point, and it is possible to have, say, a crystal of DGlcpa1-OH.

I have no issues with splitting out the anomeric config from the linkage data. It might be better to separate it entirely rather than to put it back into the monosaccharide annotation, though I cringe at greater verbosity.

Let's brainstorm a bit about all the reps.

Re the images, my first thought is to put the a & b nearer the glyphs and have the 2-1 float more in the center.

gitoliver commented 10 months ago

I can't do this at the sequenceParser level, as it doesn't know which connection is the ano-ano one in cases like DGlcpa1-2[LFucpa1-1]DFrufb. That is, should it be a1-1a or a1-2a? There aren't atoms yet, so I can't guess which one is the anomeric atom. Either I bring in metadata telling me it's 2 for Fru, or I do it at a later point when I know what the sugars are.
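A rough Python sketch of the metadata option: a lookup from monosaccharide code to anomeric carbon position, used to decide whether a linkage onto a parent residue is ano-ano. The table `ANOMERIC_POSITION` and the function `is_ano_ano` are hypothetical and not part of gmml; the table lists only a few codes (C1 for the aldoses Glc, Gal, Man and Fuc, C2 for the ketose Fru and for Neu5Ac). Extracting the code from a full residue name such as "DFrufb" is assumed to happen elsewhere.

```python
from typing import Optional

# Hypothetical metadata: anomeric carbon position per monosaccharide code.
ANOMERIC_POSITION = {
    "Glc": 1, "Gal": 1, "Man": 1, "Fuc": 1,  # aldoses: C1
    "Fru": 2, "Neu5Ac": 2,                   # ketose / sialic acid: C2
}

def is_ano_ano(parent_code: str, parent_position: int) -> Optional[bool]:
    """True if the linkage attaches at the parent's anomeric carbon.

    Returns None when the parent is not in the metadata, so a caller can
    skip the anomeric label instead of raising an error.
    """
    anomeric = ANOMERIC_POSITION.get(parent_code)
    if anomeric is None:
        return None
    return parent_position == anomeric

# DGlcpa1-2[LFucpa1-1]DFrufb: both children attach to Fru.
print(is_ano_ano("Fru", 2))  # True:  the Glc a1-2 linkage is the ano-ano one
print(is_ano_ano("Fru", 1))  # False: the Fuc a1-1 linkage is an ordinary one
print(is_ano_ano("Cow", 2))  # None:  unknown residue, nothing to decide
```

Only the parent side needs checking here, on the assumption that in this notation the child's position in a linkage is always its own anomeric carbon.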

Lachele commented 10 months ago

Is there a reason not to bring in the metadata? I think this is one big reason for the metadata to exist.

gitoliver commented 10 months ago

I wrote the comment because I'm not going to do it immediately, so I'm just writing out both options.

If I do it in sequenceParser, I will have to create metadata specifically for this step, telling me the anomeric atom for each residue we handle. Note that at that level it's just parsing the string into a graph structure using our rules, so you can do DCowpa1-2LMoob1-OH and it won't throw an error until later, when it tries to find a Glycam prep entry for "LMoob". sequenceParser was separated from the other logic on purpose; it's a standalone thing that drawGlycan uses. You can draw a 2D SNFG graph of DCowpa1-2LMoob1-OH and other things we don't support in Glycam.

If I go the other way and fix the label once I have the atomic structures, then it's easy, but it won't be correct in drawGlycan.

No one (including us) is using drawGlycan, and we might end up using third-party software anyway, so I'm planning to leave this open for a while to see where we end up. For now it's not impacting anything that's in use, so it's OK to leave it.

GLYCAM-Web / gmml

Our approach to labelling residues and linkages doesn't work for ano-ano linkages #165