BojarLab / glycowork

Package for processing and analyzing glycans and their role in biology.
https://Bojarlab.github.io/glycowork
MIT License

v3_sugarbase.csv WURCS and glytoucan_acc columns do not match the glycan column #5

Closed by mobiusklein 3 years ago

mobiusklein commented 3 years ago

Hello,

While looking at the v3_sugarbase.csv static file, I noticed that there's a mismatch between the glycan column's IUPAC notation and the WURCS and glytoucan_acc columns.

For example, in the row with glycan_id = 2, the glycan is GlcNAc(b1-2)[Gal(b1-3)[Neu5Ac(a2-6)]GlcNAc(b1-4)]Man(a1-3)[GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc, which is a NeuAc-containing glycan (image).

The WURCS column for the same row contains a much shorter sequence, WURCS=2.0/2,5,4/[a2122h-1b_1-5][a2211m-1a_1-5]/1-2-2-2-2/a2-b1_b4-c1_c3-d1_c4-e1, which does not contain NeuAc. It parses to a different, smaller structure (image). The glytoucan_acc column references https://glytoucan.org/Structures/Glycans/G52117LP, which matches my parsing of the WURCS string rather than the glycan column.

There are many more examples like this, but I wasn't able to successfully parse the whole table.
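A rough consistency check does not require fully parsing either notation. The sketch below assumes that the second number in the WURCS 2.0 header (the `5` in `2,5,4`) is the total residue count, and that counting the residue names in the IUPAC-condensed string gives a comparable number. The function names are hypothetical and are not part of glycowork.

```python
import re


def count_iupac_residues(glycan: str) -> int:
    """Count monosaccharide names in an IUPAC-condensed glycan string."""
    # Linkages such as "(b1-4)" and branch brackets separate the residue names.
    tokens = re.split(r"\([^()]*\)|[\[\]]", glycan)
    return sum(1 for token in tokens if token)


def wurcs_residue_count(wurcs: str) -> int:
    """Read the total residue count from a WURCS 2.0 header."""
    # Header layout assumed: WURCS=2.0/<unique residues>,<residues>,<linkages>/...
    match = re.match(r"WURCS=2\.0/(\d+),(\d+),(\d+)", wurcs)
    if match is None:
        raise ValueError(f"not a plain WURCS 2.0 header: {wurcs!r}")
    return int(match.group(2))


def is_consistent(glycan: str, wurcs: str) -> bool:
    """Necessary (not sufficient) condition for the two columns to agree."""
    return count_iupac_residues(glycan) == wurcs_residue_count(wurcs)


# The row with glycan_id = 2, as quoted in this issue.
glycan = (
    "GlcNAc(b1-2)[Gal(b1-3)[Neu5Ac(a2-6)]GlcNAc(b1-4)]Man(a1-3)"
    "[GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc"
)
wurcs = (
    "WURCS=2.0/2,5,4/[a2122h-1b_1-5][a2211m-1a_1-5]/1-2-2-2-2/"
    "a2-b1_b4-c1_c3-d1_c4-e1"
)

print(count_iupac_residues(glycan), wurcs_residue_count(wurcs))  # 11 5
print(is_consistent(glycan, wurcs))  # False
```

Equal counts do not prove that two entries describe the same structure, but unequal counts are enough to flag a row as mismatched.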

Bribak commented 3 years ago

Good point; those two columns were not supposed to be released yet (they are leftovers from a while ago and are not used anywhere in the package). The mapping to GlyTouCan IDs will come in the next version. For now, I would ignore these columns, and I'll probably remove them soon.
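One way to ignore the two columns in the meantime is to drop them when reading the file. This is a minimal sketch using only the standard library. The column names come from the issue title, the helper name `read_without` is hypothetical, and the in-memory sample holds just the row quoted above rather than the real file.

```python
import csv
import io

# Columns reported as unreliable in this issue.
UNRELIABLE_COLUMNS = {"WURCS", "glytoucan_acc"}


def read_without(handle, skip=UNRELIABLE_COLUMNS):
    """Read CSV rows as dicts, leaving out the columns named in `skip`."""
    reader = csv.DictReader(handle)
    return [
        {key: value for key, value in row.items() if key not in skip}
        for row in reader
    ]


# In-memory stand-in for the CSV file, containing only the row with glycan_id = 2.
sample = io.StringIO(
    "glycan_id,glycan,WURCS,glytoucan_acc\n"
    "2,GlcNAc(b1-2)[Gal(b1-3)[Neu5Ac(a2-6)]GlcNAc(b1-4)]Man(a1-3)"
    "[GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc,"
    '"WURCS=2.0/2,5,4/[a2122h-1b_1-5][a2211m-1a_1-5]/1-2-2-2-2/'
    'a2-b1_b4-c1_c3-d1_c4-e1",G52117LP\n'
)

rows = read_without(sample)
print(sorted(rows[0]))  # ['glycan', 'glycan_id']
```

For a file on disk, the same function can be given a handle opened with `open(path, newline="")`, where `path` points at a local copy of the CSV.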