Closed by mobiusklein 3 years ago
Good point; they were actually not supposed to be released at this point (leftover from a while ago and not used anywhere in the package). The mapping to GlyTouCan IDs will come in the next version. For now, I would ignore these columns, and I'll probably remove them soon.
Hello,
While looking at the `v3_sugarbase.csv` static file, I noticed that there's a mismatch between the `glycan` column's IUPAC notation and the `WURCS` and `glytoucan_acc` columns.

For example, in the row with `glycan_id` = 2, the `glycan` is `GlcNAc(b1-2)[Gal(b1-3)[Neu5Ac(a2-6)]GlcNAc(b1-4)]Man(a1-3)[GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc`, which is a NeuAc-containing glycan.

The `WURCS` column contains a much shorter sequence, `WURCS=2.0/2,5,4/[a2122h-1b_1-5][a2211m-1a_1-5]/1-2-2-2-2/a2-b1_b4-c1_c3-d1_c4-e1`, which does not contain NeuAc. The `glytoucan_acc` column references https://glytoucan.org/Structures/Glycans/G52117LP, which matches my parsing of that WURCS sequence.

There are many more examples like this, but I wasn't able to successfully parse the whole table.
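In case it helps anyone triage the rest of the table without a full glycan parser, here is a rough sketch of a cheap consistency check. It only compares residue counts: the number of monosaccharide tokens in the IUPAC string versus the residue count in the WURCS 2.0 header (the second number in `2,5,4`). The function names are my own (hypothetical), the column names are the ones mentioned above, and I'm assuming the IUPAC strings follow the same bracket/parenthesis style as the example. Equal counts don't prove the two notations agree, but different counts show they can't describe the same structure.

```python
import re

# Values copied from the glycan_id = 2 row discussed above.
EXAMPLE_GLYCAN = (
    "GlcNAc(b1-2)[Gal(b1-3)[Neu5Ac(a2-6)]GlcNAc(b1-4)]Man(a1-3)"
    "[GlcNAc(b1-2)Man(a1-6)]Man(b1-4)GlcNAc(b1-4)[Fuc(a1-6)]GlcNAc"
)
EXAMPLE_WURCS = (
    "WURCS=2.0/2,5,4/[a2122h-1b_1-5][a2211m-1a_1-5]"
    "/1-2-2-2-2/a2-b1_b4-c1_c3-d1_c4-e1"
)


def iupac_residue_count(iupac):
    """Count monosaccharide tokens in an IUPAC-condensed string.

    Assumption: linkages are written in parentheses, e.g. (b1-4),
    and branches in square brackets, as in the example row.
    """
    without_linkages = re.sub(r"\([^)]*\)", " ", iupac)
    without_brackets = without_linkages.replace("[", " ").replace("]", " ")
    return len(without_brackets.split())


def wurcs_residue_count(wurcs):
    """Read the residue count from a WURCS 2.0 header.

    The header is the second '/'-separated field:
    <unique residues>,<residues>,<linkages>.
    """
    header = wurcs.split("/")[1]
    return int(header.split(",")[1])


def find_mismatches(rows):
    """Yield (glycan_id, iupac_count, wurcs_count) where the counts differ.

    `rows` is any iterable of dicts keyed by column name, for example
    csv.DictReader over the CSV file. Rows with an empty or malformed
    WURCS value are skipped rather than reported.
    """
    for row in rows:
        wurcs = (row.get("WURCS") or "").strip()
        if not wurcs:
            continue
        try:
            n_wurcs = wurcs_residue_count(wurcs)
        except (IndexError, ValueError):
            continue
        n_iupac = iupac_residue_count(row["glycan"])
        if n_iupac != n_wurcs:
            yield row["glycan_id"], n_iupac, n_wurcs


if __name__ == "__main__":
    print(iupac_residue_count(EXAMPLE_GLYCAN))  # 11
    print(wurcs_residue_count(EXAMPLE_WURCS))   # 5
```

To run it over the whole file, something like `find_mismatches(csv.DictReader(open("v3_sugarbase.csv", newline="")))` should work, assuming the file sits in the working directory and has `glycan_id`, `glycan`, and `WURCS` header columns.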