glygener / glygen-issues

Repository for public GlyGen tickets
GNU General Public License v3.0

Super Search: Incorrect glycan binding number #1099

Closed ReneRanzinger closed 1 month ago

ReneRanzinger commented 3 months ago

According to this there are >18k glycans that are bound by proteins:

[image: motif]

Click on the number, click on the first few Glycans ... none has Glycan Binding Proteins

rykahsay commented 2 months ago

Fixed on tst now:

[image]

ReneRanzinger commented 2 months ago

@kmartinez834 can you confirm that we have a total of only 2 glycans that are bound by proteins? It's these two:

https://www.glygen.org/glycan/G19059PI#Glycan-Binding-Protein
https://www.glygen.org/glycan/G03536SO#Glycan-Binding-Protein

Is this really all we can import from MatrixDB?

kmartinez834 commented 2 months ago

Confirmed. All of the other glycan structures from MatrixDB are generic terms like "Heparin" or "Dermatan Sulfate"

kmartinez834 commented 2 months ago

@edwardsnj was providing mapping for these as well (matrixdb.tsv), but they were removed in the Dec 2023 export:

GlyTouCanAccession MatrixDBAccession matrix_db_label
G16235VG GAG_9 Hyaluronan(short label)
G46732TY GAG_6 Chondroitin Sulfate D(short label)

--> Nathan, were these included in the removal of glycans with no species composition basis or other rationale for inclusion? Can we discuss if there are alternative accessions, or if they should be brought back?

edwardsnj commented 2 months ago

@ReneRanzinger @kmartinez834 Just to clarify, the matrixdb.tsv export from glygen-data hasn't changed in three years. The GAG_* references I have come from GlyTouCan. However, I just looked on the MatrixDB site, and one of its downloads provides GlyTouCan xrefs for GAG accessions. I have incorporated this download as a way to map the GAG_* accessions. There are 58 GAG_* accessions, and 55 of them have a mapping to a GlyTouCan accession in this file. Of these, just 9 are 'naturally' in the GlyGen set, and all of these MatrixDB accessions were in the previous export of matrixdb.tsv.

Let's still use the files4nathan protocol to indicate which of the GAG_* accessions you want me to include in the next release.
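
For readers following along, the cross-referencing step described above could be sketched roughly as below. This is only an illustration, not the actual Glycan Data script: the two TSV rows are the ones quoted earlier in this thread, while the function names `load_xrefs` and `split_by_membership` and the contents of `glygen_set` are hypothetical placeholders.

```python
import csv
import io

# Two rows quoted earlier in this thread from matrixdb.tsv (tab-separated).
MATRIXDB_TSV = (
    "GlyTouCanAccession\tMatrixDBAccession\tmatrix_db_label\n"
    "G16235VG\tGAG_9\tHyaluronan(short label)\n"
    "G46732TY\tGAG_6\tChondroitin Sulfate D(short label)\n"
)


def load_xrefs(text):
    """Return {MatrixDB accession: GlyTouCan accession} from TSV text."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return {row["MatrixDBAccession"]: row["GlyTouCanAccession"] for row in reader}


def split_by_membership(xrefs, glygen_accessions):
    """Split mapped GAG accessions by whether their glycan is in the given set."""
    present = {gag: gtc for gag, gtc in xrefs.items() if gtc in glygen_accessions}
    missing = {gag: gtc for gag, gtc in xrefs.items() if gtc not in glygen_accessions}
    return present, missing


xrefs = load_xrefs(MATRIXDB_TSV)

# Placeholder set for illustration only; NOT the real GlyGen membership.
glygen_set = {"G16235VG"}

present, missing = split_by_membership(xrefs, glygen_set)
```

With the real MatrixDB download and the real GlyGen accession list substituted in, `present` would correspond to the accessions that are 'naturally' in the GlyGen set, and `missing` to the candidates that would need an explicit request via files4nathan.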

edwardsnj commented 2 months ago

Of the 9 GAG_* accessions in the current GlyGen release, just 2 of them show binding proteins on their MatrixDB pages (GAG_7 and GAG_4, G03536SO and G19059PI).

kmartinez834 commented 2 months ago

I've added the source file (matrixdbCORE.tab) to files4nathan. FYI, it currently contains the following GAG_* accessions:

matrixdb:GAG_13 matrixdb:HepMer_dp09_0001(short label)
matrixdb:GAG_1  matrixdb:Heparin(short label)
matrixdb:GAG_2  matrixdb:Heparan sulfate(short label)
matrixdb:GAG_3  matrixdb:Dermatan Sulfate(short label)
matrixdb:GAG_4  matrixdb:Chondroitin Sulfate A(short label)
matrixdb:GAG_6  matrixdb:Chondroitin Sulfate D(short label)
matrixdb:GAG_7  matrixdb:Chondroitin Sulfate E(short label)
matrixdb:GAG_9  matrixdb:Hyaluronan(short label)
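
As a rough sketch of how the accessions and labels listed above could be pulled out programmatically: the snippet below parses only the two-field excerpt exactly as pasted in this comment. The full matrixdbCORE.tab layout is not shown in this thread, so the regular expression and the helper name `parse_gag_labels` are assumptions for illustration.

```python
import re

# Excerpt copied from the comment above; the real file may have more columns.
EXCERPT = """\
matrixdb:GAG_13 matrixdb:HepMer_dp09_0001(short label)
matrixdb:GAG_1  matrixdb:Heparin(short label)
matrixdb:GAG_2  matrixdb:Heparan sulfate(short label)
matrixdb:GAG_3  matrixdb:Dermatan Sulfate(short label)
matrixdb:GAG_4  matrixdb:Chondroitin Sulfate A(short label)
matrixdb:GAG_6  matrixdb:Chondroitin Sulfate D(short label)
matrixdb:GAG_7  matrixdb:Chondroitin Sulfate E(short label)
matrixdb:GAG_9  matrixdb:Hyaluronan(short label)
"""

# Captures the GAG accession and the text before the "(short label)" suffix.
PATTERN = re.compile(r"matrixdb:(GAG_\d+)\s+matrixdb:(.+?)\(short label\)")


def parse_gag_labels(text):
    """Return {GAG accession: short label} for every matching line."""
    return {m.group(1): m.group(2) for m in PATTERN.finditer(text)}


labels = parse_gag_labels(EXCERPT)

# Sort numerically by the suffix so GAG_13 comes after GAG_9.
ordered = sorted(labels, key=lambda acc: int(acc.split("_")[1]))
```

Here `ordered` lists the eight accessions in numeric order, which makes it easier to compare against the GAG_* accessions discussed earlier in the thread.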

edwardsnj commented 1 month ago

OK, I will add this to the Glycan Data scripts...