Closed by ReneRanzinger 1 month ago
Fixed on tst now:
@kmartinez834 can you confirm that we only have a total of 2 glycans that are bound by proteins? It's these two: https://www.glygen.org/glycan/G19059PI#Glycan-Binding-Protein https://www.glygen.org/glycan/G03536SO#Glycan-Binding-Protein Is this really all we can import from MatrixDB?
Confirmed. All of the other glycan structures from MatrixDB are generic terms like "Heparin" or "Dermatan Sulfate".
@edwardsnj was providing mapping for these as well (matrixdb.tsv), but they were removed in the Dec 2023 export:
GlyTouCanAccession | MatrixDBAccession | matrix_db_label |
---|---|---|
G16235VG | GAG_9 | Hyaluronan(short label) |
G46732TY | GAG_6 | Chondroitin Sulfate D(short label) |
--> Nathan, were these included in the removal of glycans with no species composition basis or other rationale for inclusion? Can we discuss if there are alternative accessions, or if they should be brought back?
@ReneRanzinger @kmartinez834 Just to clarify, the matrixdb.tsv export from glygen-data hasn't changed in three years. The GAG_* references I have come from GlyTouCan. However, I just looked on the MatrixDB site, and one of its downloads provides GlyTouCan xrefs for GAG accessions. I have incorporated this download as a way to map the GAG_* accessions. There are 58 GAG_* accessions, and 55 of them have a mapping to a GlyTouCan accession in this file. Of these, just 9 are 'naturally' in the GlyGen set, and all of these MatrixDB accessions were in the previous export of matrixdb.tsv.
Let's still use the files4nathan protocol to indicate which of the GAG_* accessions you want me to include in the next release.
Of the 9 GAG_* accessions in the current GlyGen release, just 2 of them show binding proteins on their MatrixDB pages (GAG_7 and GAG_4, G03536SO and G19059PI).
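For anyone who wants to reproduce the bookkeeping above (how many GAG_* accessions have a GlyTouCan mapping, and how many of those are in the GlyGen set), here is a rough sketch. It is not the script used for the export. The function name, the dict/set inputs, and the `GAG_X` placeholder are assumptions for illustration, and the GlyGen set in the usage lines is made up; only the two accession pairs are taken from the table earlier in this thread.

```python
def summarize_gag_mapping(gag_to_glytoucan, glygen_accessions):
    """Group GAG_* accessions by mapping status (hypothetical helper).

    gag_to_glytoucan: dict of MatrixDB GAG accession -> GlyTouCan accession,
                      or None when the download has no xref for it.
    glygen_accessions: set of GlyTouCan accessions in the GlyGen release.
    """
    unmapped, not_in_glygen, in_glygen = [], [], []
    for gag, glytoucan in sorted(gag_to_glytoucan.items()):
        if not glytoucan:
            unmapped.append(gag)          # no GlyTouCan xref at all
        elif glytoucan in glygen_accessions:
            in_glygen.append(gag)         # mapped and present in GlyGen
        else:
            not_in_glygen.append(gag)     # mapped, but not in this release
    return {
        "unmapped": unmapped,
        "not_in_glygen": not_in_glygen,
        "in_glygen": in_glygen,
    }


# Illustrative input only: GAG_X is a placeholder for an unmapped accession,
# and the GlyGen set below is invented for the example.
example = summarize_gag_mapping(
    {"GAG_9": "G16235VG", "GAG_6": "G46732TY", "GAG_X": None},
    {"G16235VG"},
)
```

With the real inputs, the lengths of the three lists should correspond to the counts quoted above, assuming the mapping file and the GlyGen accession list are loaded as described.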
I've added the source file (matrixdbCORE.tab) to files4nathan. FYI, it currently contains the following GAG* accessions:
matrixdb:GAG_13 matrixdb:HepMer_dp09_0001(short label)
matrixdb:GAG_1 matrixdb:Heparin(short label)
matrixdb:GAG_2 matrixdb:Heparan sulfate(short label)
matrixdb:GAG_3 matrixdb:Dermatan Sulfate(short label)
matrixdb:GAG_4 matrixdb:Chondroitin Sulfate A(short label)
matrixdb:GAG_6 matrixdb:Chondroitin Sulfate D(short label)
matrixdb:GAG_7 matrixdb:Chondroitin Sulfate E(short label)
matrixdb:GAG_9 matrixdb:Hyaluronan(short label)
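In case it helps with the Glycan Data scripts, here is a minimal sketch for pulling the accession and label out of entries shaped like the ones listed above. It assumes each entry is exactly `matrixdb:<accession>`, whitespace, then `matrixdb:<label>(short label)`; the real matrixdbCORE.tab has more columns per row, so this would only apply after the relevant fields are extracted. The function name `parse_gag_entry` is made up for this example.

```python
import re

# Matches entries like: matrixdb:GAG_2 matrixdb:Heparan sulfate(short label)
# The label may contain spaces, so it is captured lazily up to "(short label)".
GAG_ENTRY = re.compile(r"^matrixdb:(GAG_\d+)\s+matrixdb:(.+?)\(short label\)$")


def parse_gag_entry(entry):
    """Return (accession, label) for a GAG entry, or None if it does not match."""
    match = GAG_ENTRY.match(entry.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


entries = [
    "matrixdb:GAG_13 matrixdb:HepMer_dp09_0001(short label)",
    "matrixdb:GAG_2 matrixdb:Heparan sulfate(short label)",
]
parsed = [parse_gag_entry(e) for e in entries]
```

Returning None for non-matching entries, rather than raising, makes it easy to skip non-GAG rows when scanning a larger file.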
OK, I will add this to the Glycan Data scripts...
According to this there are >18k glycans that are bound by proteins:
Click on the number, then click on the first few glycans ... none of them has Glycan Binding Proteins.