glygener / glygen-issues

Repository for public GlyGen tickets
GNU General Public License v3.0

G66016NA and composition closure #889

Closed ReneRanzinger closed 5 months ago

ReneRanzinger commented 9 months ago

Hmmm. It looks like a bug in the GNOme computation of composition (the relaxation to Hex) for a Gal-containing glycan in late 2019 may be responsible for composition closure bringing in G66016NA.

We could eliminate all accessions that do not have a species-based reason for inclusion (that would eliminate G66016NA, even currently), but there have been various rationales for including structures in the past that were unrelated to species, and I don't know how to ensure these are not also removed.

Removing accessions that have no species composition closure would remove 4486 of 49028 accessions (about 9%).

We can talk about it on Wednesday morning.

ReneRanzinger commented 9 months ago

Test these glycans against Karina's datasets. If they are not in there, we can remove them. It means they have no associated data in GlyGen and have not been added for special purposes (curation). They have probably only been added due to the composition closure.

edwardsnj commented 9 months ago

Of the 4486 "bad" accessions, 185 are GlyGen motif accessions or are mentioned in one or more of the GlyGen source files. That leaves 4301 bad accessions. I will exclude these 4301 accessions from the glycan data build currently in progress.
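
For reference, the filtering described above amounts to a couple of set operations. The following is only a minimal sketch, not the actual build code: the function name `split_bad_accessions`, its parameters, and the toy accession IDs are all hypothetical, and it assumes each input is available as a plain collection of accession strings.

```python
def split_bad_accessions(all_accessions, species_supported,
                         motif_accessions, source_file_accessions):
    """Split accessions lacking species support into rescued and excluded sets.

    All names here are hypothetical; this only illustrates the set logic.
    """
    # "Bad" accessions: no species-based reason for inclusion.
    bad = set(all_accessions) - set(species_supported)
    # Rescue any bad accession that is a motif or appears in a source file.
    rescued = bad & (set(motif_accessions) | set(source_file_accessions))
    # Everything else would be dropped from the build.
    excluded = bad - rescued
    return bad, rescued, excluded


# Toy data with made-up IDs (not real GlyTouCan accessions).
all_acc = {"A1", "A2", "A3", "A4", "A5"}
species = {"A1", "A2"}
motifs = {"A3"}
sources = {"A1", "A3"}

bad, rescued, excluded = split_bad_accessions(all_acc, species, motifs, sources)
print(sorted(bad), sorted(rescued), sorted(excluded))
```

With the real inputs, the three sets would be expected to have sizes 4486, 185, and 4301 respectively.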

edwardsnj commented 9 months ago

Note that the following accessions are annotated as human (taxid 9606), but we do not consider them human (due to Xyl or NeuGc).

G05768VS G09441IP G44066XU G55153IW G87557PZ G93819BH

These would be excluded too - unless you want to rescue them...
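
A check for this kind of mismatch could look roughly like the sketch below. It assumes the only rule is "contains Xyl or NeuGc", which is all that is stated here; the real species logic is likely more involved. The function name, the data layout (taxid sets and composition dicts keyed by accession), and the toy IDs are hypothetical.

```python
# Residues taken here as ruling out a human glycan (assumption based on
# the "due to Xyl or NeuGc" remark above).
NON_HUMAN_RESIDUES = {"Xyl", "NeuGc"}


def human_annotated_but_not_human(annotations, compositions):
    """Return accessions annotated with taxid 9606 that contain Xyl or NeuGc.

    annotations:  dict mapping accession -> set of taxids (hypothetical layout)
    compositions: dict mapping accession -> dict of residue -> count
    """
    flagged = []
    for acc, taxids in annotations.items():
        residues = set(compositions.get(acc, {}))
        if 9606 in taxids and residues & NON_HUMAN_RESIDUES:
            flagged.append(acc)
    return sorted(flagged)


# Toy data with made-up IDs and compositions (not real accessions).
annotations = {"ACC1": {9606}, "ACC2": {9606, 10090}, "ACC3": {10090}}
compositions = {
    "ACC1": {"Hex": 3, "HexNAc": 2, "Xyl": 1},
    "ACC2": {"Hex": 5, "HexNAc": 4, "NeuAc": 2},
    "ACC3": {"Hex": 3, "NeuGc": 1},
}

flagged = human_annotated_but_not_human(annotations, compositions)
print(flagged)
```

Only `ACC1` is flagged in the toy data: `ACC2` is human-annotated but has neither residue, and `ACC3` contains NeuGc but is not annotated as human.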

ReneRanzinger commented 9 months ago

Let's talk about this on Wednesday with @mtiemeyer0919. I would exclude them, but Mike may have a different opinion.