Closed sujeetvkulkarni closed 1 year ago
@ReneRanzinger, @kmartinez834 --> answer to first question: current export file from Nathan shows only 120 motif IDs
cat downloads/glytoucan/current/export/allmotifs.tsv | awk '{print $1}' | grep -v MotifAccession | sort -u | wc
120 120 1320
Here is my explanation --
On the supersearch page: GGM.000001 is in 8990 glycans which glycosylate 199 non-ambiguous sites -- which are known ranges on 88 proteins. In other words, the association of "motif" to "protein" goes through "site", and the site objects we have are only for non-ambiguous sites. The other 85 proteins are connected to the motif through the "motif-glycan-enzyme/protein" path.
On the motif list page: GGM.000001 is in 8990 glycans which glycosylate a total of 151 protein sequences (88 of these proteins have non-ambiguous sites, and the remaining 63 have ambiguous sites with unknown position). This page does not yet count proteins that are associated with the 8990 glycans through an enzyme (which I will fix soon).
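To make the two counting paths concrete, here is a minimal sketch. The protein IDs are made up and only the set sizes (88, 63, 85) come from the numbers above. It also assumes the three groups do not overlap, which is how the counts are being added in this thread; the real data may differ.

```python
# Hypothetical protein IDs; only the set sizes are taken from the thread.
# Assumption: the three groups are disjoint.
known_site = {f"P_known_{i}" for i in range(88)}      # proteins with non-ambiguous sites
ambiguous_site = {f"P_ambig_{i}" for i in range(63)}  # proteins with only position-unknown sites
enzyme_linked = {f"P_enz_{i}" for i in range(85)}     # proteins reached via motif-glycan-enzyme

# Supersearch: motif -> site -> protein (non-ambiguous sites only), plus the enzyme path.
supersearch_count = len(known_site | enzyme_linked)

# Motif list page: every glycosylated protein, known site or not, without the enzyme path.
motif_list_count = len(known_site | ambiguous_site)

print(supersearch_count, motif_list_count)  # 173 151
```

Under these assumptions, the gap between 173 and 151 comes from each page leaving out a different group, not from a miscount on either page.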
In the next version, maybe we can represent ambiguous sites using start_pos=1 and end_pos=sequence_length.
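A rough sketch of what that could look like. The `Site` class and the `make_site` and `is_ambiguous` helpers are hypothetical names used for illustration, not the actual GlyGen data model; only `start_pos`, `end_pos`, and the sequence-length convention come from the suggestion above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Site:
    """Hypothetical site record; field names follow the start_pos/end_pos suggestion."""
    protein_id: str
    start_pos: int
    end_pos: int


def make_site(protein_id: str, sequence_length: int,
              start_pos: Optional[int] = None,
              end_pos: Optional[int] = None) -> Site:
    """Build a site; when the position is unknown, span the whole sequence."""
    if start_pos is None or end_pos is None:
        return Site(protein_id, 1, sequence_length)
    return Site(protein_id, start_pos, end_pos)


def is_ambiguous(site: Site, sequence_length: int) -> bool:
    """A site covering the full sequence is treated as position-unknown."""
    return site.start_pos == 1 and site.end_pos == sequence_length


unknown = make_site("P_example", 350)        # position not known
known = make_site("P_example", 350, 42, 42)  # known single-residue site
print(is_ambiguous(unknown, 350), is_ambiguous(known, 350))  # True False
```

With this convention every glycosylated protein would get a site object, so the motif-to-site-to-protein path would also reach proteins whose sites are currently unrepresented.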
For 2.1 release, this is what we have now:
a) GGM.000001 --> associated with 12848 glycans --> associated with 85 enzymes/proteins
b) GGM.000001 --> associated with 12848 glycans --> associated with 150 glycoproteins
b1) GGM.000001 --> associated with 12848 glycans --> associated with 87 glycoproteins with known glycosylation sites
b2) GGM.000001 --> associated with 12848 glycans --> associated with 105 glycoproteins with unknown glycosylation sites
When you do a supersearch, what you get is (a) + (b1), which is 85 + 87 = 172, since there is no representation of unknown sites.
@ReneRanzinger @rykahsay Some questions :
@rykahsay for 2.1 we will change the motif page so that it shows only the number of proteins that are glycosylated with a glycan that carries that motif. Proteins from the enzyme context or the binding context will not be counted.
GGM.000001 --> associated with 12848. glycans --> associated with 150 glycoproteins
Out of these 150 proteins:
45 of them bear glycans at "only known sites" (all glycans associated with these proteins are at known sites)
63 of them bear glycans at "only unknown sites" (all glycans associated with these proteins are at unknown sites)
42 of them bear glycans at both "known and unknown sites" (some glycans associated with these proteins are at known sites and others are at unknown sites)
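As a quick check, the breakdown above reproduces the (b), (b1) and (b2) counts given for the 2.1 release. This is plain arithmetic on the numbers already in this thread; the variable names are just for illustration.

```python
# Counts taken from the breakdown of the 150 glycoproteins.
only_known = 45    # glycans at known sites only
only_unknown = 63  # glycans at unknown sites only
both = 42          # glycans at both known and unknown sites

total_glycoproteins = only_known + only_unknown + both  # (b)
with_known_sites = only_known + both                    # (b1)
with_unknown_sites = only_unknown + both                # (b2)

enzymes = 85                                            # (a)
supersearch_total = enzymes + with_known_sites          # (a) + (b1)

print(total_glycoproteins, with_known_sites, with_unknown_sites, supersearch_total)
# 150 87 105 172
```

Note that (b1) + (b2) = 192 is larger than 150 because the 42 proteins with both kinds of sites are counted in each group.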
Fixed on beta
If you search with Glycan Motif ID GGM.000001 on supersearch, you get 173 associated proteins, but the number of proteins shown for GGM.000001 on the motif list (https://www.glygen.org/list-of-motifs/) is 151. Can you please look into the discrepancy?