glygener / glygen-issues

Repository for public GlyGen tickets
GNU General Public License v3.0

Glycosylation and Associated Protein not represented for carbbank datasets #1634

Closed ubhuiyan closed 3 weeks ago

ubhuiyan commented 1 month ago

I noticed that the human carbbank dataset contains one association between a glycan and a protein.

[Screenshot 2024-08-14 at 1 35 35 PM]

When I checked the protein details page for Q6P1J9-1, I did not find any mention of that glycan.

[Screenshot 2024-08-14 at 1 35 53 PM]

Note: This appears to be the case for mouse, rat, and yeast carbbank datasets as well.

ubhuiyan commented 1 month ago

Also, I found the same issue with the human_proteoform_glycosylation_sites_platelet.csv dataset. This is likely due to the last-minute modifications. I imagine it will be resolved after you've implemented those changes, Robel.

rykahsay commented 1 month ago

No, that is not the reason ... look at the screenshot of misc/dataset-masterlist.json below. Do you see a problem? Please fix that and assign the ticket back to me:

[image]

rykahsay commented 1 month ago

As for the platelets dataset, I see the first row in the dataset reflected on the website. Please give me an example that is not showing up as expected.

$ cat reviewed/human_proteoform_glycosylation_sites_platelet.csv |head -2
"uniprotkb_canonical_ac","glycosylation_site_uniprotkb","amino_acid","saccharide","glycosylation_type","xref_key","xref_id","src_xref_key","src_xref_id","glycosylation_subtype","site_type","start_pos","end_pos","start_aa","end_aa","composition","peptide","peptide_start_pos","peptide_end_pos","site_seq","notes"
"P02675-1","251","Thr","G81399MY","O-linked","protein_xref_pubmed","38237698","protein_xref_data_submission","GLY_001051","","known","251","251","T","T","Hex(1)","KGGETSEMYLIQPDSSVKPYR","247","267","T","Thrombin-activated platelet releasate proteins were found to be enriched for a wide range of O-glycan modifications. Some C-mannosylation glycosylation sites were also identified."
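
To produce the kind of example requested here, the rows of a dataset can be reduced to (accession, site, glycan) triples and compared against what the protein details page shows. The following is only a minimal sketch: the sample is the row above trimmed to its first five columns, and the function name `site_records` is hypothetical.

```python
import csv
import io

# First five columns of the header and first row shown above (trimmed for brevity).
SAMPLE_CSV = '''"uniprotkb_canonical_ac","glycosylation_site_uniprotkb","amino_acid","saccharide","glycosylation_type"
"P02675-1","251","Thr","G81399MY","O-linked"
'''

def site_records(handle):
    """Return the set of (accession, site, glycan) triples found in a dataset CSV."""
    reader = csv.DictReader(handle)
    return {
        (row["uniprotkb_canonical_ac"],
         row["glycosylation_site_uniprotkb"],
         row["saccharide"])
        for row in reader
    }

records = site_records(io.StringIO(SAMPLE_CSV))
# Check whether a triple seen (or expected) on the website exists in the dataset.
print(("P02675-1", "251", "G81399MY") in records)
```

Any triple present in the dataset but missing from the website would be a concrete example to report.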
[image]

ubhuiyan commented 1 month ago

Carbbank Data: I have updated misc/dataset-masterlist.json to indicate "integrate_all". I believe that should resolve the issue:

    {
        "name": "glycosylation_sites_carbbank", 
        "format": "csv", 
        "primaryfield": "uniprotkb_canonical_ac", 
        "target_objects": [
            "protein", 
            "glycan"
        ], 
        "integration_status": {
            "status": "integrate_all", 
            "excludelist": []
        }, 
        "categories": {
            "molecule": "proteoform", 
            "species": [
                "human", 
                "mouse", 
                "rat", 
                "yeast"
            ]
        }
    },
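
A check for entries like this one can be scripted. The following is a minimal sketch, assuming the masterlist is a JSON list of entries shaped like the one above. The function name `find_non_integrated`, the second sample entry, and its `integrate_none` status value are hypothetical and only illustrate the idea.

```python
import json

# Two sample entries: the first mirrors the carbbank entry above (trimmed);
# the second is hypothetical, including its status value.
SAMPLE = """
[
  {"name": "glycosylation_sites_carbbank",
   "integration_status": {"status": "integrate_all", "excludelist": []}},
  {"name": "example_dataset",
   "integration_status": {"status": "integrate_none", "excludelist": []}}
]
"""

def find_non_integrated(entries, expected="integrate_all"):
    """Return (name, status) pairs for entries whose status differs from `expected`."""
    flagged = []
    for entry in entries:
        status = entry.get("integration_status", {}).get("status")
        if status != expected:
            flagged.append((entry.get("name"), status))
    return flagged

entries = json.loads(SAMPLE)
print(find_non_integrated(entries))
```

Running a check like this before a release would flag datasets that will not be integrated into the protein and glycan pages.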

User Submitted Data: I was mistakenly checking against the human_proteoform_citations_glycosylation_sites_platelet.csv file. That's my bad. User-submitted data looks good on my end.

rykahsay commented 1 month ago

... I am waiting for you on this

ubhuiyan commented 1 month ago

@rykahsay My apologies, I forgot to reassign. I have responded to your original comment. Essentially, I changed dataset-masterlist.json to indicate "integrate_all" instead of "integrate_none".

rykahsay commented 1 month ago

Please check now:

[image]

ubhuiyan commented 3 weeks ago

Looks good