glygener / glygen-issues

Repository for public GlyGen tickets
GNU General Public License v3.0

Carbbank Mapping Instructions #1568

Closed: ubhuiyan closed this issue 1 month ago

ubhuiyan commented 1 month ago

Source file: downloads/carbbank/current/carbbank.csv

Mapping files: downloads/glytoucan/current/export/carbbank.tsv

Output file: *_proteoform_glycosylation_sites_carbbank.csv

The output file should have the following headers:

"uniprotkb_canonical_ac","saccharide","glycosylation_type","xref_key","xref_id"
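Here is a minimal sketch of producing this header. It assumes Python's standard `csv` module, since the ticket does not say which tool builds the dataset. `csv.QUOTE_ALL` reproduces the fully quoted style of the header line. `restval=""` writes a blank value for any column a row does not supply, which matches how the mapping instructions ask for missing fields to be handled. `write_carbbank_rows` is a hypothetical helper name.

```python
import csv
import io
from typing import Iterable, Mapping, TextIO

# Column order required for *_proteoform_glycosylation_sites_carbbank.csv
OUTPUT_FIELDS = [
    "uniprotkb_canonical_ac",
    "saccharide",
    "glycosylation_type",
    "xref_key",
    "xref_id",
]


def write_carbbank_rows(rows: Iterable[Mapping[str, str]], out: TextIO) -> None:
    """Hypothetical helper: write the header and rows, quoting every field.

    Any output field missing from a row is written as a blank value.
    """
    writer = csv.DictWriter(
        out,
        fieldnames=OUTPUT_FIELDS,
        restval="",             # blank value for fields a row does not supply
        quoting=csv.QUOTE_ALL,  # match the "..." style of the header line
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


buf = io.StringIO()
write_carbbank_rows(
    [{"uniprotkb_canonical_ac": "P13473", "glycosylation_type": "O-linked",
      "xref_key": "protein_xref_pubmed"}],
    buf,
)
print(buf.getvalue())
```

With the single partial row above, the second line written is `"P13473","","O-linked","protein_xref_pubmed",""`. The missing `saccharide` and `xref_id` values come out as quoted blanks.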

Organism Mapping:

See the table below for how each source field maps to an output field. If any required output field is missing from the table, add it with blank values:

Source field   Output field             Instructions
DB             uniprotkb_canonical_ac   Accessions starting with "SwissProt"
CC             saccharide               Map to GlyTouCan accession using carbbank.tsv
MT             glycosylation_type       "N-linked" if "N-linked glycopeptide" or "N-linked glycoprotein";
                                        "O-linked" if "O-linked glycopeptide" or "O-linked glycoprotein"
(none)         xref_key                 All rows: "protein_xref_pubmed"
DB             xref_id                  Accessions starting with "PMID"
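The rules in the table can be sketched as small Python helpers. This sketch assumes each DB value holds a single `prefix:accession` entry, as in the examples. It also drops anything after the accession (such as ` -- lamp-2`), which matches the example output. `load_saccharide_map` and `saccharide` are hypothetical. The ticket does not give the column layout of carbbank.tsv, so the key and accession column names are left as parameters. Returning a blank saccharide for an unmapped CC value is also an assumption.

```python
import csv
from typing import Dict

# MT values named in the ticket, mapped to the output glycosylation_type
GLYCOSYLATION_TYPES = {
    "N-linked glycopeptide": "N-linked",
    "N-linked glycoprotein": "N-linked",
    "O-linked glycopeptide": "O-linked",
    "O-linked glycoprotein": "O-linked",
}

XREF_KEY = "protein_xref_pubmed"  # same value on every output row


def accession_with_prefix(db_field: str, prefix: str) -> str:
    """Return the accession after 'prefix:' in a DB value, or '' if absent.

    Text after the accession (e.g. ' -- lamp-2') is dropped.
    """
    value = db_field.strip()
    if not value.startswith(prefix + ":"):
        return ""
    parts = value[len(prefix) + 1:].split()
    return parts[0] if parts else ""


def glycosylation_type(mt_field: str) -> str:
    """Map an MT value to 'N-linked' / 'O-linked', or '' if it is not listed."""
    return GLYCOSYLATION_TYPES.get(mt_field.strip(), "")


def load_saccharide_map(tsv_path: str, key_column: str,
                        accession_column: str) -> Dict[str, str]:
    """Hypothetical loader for carbbank.tsv; the column names are assumptions."""
    with open(tsv_path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        return {row[key_column]: row[accession_column] for row in reader}


def saccharide(cc_field: str, saccharide_map: Dict[str, str]) -> str:
    """Look up the GlyTouCan accession for a CC value ('' if unmapped)."""
    return saccharide_map.get(cc_field.strip(), "")
```

The uniprotkb_canonical_ac and xref_id values come from the same DB field. A row carries one or the other, never both, as the two example rows show.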

Example:

Input file

"CCSD:42830","Irie F; Murakoshi H; Suzuki T; Suzuki Y; Kon K; Ando S; Yoshida K;Hirabayashi Y","Characterization of four monosialo and a novel disialo Asn N-glycosides from the urine of a patient with aspartylglycosaminuria","Glycoconj J (1995) 12: 290-297","AG-1","(CN) human, (OT) urine, (disease) aspartylglycosaminuria","Shough NJ","19-09-1995","944504e4","CBank:14042","","N-linked glycopeptide","","","","","","","","PMID:7496144","","",""
"CCSD:5474","Carlsson SR; Lycksell PO; Fukuda M","Assignment of O-glycan attachment sites to the hinge-like regions ofhuman lysosomal membrane glycoproteins lamp-1 and lamp-2","Arch Biochem Biophys (1993) 304: 65-73","","(CN) human, (OT) lysosome membrane","Doubet S","06-10-1993","cb86e782","CBank:2116","","O-linked glycoprotein","","","","lamp-2","","desialylated","","SwissProt:P13473 -- lamp-2","","",""

Output file

"uniprotkb_canonical_ac","saccharide","glycosylation_type","xref_key","xref_id"
"","G00065MO","N-linked","protein_xref_pubmed","7496144"
"P13473","G00032MO","O-linked","protein_xref_pubmed",""
...
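The example can be reproduced with a short sketch. It parses the two input rows with Python's `csv` module and applies the DB, MT and xref_key rules. Saccharide is left out because it needs carbbank.tsv. `MT_COL` and `DB_COL` are read off these two rows and are an assumption, not a documented carbbank.csv schema. The sixth field, `"(CN) human, (OT) urine, (disease) aspartylglycosaminuria"`, contains commas inside quotes. A plain comma split therefore yields 25 and 24 pieces instead of the 23 fields that a CSV parser finds.

```python
import csv
import io

# The two example input rows from this ticket, copied verbatim.
EXAMPLE_INPUT = '''"CCSD:42830","Irie F; Murakoshi H; Suzuki T; Suzuki Y; Kon K; Ando S; Yoshida K;Hirabayashi Y","Characterization of four monosialo and a novel disialo Asn N-glycosides from the urine of a patient with aspartylglycosaminuria","Glycoconj J (1995) 12: 290-297","AG-1","(CN) human, (OT) urine, (disease) aspartylglycosaminuria","Shough NJ","19-09-1995","944504e4","CBank:14042","","N-linked glycopeptide","","","","","","","","PMID:7496144","","",""
"CCSD:5474","Carlsson SR; Lycksell PO; Fukuda M","Assignment of O-glycan attachment sites to the hinge-like regions ofhuman lysosomal membrane glycoproteins lamp-1 and lamp-2","Arch Biochem Biophys (1993) 304: 65-73","","(CN) human, (OT) lysosome membrane","Doubet S","06-10-1993","cb86e782","CBank:2116","","O-linked glycoprotein","","","","lamp-2","","desialylated","","SwissProt:P13473 -- lamp-2","","",""
'''

# 0-based positions of MT and DB in the example rows (assumption, see above).
MT_COL = 11
DB_COL = 19

TYPES = {
    "N-linked glycopeptide": "N-linked",
    "N-linked glycoprotein": "N-linked",
    "O-linked glycopeptide": "O-linked",
    "O-linked glycoprotein": "O-linked",
}


def db_accession(db_field: str, prefix: str) -> str:
    """Accession following 'prefix:' in a DB value, or '' if the prefix differs."""
    value = db_field.strip()
    if not value.startswith(prefix + ":"):
        return ""
    parts = value[len(prefix) + 1:].split()
    return parts[0] if parts else ""


def map_row(fields: list) -> dict:
    """Apply the DB, MT and xref_key rules to one parsed input row."""
    db = fields[DB_COL]
    return {
        "uniprotkb_canonical_ac": db_accession(db, "SwissProt"),
        "glycosylation_type": TYPES.get(fields[MT_COL].strip(), ""),
        "xref_key": "protein_xref_pubmed",
        "xref_id": db_accession(db, "PMID"),
    }


rows = list(csv.reader(io.StringIO(EXAMPLE_INPUT)))  # quote-aware parsing
naive_widths = [len(line.split(",")) for line in EXAMPLE_INPUT.splitlines()]
print("csv widths:", [len(r) for r in rows], "naive widths:", naive_widths)
mapped = [map_row(r) for r in rows]
for m in mapped:
    print(m)
```

The mapped rows agree with the example output. The first row has an empty accession and xref_id `7496144`. The second has `P13473` and an empty xref_id.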
rykahsay commented 1 month ago

I have tried to create the datasets -- please check.

Here are some points:

Please assign the ticket back to me after you edit misc/ds2bco.json so that I can update the datasets.

ubhuiyan commented 1 month ago

I have updated misc/ds2bco.json. Please let me know if there are any issues.

rykahsay commented 1 month ago

I have updated the datasets; please check.

ubhuiyan commented 1 month ago
[sbhuiyan28@glygen-vm-dev unreviewed]$ awk -F, '{print $1, $4, $5, $6, $7, $8, $9}' human_proteoform_glycosylation_sites_carbbank.csv | head -2
"uniprotkb_canonical_ac" "saccharide" "glycosylation_type" "xref_key" "xref_id" "src_xref_key" "src_xref_id"
"Q6P1J9-1" "G36855WW" "O-linked" "protein_xref_pubmed" "93364082" "protein_xref_glygen_ds" "GLY_001113"
[sbhuiyan28@glygen-vm-dev unreviewed]$ awk -F, '{print $1, $4, $5, $6, $7, $8, $9}' rat_proteoform_glycosylation_sites_carbbank.csv | head -2
"uniprotkb_canonical_ac" "saccharide" "glycosylation_type" "xref_key" "xref_id" "src_xref_key" "src_xref_id"
"A6JX65-1" "G36855WW" "O-linked" "protein_xref_pubmed" "93364082" "protein_xref_glygen_ds" "GLY_001115"
[sbhuiyan28@glygen-vm-dev unreviewed]$ awk -F, '{print $1, $4, $5, $6, $7, $8, $9}' mouse_proteoform_glycosylation_sites_carbbank.csv | head -2
"uniprotkb_canonical_ac" "saccharide" "glycosylation_type" "xref_key" "xref_id" "src_xref_key" "src_xref_id"
"Q3UD01-1" "G99229VQ" "N-linked" "protein_xref_pubmed" "7496147" "protein_xref_glygen_ds" "GLY_001114"
[sbhuiyan28@glygen-vm-dev unreviewed]$ awk -F, '{print $1, $4, $5, $6, $7, $8, $9}' yeast_proteoform_glycosylation_sites_carbbank.csv | head -2
"uniprotkb_canonical_ac" "saccharide" "glycosylation_type" "xref_key" "xref_id" "src_xref_key" "src_xref_id"
"P0CW41-1" "G46565HP" "N-linked" "protein_xref_pubmed" "7496153" "protein_xref_glygen_ds" "GLY_001116"
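The `awk -F,` commands above split on every comma. That works for these output rows because none of the printed fields contains a comma. It would misalign columns on a quoted value such as `"(CN) human, (OT) urine, (disease) aspartylglycosaminuria"` in carbbank.csv. The sketch below is a CSV-aware check in Python that reads columns by name and flags values outside the rules in this ticket. `check_output` is a hypothetical helper. The digits-only test for xref_id is an assumption based on PubMed IDs being numeric.

```python
import csv
import glob
import io
from typing import List, TextIO

# Rules taken from the mapping instructions in this ticket.
REQUIRED_COLUMNS = [
    "uniprotkb_canonical_ac",
    "saccharide",
    "glycosylation_type",
    "xref_key",
    "xref_id",
]
ALLOWED_TYPES = {"N-linked", "O-linked"}
EXPECTED_XREF_KEY = "protein_xref_pubmed"


def check_output(fh: TextIO) -> List[str]:
    """Hypothetical checker: return a list of problems (empty if none found)."""
    reader = csv.DictReader(fh)  # handles quoted fields that contain commas
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    for row in reader:
        where = f"line {reader.line_num}"
        if row["glycosylation_type"] not in ALLOWED_TYPES:
            problems.append(f"{where}: unexpected glycosylation_type {row['glycosylation_type']!r}")
        if row["xref_key"] != EXPECTED_XREF_KEY:
            problems.append(f"{where}: unexpected xref_key {row['xref_key']!r}")
        # Assumption: PubMed IDs are all digits; a blank xref_id is allowed.
        if row["xref_id"] and not row["xref_id"].isdigit():
            problems.append(f"{where}: non-numeric xref_id {row['xref_id']!r}")
    return problems


# A row shaped like the ticket's example output passes the check.
SAMPLE = (
    '"uniprotkb_canonical_ac","saccharide","glycosylation_type","xref_key","xref_id"\n'
    '"P13473","G00032MO","O-linked","protein_xref_pubmed",""\n'
)
print(check_output(io.StringIO(SAMPLE)))  # []

if __name__ == "__main__":
    # Run from the directory holding the per-organism output files.
    for path in sorted(glob.glob("*_proteoform_glycosylation_sites_carbbank.csv")):
        with open(path, newline="") as fh:
            print(path, check_output(fh) or "OK")
```

Because columns are looked up by name, the check does not depend on where the extra columns (such as src_xref_key and src_xref_id) sit in each file.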