glygener / glygen-issues

Repository for public GlyGen tickets
GNU General Public License v3.0

New dataset: human_proteoform_glyosylation_sites_diabetes_glycomic.csv #1279

Closed. kmartinez834 closed this issue 5 days ago

kmartinez834 commented 2 weeks ago

Source file: downloads/zagreb/current/HG_FinnRisk.txt

Mapping files: misc/fig_to_gtc.csv

Output file: human_proteoform_glycosylation_sites_diabetes_glycomic.csv

The output file should have the following headers:

"uniprotkb_canonical_ac","glycosylation_site_uniprotkb","amino_acid","saccharide","glycosylation_type","xref_key","xref_id","start_pos","end_pos","start_aa","end_aa","abundance","sample_id","source_tissue_id","source_tissue_name"

The chart below gives instructions for mapping source fields to output fields (blank fields can be excluded from the output file if they are not required to populate the glycan detail API):

| Source field | Output field | Instructions |
|---|---|---|
| | uniprotkb_canonical_ac | Blank, no protein data |
| | glycosylation_site_uniprotkb | Blank, no site data |
| | amino_acid | Blank, no site data |
| GP* column headers | saccharide | Map to glytoucan ac using misc/fig_to_gtc.csv fields "ID" and "glytoucan" |
| | glycosylation_type | All rows: "N-linked" |
| | xref_key | All rows: "protein_xref_pubmed" |
| | xref_id | All rows: "28905229" |
| | start_pos | Blank, no site data |
| | end_pos | Blank, no site data |
| | start_aa | Blank, no site data |
| | end_aa | Blank, no site data |
| GP* | abundance | Columns that begin with "GP" contain abundance data; convert "," in values to "." (see example below) |
| Sample | sample_id | No change, copy directly from "Sample" |
| | source_tissue_id | All rows: "UBERON:0001969" |
| | source_tissue_name | All rows: "blood plasma" |
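The mapping above amounts to a wide-to-long reshape: one output row per sample and GP column. The following is only a minimal sketch of that reshape, not the pipeline actually used for this dataset. It assumes the source file is tab-delimited, that the "ID" field of misc/fig_to_gtc.csv holds the GP column names (for example "GP1"), and that GP columns without a mapping are skipped; the function names are hypothetical.

```python
import csv
import io

OUTPUT_HEADER = [
    "uniprotkb_canonical_ac", "glycosylation_site_uniprotkb", "amino_acid",
    "saccharide", "glycosylation_type", "xref_key", "xref_id",
    "start_pos", "end_pos", "start_aa", "end_aa",
    "abundance", "sample_id", "source_tissue_id", "source_tissue_name",
]

# Values that the chart sets to the same constant on every row.
CONSTANTS = {
    "glycosylation_type": "N-linked",
    "xref_key": "protein_xref_pubmed",
    "xref_id": "28905229",
    "source_tissue_id": "UBERON:0001969",
    "source_tissue_name": "blood plasma",
}


def load_gp_to_glytoucan(mapping_file):
    """Build {GP column name: GlyTouCan accession}.

    Assumption: "ID" contains values such as "GP1".
    """
    return {row["ID"]: row["glytoucan"] for row in csv.DictReader(mapping_file)}


def wide_to_long(source_file, gp_to_gtc):
    """Yield one output row per (sample, mapped GP column) pair."""
    reader = csv.DictReader(source_file, delimiter="\t")
    gp_columns = [c for c in (reader.fieldnames or []) if c.startswith("GP")]
    for record in reader:
        for column in gp_columns:
            if column not in gp_to_gtc:
                continue  # assumption: unmapped GP columns are skipped
            row = dict.fromkeys(OUTPUT_HEADER, "")  # blank protein/site fields
            row.update(CONSTANTS)
            row["saccharide"] = gp_to_gtc[column]
            row["abundance"] = record[column].replace(",", ".")  # decimal comma
            row["sample_id"] = record["Sample"]
            yield row


def write_output(rows, out_file):
    """Write rows with every field quoted, as in the example output."""
    writer = csv.DictWriter(out_file, fieldnames=OUTPUT_HEADER,
                            quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Tiny in-memory demonstration using values from the example in this ticket.
demo_mapping = io.StringIO("ID,glytoucan\nGP1,G83367MW\nGP2,G69449MU\n")
demo_source = io.StringIO(
    "Sample\tCohort\tGP1\tGP2\tG0\n"
    "4002080252\tFinnRisk\t0,016117543\t0,072337091\t5,591517533\n"
)
demo_buffer = io.StringIO()
write_output(wide_to_long(demo_source, load_gp_to_glytoucan(demo_mapping)),
             demo_buffer)
demo_output = demo_buffer.getvalue()
```

Matching on the "GP" prefix keeps derived columns such as G0 to G4, LB, HB, and S0 to S4 out of the output, since none of them begin with "GP".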

Example:

Input file

Sample  Cohort  BL_AGE  DIAB_AGE    PREVAL_DIAB INCIDENT_DIAB   DIAB_T2 INCIDENT_DIAB_T2    PREVAL_DIAB_T2  DIABETES    GP1 GP2 GP3 GP4 GP5 GP6 GP7 GP8 GP9 GP10    GP11    GP12    GP13    GP14    GP15    GP16    GP17    GP18    GP19    GP20    GP21    GP22    GP23    GP24    GP25    GP26    GP27    GP28    GP29    GP30    GP31    GP32    GP33    GP34    GP35    GP36    GP37    GP38    GP39    GP40    GP41    GP42    GP43    GP44    GP45    GP46    LB  HB  S0  S1  S2  S3  S4  G0  G1  G2  G3  G4
4002080252  FinnRisk    55,425  57,914  0   1   1   1   0   1   0,016117543 0,072337091 0,002571769 0,045732195 3,306536179 1,466414739 0,056500309 4,104060707 2,48862935  1,501083086 1,077958737 0,684379786 0,078062741 4,419775322 0,963071407 0,84101438  0,882493985 0,076572836 8,50362312  0,381232962 6,479537828 2,2373325   2,769472535 1,133307927 26,85132263 0,767373623 6,474634808 4,603362955 1,475636797 0,079695833 1,402411565 0,865194439 0,580548817 4,130133093 0,223745791 0,550068486 1,574027557 3,533020323 0,43510067  0,498498337 0,407195851 0,321716346 0,528682933 0,311148508 0,486702638 0,311958971 82,36420888 17,63579112 20,28323096 22,25097598 42,78774613 12,39033751 2,367405248 5,591517533 9,952926422 65,66211036 10,80176655 6,91372041

Output file

"uniprotkb_canonical_ac","glycosylation_site_uniprotkb","amino_acid","saccharide","glycosylation_type","xref_key","xref_id","start_pos","end_pos","start_aa","end_aa","abundance","sample_id","source_tissue_id","source_tissue_name"
"","","","G83367MW","N-linked","protein_xref_pubmed","28905229","","","","","0.016117543","4002080252","UBERON:0001969","blood plasma"
"","","","G69449MU","N-linked","protein_xref_pubmed","28905229","","","","","0.072337091","4002080252","UBERON:0001969","blood plasma"
"","","","G95184RD","N-linked","protein_xref_pubmed","28905229","","","","","0.002571769","4002080252","UBERON:0001969","blood plasma"
"","","","G83155SV","N-linked","protein_xref_pubmed","28905229","","","","","0.045732195","4002080252","UBERON:0001969","blood plasma"
"","","","G92683CO","N-linked","protein_xref_pubmed","28905229","","","","","3.306536179","4002080252","UBERON:0001969","blood plasma"
...
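
An output file in this format can be spot-checked against the chart before review. The sketch below is a hypothetical helper, not part of the GlyGen tooling. It assumes the file has a header row, looks fields up by name so that column order does not matter, and reports rows whose constant fields differ from the chart, whose abundance is not a plain decimal number, or whose saccharide is empty.

```python
import csv
import io

# Constant values taken from the mapping chart in this ticket.
EXPECTED_CONSTANTS = {
    "glycosylation_type": "N-linked",
    "xref_key": "protein_xref_pubmed",
    "xref_id": "28905229",
    "source_tissue_id": "UBERON:0001969",
    "source_tissue_name": "blood plasma",
}


def check_output(out_file):
    """Return a list of human-readable problems found in the output file."""
    problems = []
    reader = csv.DictReader(out_file)
    # The header is line 1, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        for field, expected in EXPECTED_CONSTANTS.items():
            if row.get(field) != expected:
                problems.append(
                    f"line {line_no}: {field}={row.get(field)!r}, "
                    f"expected {expected!r}"
                )
        abundance = row.get("abundance") or ""
        try:
            float(abundance)  # fails on a leftover decimal comma such as "0,5"
        except ValueError:
            problems.append(f"line {line_no}: bad abundance {abundance!r}")
        if not row.get("saccharide"):
            problems.append(f"line {line_no}: missing saccharide")
    return problems


# Demonstration: the second data row still contains a decimal comma.
demo_file = io.StringIO(
    '"saccharide","glycosylation_type","xref_key","xref_id",'
    '"abundance","sample_id","source_tissue_id","source_tissue_name"\n'
    '"G83367MW","N-linked","protein_xref_pubmed","28905229",'
    '"0.016117543","4002080252","UBERON:0001969","blood plasma"\n'
    '"G69449MU","N-linked","protein_xref_pubmed","28905229",'
    '"0,072337091","4002080252","UBERON:0001969","blood plasma"\n'
)
demo_problems = check_output(demo_file)
```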

FYI @ubhuiyan

rykahsay commented 2 weeks ago

done, check unreviewed/human_proteoform_glycosylation_sites_diabetes_glycomic.csv

kmartinez834 commented 1 week ago

Just one change:

rykahsay commented 5 days ago

$ cat reviewed/human_proteoform_glycosylation_sites_diabetes_glycomic.csv | grep GLY_000960 |head

"","","","G59722IA","N-linked","protein_xref_pubmed","28905229","protein_xref_glygen_ds","GLY_000960","0.079695833","UBERON:0001969","blood plasma","","","","","4002080252"
"","","","G62220SM","N-linked","protein_xref_pubmed","28905229","protein_xref_glygen_ds","GLY_000960","1.402411565","UBERON:0001969","blood plasma","","","","","4002080252"
"","","","G67579EM","N-linked","protein_xref_pubmed","28905229","protein_xref_glygen_ds","GLY_000960","0.865194439","UBERON:0001969","blood plasma","","","","","4002080252"
"","","","G67579EM","N-linked","protein_xref_pubmed","28905229","protein_xref_glygen_ds","GLY_000960","0.580548817","UBERON:0001969","blood plasma","","","","","4002080252"
"","","","G67579EM","N-linked","protein_xref_pubmed","28905229","protein_xref_glygen_ds","GLY_000960","4.130133093","UBERON:0001969","blood plasma","","","","","4002080252"
"","","","G78494LP","N-linked","protein_xref_pubmed","28905229","protein_xref_glygen_ds","GLY_000960","0.223745791","UBERON:0001969","blood plasma","","","","","4002080252"
"","","","G91519IS","N-linked","protein_xref_pubmed","28905229","protein_xref_glygen_ds","GLY_000960","0.076572836","UBERON:0001969","blood plasma","","","","","4002080252"
"","","","G73813HX","N-linked","protein_xref_pubmed","28905229","protein_xref_glygen_ds","GLY_000960","8.50362312","UBERON:0001969","blood plasma","","","","","4002080252"
"","","","G31213WS","N-linked","protein_xref_pubmed","28905229","protein_xref_glygen_ds","GLY_000960","0.84101438","UBERON:0001969","blood plasma","","","","","4002080252"
"","","","G02849EL","N-linked","protein_xref_pubmed","28905229","protein_xref_glygen_ds","GLY_000960","0.882493985","UBERON:0001969","blood plasma","","","","","4002080252"