Closed sujeetvkulkarni closed 8 months ago
Also,
API : https://api.tst.glygen.org/pagination/page/
For table expression_tissue, if we sort by the keys "start_pos", "uniprot_canonical_ac", tissue -> "namespace", or tissue -> "id", we get an empty "results": [] array.
{
"record_type": "glycan",
"table_id": "expression_tissue",
"record_id": "G17689DH",
"offset": 1,
"limit": 20,
"order": "desc",
"sort": "uniprot_canonical_ac"
}
{
"query": {
"record_type": "glycan",
"table_id": "expression_tissue",
"record_id": "G17689DH",
"offset": 1,
"limit": 20,
"order": "desc",
"sort": "uniprot_canonical_ac"
},
"results": []
}
For table expression_cell_line, if we sort by the keys "start_pos", "uniprot_canonical_ac", cell_line -> "namespace", or cell_line -> "id", we get an empty "results": [] array.
{
"record_type": "glycan",
"table_id": "expression_cell_line",
"record_id": "G17689DH",
"offset": 1,
"limit": 20,
"order": "desc",
"sort": "uniprot_canonical_ac"
}
{
"query": {
"record_type": "glycan",
"table_id": "expression_cell_line",
"record_id": "G17689DH",
"offset": 1,
"limit": 20,
"order": "desc",
"sort": "uniprot_canonical_ac"
},
"results": []
}
The backend needs to send results even when the sort column has no values in it and the user sorts by that column.
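A minimal sketch of the requested behaviour, assuming the records are plain dictionaries held in memory: records that lack the sort key are kept and placed after the ones that have it, so the page is never empty just because the column has no values. The function name `sort_keep_missing` is hypothetical and is not taken from the GlyGen backend.

```python
from __future__ import annotations

from typing import Any


def sort_keep_missing(
    records: list[dict[str, Any]], key: str, order: str = "asc"
) -> list[dict[str, Any]]:
    """Sort records by `key`; records without a value for `key` are kept and go last."""
    present = [r for r in records if r.get(key) is not None]
    missing = [r for r in records if r.get(key) is None]
    present.sort(key=lambda r: r[key], reverse=(order == "desc"))
    # Missing-value records keep their original relative order.
    return present + missing


rows = [
    {"uniprot_canonical_ac": "P01024-1", "start_pos": 85},
    {"start_pos": 42},  # no uniprot_canonical_ac at all
    {"uniprot_canonical_ac": "P13638-1", "start_pos": 118},
]

result = sort_keep_missing(rows, "uniprot_canonical_ac", order="desc")
print([r.get("uniprot_canonical_ac") for r in result])
```

Sorting by a key that no record has (for example `abundance` here) returns every record in its original order instead of an empty list.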
Please try now
API: https://api.tst.glygen.org/glycan/detail/G80966KZ
For table id = expression_tissue, the sort fields "start_pos" and "uniprot_canonical_ac" are working. Can you please tell me what sort keys to use for tissue -> "namespace" and tissue -> "id"?
expression: [
...
{
"uniprot_canonical_ac": "P01024-1",
"start_pos": 85,
"end_pos": 85,
"residue": "Asn",
"category": "tissue",
"tissue": {
"name": "milk",
"namespace": "UBERON",
"id": "0001913",
"url": "http://purl.obolibrary.org/obo/UBERON_0001913"
},
"evidence": [
{
"id": "2039",
"database": "GlyConnect",
"url": "https://glyconnect.expasy.org/browser/structures/2039"
},
{
"id": "110",
"database": "GlyConnect",
"url": "https://glyconnect.expasy.org/browser/proteins/110"
},
{
"id": "32125861",
"database": "PubMed",
"url": "https://glygen.org/publication/PubMed/32125861"
}
]
}
...
]
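One way to address nested fields such as tissue -> "namespace" is a dotted sort key. The sketch below assumes keys of the form `tissue.namespace`, which matches the `cell_line.*` style that appears in the sort_fields listings later in this thread; the helper name `resolve_sort_key` is hypothetical.

```python
from typing import Any, Optional


def resolve_sort_key(record: dict, dotted_key: str) -> Optional[Any]:
    """Walk a dotted key such as 'tissue.namespace' through nested dicts.

    Returns None when any part of the path is absent.
    """
    value: Any = record
    for part in dotted_key.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


# Record taken from the expression array shown above (evidence omitted).
record = {
    "uniprot_canonical_ac": "P01024-1",
    "start_pos": 85,
    "category": "tissue",
    "tissue": {"name": "milk", "namespace": "UBERON", "id": "0001913"},
}

print(resolve_sort_key(record, "tissue.namespace"))
print(resolve_sort_key(record, "cell_line.id"))
```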
Can you please also give a glycan id where cell_line information is present ("category": "cell_line" in the expression: [] array) and let us know what sort keys to use for cell_line -> "namespace" and cell_line -> "id"?
Also, the backend is sending an expression_tissue: [] array in the https://api.tst.glygen.org/glycan/detail/G80966KZ API, which the frontend is not using.
I couldn't find any glycan with expression in cell_line -- @kmartinez834 can you please verify
Again getting an empty results array from the pagination API (table id = expression_tissue).
API: https://api.tst.glygen.org/glycan/detail/G80966KZ
API: https://api.tst.glygen.org//pagination/page/
For table id = expression_tissue, the sort fields "start_pos", "uniprot_canonical_ac", and "namespace" are returning an empty results: [] array.
Looks like some of the data isn't making it to the API...
The following proteoform datasets have cell line information associated with glycan ac's:
/data/projects/glygen/generated/datasets/unreviewed/mouse_proteoform_glycosylation_sites_oglcnac_atlas.csv
/data/projects/glygen/generated/datasets/unreviewed/hcv1a_proteoform_glycosylation_sites_literature.csv
/data/projects/glygen/generated/datasets/unreviewed/rat_proteoform_glycosylation_sites_oglcnac_atlas.csv
/data/projects/glygen/generated/datasets/unreviewed/human_proteoform_glycosylation_sites_oglcnac_atlas.csv
/data/projects/glygen/generated/datasets/unreviewed/rat_proteoform_glycosylation_sites_glyconnect.csv
/data/projects/glygen/generated/datasets/unreviewed/mouse_proteoform_glycosylation_sites_glyconnect.csv
/data/projects/glygen/generated/datasets/unreviewed/sarscov1_proteoform_glycosylation_sites_literature.csv
/data/projects/glygen/generated/datasets/unreviewed/fruitfly_proteoform_glycosylation_sites_oglcnac_atlas.csv
/data/projects/glygen/generated/datasets/unreviewed/sarscov2_proteoform_glycosylation_sites_glyconnect.csv
/data/projects/glygen/generated/datasets/unreviewed/sarscov2_proteoform_glycosylation_sites_unicarbkb.csv
/data/projects/glygen/generated/datasets/unreviewed/human_proteoform_glycosylation_sites_glyconnect.csv
/data/projects/glygen/generated/datasets/unreviewed/fruitfly_proteoform_glycosylation_sites_glyconnect.csv
/data/projects/glygen/generated/datasets/unreviewed/human_proteoform_glycosylation_sites_unicarbkb.csv
And these have tissue information w/ glycan ac's:
/data/projects/glygen/generated/datasets/unreviewed/mouse_proteoform_glycosylation_sites_oglcnac_atlas.csv
/data/projects/glygen/generated/datasets/unreviewed/rat_proteoform_glycosylation_sites_oglcnac_atlas.csv
/data/projects/glygen/generated/datasets/unreviewed/human_proteoform_glycosylation_sites_oglcnac_atlas.csv
/data/projects/glygen/generated/datasets/unreviewed/rat_proteoform_glycosylation_sites_unicarbkb.csv
/data/projects/glygen/generated/datasets/unreviewed/human_proteoform_glycosylation_sites_unicarbkb_glycomics_study.csv
/data/projects/glygen/generated/datasets/unreviewed/rat_proteoform_glycosylation_sites_glyconnect.csv
/data/projects/glygen/generated/datasets/unreviewed/mouse_proteoform_glycosylation_sites_glyconnect.csv
/data/projects/glygen/generated/datasets/unreviewed/rat_proteoform_glycosylation_sites_unicarbkb_glycomics_study.csv
/data/projects/glygen/generated/datasets/unreviewed/fruitfly_proteoform_glycosylation_sites_oglcnac_atlas.csv
/data/projects/glygen/generated/datasets/unreviewed/sarscov2_proteoform_glycosylation_sites_unicarbkb.csv
/data/projects/glygen/generated/datasets/unreviewed/human_proteoform_glycosylation_sites_glyconnect.csv
/data/projects/glygen/generated/datasets/unreviewed/fruitfly_proteoform_glycosylation_sites_glyconnect.csv
/data/projects/glygen/generated/datasets/unreviewed/human_proteoform_glycosylation_sites_unicarbkb.csv
"uniprotkb_canonical_ac","glycosylation_site_uniprotkb","amino_acid","saccharide","glycosylation_type","xref_key","xref_id","src_xref_key","src_xref_id","uniprotkb_ac","evidence","genbank_accession_nucleotide_from_paper","genbank_accession_nucleotide_version_from_paper","genbank_accession_protein_version","protein_name_genbank","protein_rec_name_uniprot","tax_id_uniprotkb_ac","organism","strain_uniprotkb_ac","glycosylation_site_in_paper","link_sugar","glycan_composition_in_paper","glycan_composition_format_1","glycan_composition_format_2","core_type","glycoCT","oxford_notation","abundance_from_paper","glycopeptide_sequence","abundance_normalized","predominant_glycan_species","biological_source","source_cell_line_cellosaurus_name","source_cell_line_cellosaurus_id","analyte","mass_glycopeptide","chromatography_glycopeptide","analyzer_glycopeptide","sample_preparation_glycopeptide","glycosidase_treatment_glycopeptide","lectin_characterisation_glycopeptide","fragmentation_glycopeptide","ionization_glycopeptide","notes","entry_version_uniprot","entry_modification_date_refseq","n_sequon","n_sequon_type","start_pos","end_pos","start_aa","end_aa","site_seq"
"P27958-1","448","Asn","G92050GC","N-linked","protein_xref_pubmed","18187336","protein_xref_glygen_ds","GLY_000335","","18187336","AF009606","AF009606.1","AAB66324.1","polyprotein [Hepatitis C virus subtype 1a]","Genome polyprotein","63746","Hepacivirus C","Hepatitis C virus subtype 1a (Isolate H)","66","GlcNAc","Man4","Hex4HexNAc2","HexNAc2Hex4dHex0NeuAc0NeuGc0Pent0S0P0KDN0HexA0","High mannose","","M4","0.61","FNSSGCPER","14.59","high mannose glycans","","CHO","CVCL_0213","glycopeptide","2106.763","HPLC","Q-Tof hybrid","Carboxymethylation | tryptic digest","","","CAD","MALDI","The notation ManX, where X ranges from 4 to 9 in the case of the observed tryptic and chymotryptic glycopeptides, indicates that X mannose residues are attached to the chitobiose core (GlcNAc ?(1-4) GlcNAc) | the differences in the number of mannose residues is caused by the slight differences between the E2 glycoprotein batches that were used. The majority of these sites proved to be occupied by high mannose glycans. The relative abundance was determined from deconvoluted mass spectrum over the mass range containing the glycopeptide ions corresponding to glycopeptides with the same amino acid sequence. See glycopeptide sequence reported.","43509","39982","NSS","NXS","448","448","Asn","Asn","N"
https://api.tst.glygen.org/glycan/detail/G92050GC --> missing cell line (CVCL_0213)
"expression": [{
"uniprot_canonical_ac": "P01830-1",
"start_pos": 42,
"end_pos": 42,
"residue": "Asn",
"category": "tissue",
"tissue": {
"name": "Synaptosomes",
"namespace": "OMIT",
"id": "0014437",
"url": "http://purl.obolibrary.org/obo/OMIT_0014437"
},
"evidence": [{
"id": "P01830",
"database": "UniCarbKB"
}, {
"id": "34106099",
"database": "PubMed",
"url": "https://glygen.org/publication/PubMed/34106099"
}, {
"id": "10.1039/D0MO00044B",
"database": "DOI",
"url": "https://glygen.org/publication/DOI/10.1039/D0MO00044B"
}]
}, {
"uniprot_canonical_ac": "P13638-1",
"start_pos": 118,
"end_pos": 118,
"residue": "Asn",
"category": "tissue",
"tissue": {
"name": "Synaptosomes",
"namespace": "OMIT",
"id": "0014437",
"url": "http://purl.obolibrary.org/obo/OMIT_0014437"
},
"evidence": [{
"id": "P13638",
"database": "UniCarbKB"
}, {
"id": "34106099",
"database": "PubMed",
"url": "https://glygen.org/publication/PubMed/34106099"
}, {
"id": "10.1039/D0MO00044B",
"database": "DOI",
"url": "https://glygen.org/publication/DOI/10.1039/D0MO00044B"
}]
}, {
"uniprot_canonical_ac": "P45479-1",
"start_pos": 197,
"end_pos": 197,
"residue": "Asn",
"category": "tissue",
"tissue": {
"name": "Synaptosomes",
"namespace": "OMIT",
"id": "0014437",
"url": "http://purl.obolibrary.org/obo/OMIT_0014437"
},
"evidence": [{
"id": "P45479",
"database": "UniCarbKB"
}, {
"id": "34106099",
"database": "PubMed",
"url": "https://glygen.org/publication/PubMed/34106099"
}, {
"id": "10.1039/D0MO00044B",
"database": "DOI",
"url": "https://glygen.org/publication/DOI/10.1039/D0MO00044B"
}]
}],
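The comparison above (the dataset row carries CVCL_0213, the API records do not) can be sketched as a small check. This assumes the column names from the CSV header shown earlier and assumes that an API cell-line record would expose the Cellosaurus accession verbatim under `cell_line.id`; that id format is a guess, since no cell-line record appears in this thread. The CSV sample is trimmed to three columns for brevity.

```python
import csv
import io


def missing_cell_lines(csv_text: str, glycan_ac: str, expression: list) -> set:
    """Return cell-line ids present in the dataset for glycan_ac but absent from the API records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    expected = {
        row["source_cell_line_cellosaurus_id"]
        for row in reader
        if row["saccharide"] == glycan_ac and row["source_cell_line_cellosaurus_id"]
    }
    served = {
        rec["cell_line"]["id"]
        for rec in expression
        if rec.get("category") == "cell_line" and "cell_line" in rec
    }
    return expected - served


# Trimmed dataset sample: only the columns the check needs.
sample_csv = (
    '"uniprotkb_canonical_ac","saccharide","source_cell_line_cellosaurus_id"\n'
    '"P27958-1","G92050GC","CVCL_0213"\n'
)

# API records for G92050GC as shown above: tissue entries only.
api_expression = [
    {"uniprot_canonical_ac": "P01830-1", "category": "tissue",
     "tissue": {"namespace": "OMIT", "id": "0014437"}},
]

print(missing_cell_lines(sample_csv, "G92050GC", api_expression))
```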
"uniprotkb_canonical_ac","glycosylation_site_uniprotkb","amino_acid","saccharide","glycosylation_type","xref_key","xref_id","src_xref_key","src_xref_id","protein_id","taxonomy_taxonomy_id","taxonomy_species","structure_id","structure_glytoucan_id","structure_glycan_core","structure_glycan_type","composition_format_numeric","composition_format_condensed","composition_format_byonic","composition_mass_monoisotopic","composition_mass","composition_format_glyconnect","composition_glytoucan_id","source_tissue_name","source_tissue_id","source_cell_line_cellosaurus_name","source_cell_line_cellosaurus_id","source_cell_component_id","source_cell_component_go_id","source_cell_component_name","n_sequon","n_sequon_type","start_pos","end_pos","start_aa","end_aa","site_seq"
"","","","G11457RF","O-linked","protein_xref_glyconnect","357","protein_xref_glyconnect","357","357","10116","Rattus norvegicus","2376","G11457RF","Core 2","O-Linked","2,2,0,1,0,0,0,0,0,0,0,0,0,0","H2N2S1","HexNAc(2)Hex(2)NeuAc(1)","1039.3704","1039.948","Hex:2 HexNAc:2 NeuAc:1","","liver","UBERON:0002107","Zajdela-Hepatoma","CVCL_1D00","123","GO_0009986","Cell Surface","","","","","","",""
https://api.tst.glygen.org/glycan/detail/G11457RF --> missing cell line (CVCL_1D00) and tissue (UBERON:0002107)
{
"table_id": "expression",
"table_stats": [{
"field": "total",
"count": 0
}, {
"field": "total_sites",
"count": 0
}]
}, {
"table_id": "expression_tissue",
"table_stats": [{
"field": "total",
"count": 0
}, {
"field": "total_sites",
"count": 0
}]
}, {
"table_id": "expression_cell_line",
"table_stats": [{
"field": "total",
"count": 0
}
The following work now:
{
"record_type": "glycan",
"table_id": "expression_cell_line",
"record_id": "G92050GC",
"offset": 1,
"limit": 20,
"order": "desc",
"sort": "uniprot_canonical_ac"
}
{
"record_type": "glycan",
"table_id": "expression_tissue",
"record_id": "G92050GC",
"offset": 1,
"limit": 20,
"order": "desc",
"sort": "uniprot_canonical_ac"
}
Pagination looks good
@rykahsay @ReneRanzinger just to confirm, are we intentionally omitting glycan expression records that don't have a known protein and/or site?
@rykahsay It is working as expected. However, the sort fields listed in section_stats -> sort_fields for the expression_tissue table start with cell_line, when they should start with tissue. Sorting by tissue.* works fine; only the names in section_stats -> sort_fields for the expression_tissue table need to change.
@kmartinez834 can you please give us an example of what data is getting filtered out? Is it backend or frontend filtering the data? Is the data with no cell_line or tissue info getting filtered out?
@sujeetvkulkarni @rykahsay backend filtering --> there are entries without protein/site in the datasets that are not appearing in the API:
$ grep "G57321FI.*CVCL" reviewed/fruitfly_proteoform_glycosylation_sites_glyconnect.csv
"","","","G57321FI","O-linked","protein_xref_glyconnect","416","protein_xref_glyconnect","416","416","7227","Drosophila melanogaster","2305","G57321FI","Core 0","O-Linked","0,1,0,0,0,0,0,0,0,0,0,0,0,0","N1","HexNAc(1)","221.09","221.2103","HexNAc:1","","mucosa","UBERON:0000344","67j25D","CVCL_Z425","","","","","","","","","",""
--> Also, these known sites have cell_line CVCL_6642 (HEK293-F), but are not included in the API:
reviewed/sarscov2_proteoform_glycosylation_sites_glyconnect.csv:
"P0DTC2-1","1076","Thr","G57321FI"
"P0DTC2-1","1077","Thr","G57321FI"
"P0DTC2-1","1097","Ser","G57321FI"
"P0DTC2-1","73","Thr","G57321FI"
"P0DTC2-1","76","Thr","G57321FI"
"P0DTC2-1","803","Ser","G57321FI"
check now
Known sites are now included.
Entry without protein_ac is still missing --> should this be included?
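To see which dataset rows are affected by the open question above, the rows can be split on whether they carry a protein accession and site. This is only a sketch: it assumes the `uniprotkb_canonical_ac` and `glycosylation_site_uniprotkb` columns from the dataset header, and `split_by_site` is a hypothetical helper, not part of the GlyGen pipeline.

```python
from __future__ import annotations


def split_by_site(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Partition rows into (with protein and site, without protein or site)."""
    with_site, without_site = [], []
    for row in rows:
        has_site = bool(row.get("uniprotkb_canonical_ac")) and bool(
            row.get("glycosylation_site_uniprotkb")
        )
        (with_site if has_site else without_site).append(row)
    return with_site, without_site


# Two rows for G57321FI, reduced to the relevant columns from the examples above.
rows = [
    {"uniprotkb_canonical_ac": "", "glycosylation_site_uniprotkb": "",
     "saccharide": "G57321FI", "source_cell_line_cellosaurus_id": "CVCL_Z425"},
    {"uniprotkb_canonical_ac": "P0DTC2-1", "glycosylation_site_uniprotkb": "1076",
     "saccharide": "G57321FI", "source_cell_line_cellosaurus_id": "CVCL_6642"},
]

known, unknown = split_by_site(rows)
print(len(known), len(unknown))
```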
@rykahsay the sort_fields naming problem (section_stats -> sort_fields for the expression_tissue table lists cell_line.* names where tissue.* is expected) still exists on both beta and tst. https://beta-api.glygen.org/glycan/detail/G92050GC https://api.tst.glygen.org/glycan/detail/G92050GC
{
"table_id": "expression_tissue",
"table_stats": [
{
"field": "total",
"count": 3
},
{
"field": "total_sites",
"count": 3
}
],
"sort_fields": [
"uniprot_canonical_ac",
"start_pos",
"end_pos",
"residue",
"category",
"cell_line.name",
"cell_line.namespace",
"cell_line.id",
"cell_line.url",
"abundance"
]
},
{
"table_id": "expression_cell_line",
"table_stats": [
{
"field": "total",
"count": 59
},
{
"field": "total_sites",
"count": 59
}
],
"sort_fields": [
"uniprot_canonical_ac",
"start_pos",
"end_pos",
"residue",
"category",
"cell_line.name",
"cell_line.namespace",
"cell_line.id",
"cell_line.url",
"abundance"
]
},
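The mismatch in the listing above can be detected mechanically. The sketch below assumes each table's nested sort fields should be prefixed with its own category (`tissue` for expression_tissue, `cell_line` for expression_cell_line); the mapping and function names are hypothetical.

```python
from __future__ import annotations

# Assumed mapping from table id to the prefix its nested sort fields should use.
EXPECTED_PREFIX = {
    "expression_tissue": "tissue",
    "expression_cell_line": "cell_line",
}


def mismatched_sort_fields(table_id: str, sort_fields: list[str]) -> list[str]:
    """Return sort fields whose prefix belongs to a different table's category."""
    wrong = set(EXPECTED_PREFIX.values()) - {EXPECTED_PREFIX[table_id]}
    return [f for f in sort_fields if "." in f and f.partition(".")[0] in wrong]


def corrected_sort_fields(table_id: str, sort_fields: list[str]) -> list[str]:
    """Rewrite wrongly prefixed fields to use the table's own prefix."""
    expected = EXPECTED_PREFIX[table_id]
    wrong = set(EXPECTED_PREFIX.values()) - {expected}
    fixed = []
    for field in sort_fields:
        head, sep, tail = field.partition(".")
        fixed.append(f"{expected}.{tail}" if sep and head in wrong else field)
    return fixed


# sort_fields exactly as returned for both tables in the response above.
listed = [
    "uniprot_canonical_ac", "start_pos", "end_pos", "residue", "category",
    "cell_line.name", "cell_line.namespace", "cell_line.id", "cell_line.url",
    "abundance",
]

print(mismatched_sort_fields("expression_tissue", listed))
print(corrected_sort_fields("expression_tissue", listed))
```

Run against the same list for expression_cell_line, the check reports nothing, which matches the observation that only the expression_tissue names need to change.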
@sujeetvkulkarni check if protein and position column can be empty in glycan detail #expression and publication detail #expression sections
2ba0db5effe334798a2b0dfa4b8d999af39938c2 - done.
@rykahsay you can go ahead and do your changes.
Please check
G92050GC
@sujeetvkulkarni --> glycan expression records with empty protein and site are not working on the publication page:
https://api.tst.glygen.org/publication/detail/
"glycan_expression": [
{
"glytoucan_ac": "G57321FI",
"tissue": {
"name": "embryo",
"namespace": "UBERON",
"id": "0000922",
"url": "http://purl.obolibrary.org/obo/UBERON_0000922"
}
}
https://tst.glygen.org/publication/PubMed/16897177#Expression done, please check.
52f5d24faa9849e9dfec0117670e12954063d631
Looks good
Need to define sort keys for the columns of the expression_tissue and expression_cell_line tables. Both the glycan detail API and the table pagination API should support these sort keys.
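One way to keep the two APIs in agreement is a single registry of sort keys that both read from. The sketch below is an assumption about how that could look, not the GlyGen implementation: the key lists are limited to the columns named in this thread, written in dotted form, and `section_stats` and `validate_query` are hypothetical names.

```python
# Single source of truth for sort keys (illustrative subset only).
SORT_KEYS = {
    "expression_tissue": [
        "uniprot_canonical_ac", "start_pos", "tissue.namespace", "tissue.id",
    ],
    "expression_cell_line": [
        "uniprot_canonical_ac", "start_pos", "cell_line.namespace", "cell_line.id",
    ],
}


def section_stats(table_id: str) -> dict:
    """What the detail API would advertise as sort_fields for a table."""
    return {"table_id": table_id, "sort_fields": list(SORT_KEYS[table_id])}


def validate_query(query: dict) -> dict:
    """Reject pagination queries whose sort key is not advertised for the table."""
    allowed = SORT_KEYS.get(query["table_id"])
    if allowed is None:
        raise ValueError(f"unknown table_id: {query['table_id']}")
    if query["sort"] not in allowed:
        raise ValueError(f"unsupported sort key for {query['table_id']}: {query['sort']}")
    return query


query = {
    "record_type": "glycan",
    "table_id": "expression_tissue",
    "record_id": "G92050GC",
    "offset": 1,
    "limit": 20,
    "order": "desc",
    "sort": "tissue.namespace",
}
print(validate_query(query)["sort"])
print(section_stats("expression_tissue")["sort_fields"])
```

Because the detail response and the pagination validator draw from the same dictionary, a key advertised in sort_fields is always accepted by the pagination endpoint.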