glygener / glygen-issues

Repository for public GlyGen tickets

BCO creation for 2.5 #1309

Closed by kmartinez834 6 months ago

kmartinez834 commented 6 months ago
| BCO ID | Dataset Name | Creator Name | Notes |
| --- | --- | --- | --- |
| GLY_000965 | chicken_proteoform_glycosylation_sites_uniprotkb.csv | Urnisha | |
| GLY_000966 | chicken_protein_ncbi_linkouts.csv | Urnisha | |
| GLY_000967 | chicken_protein_genelocus.csv | Urnisha | |
| GLY_000993 | chicken_protein_info_uniprotkb.csv | Urnisha | |
| GLY_000994 | chicken_protein_genenames_uniprotkb.csv | Urnisha | |
| GLY_000995 | chicken_protein_function_refseq.csv | Urnisha | |
| GLY_000989 | chicken_protein_citations_uniprotkb.csv | Kate | |
| GLY_000990 | chicken_protein_altnames.csv | Kate | |
| GLY_001019 | human_proteoform_citations_glycosylation_sites_embl.csv | Karina | |
| GLY_000996 | chicken_protein_xref_oglcnac_atlas.csv | Urnisha | |
| GLY_000991 | chicken_protein_xref_rhea.csv | Kate | |
| GLY_001024 | chicken_protein_glycohydrolase.csv | Jingyue | |
| GLY_000979 | chicken_proteoform_glycosylation_sites_literature.csv | Jingyue | |
| GLY_000997 | chicken_protein_info_refseq.csv | Urnisha | |
| GLY_001021 | mouse_proteoform_citations_glycosylation_sites_embl.csv | Karina | |
| GLY_000998 | chicken_protein_masterlist.csv | Urnisha | |
| GLY_000960 | human_proteoform_glycosylation_sites_diabetes_glycomic.csv | Karina | |
| GLY_000980 | chicken_proteoform_glycosylation_sites_glyconnect.csv | Jingyue | |
| GLY_000982 | chicken_proteoform_citations_phosphorylation_sites_uniprotkb.csv | Jingyue | |
| GLY_000992 | chicken_protein_xref_brenda.csv | Kate | |
| GLY_001001 | chicken_protein_xref_bgee.csv | Kate | |
| GLY_000889 | mouse_proteoform_glycosylation_sites_embl.csv | Karina | |
| GLY_001002 | chicken_protein_proteinnames_refseq.csv | Kate | |
| GLY_000983 | chicken_protein_xref_refseq.csv | Jingyue | |
| GLY_000984 | chicken_protein_transcriptlocus.csv | Jingyue | |
| GLY_000985 | chicken_protein_xref_kegg.csv | Jingyue | |
| GLY_000986 | chicken_protein_go_annotation.csv | Jingyue | |
| GLY_001003 | chicken_proteoform_citations_glycation_sites_uniprotkb.csv | Kate | |
| GLY_001027 | human_proteoform_citations_glycosylation_sites_pdc_ccrc.csv | Karina | |
| GLY_000987 | chicken_protein_participants_reactome.csv | Jingyue | |
| GLY_001008 | chicken_protein_submittednames.csv | Jingyue | |
| GLY_001009 | chicken_proteoform_glycosylation_sites_pdb.csv | Jingyue | |
| GLY_001006 | chicken_protein_xref_intact.csv | Kate | |
| GLY_001012 | chicken_protein_xref_cdd.csv | Jingyue | I didn't find the template on data.glygen.org |
| GLY_001010 | chicken_protein_citations_refseq.csv | Jingyue | |
| GLY_001013 | chicken_protein_recnames.csv | Jingyue | |
| GLY_001015 | chicken_proteoform_citations_glycosylation_sites_uniprotkb.csv | Jingyue | |
| GLY_001022 | chicken_protein_reactions_reactome.csv | Jingyue | |
| GLY_001020 | chicken_protein_xref_geneid.csv | Jingyue | |
| GLY_001023 | chicken_proteoform_phosphorylation_sites_uniprotkb.csv | Jingyue | |
| GLY_000888 | human_proteoform_glycosylation_sites_embl.csv | Karina | |
| GLY_001025 | chicken_protein_reactions_rhea.csv | Jingyue | |
| GLY_000961 | human_proteoform_glycosylation_sites_pdc_ccrc.csv | Karina | |
| GLY_000978 | chicken_protein_xref_orthodb.csv | Luke | |
| GLY_000977 | chicken_protein_pro_annotation.csv | Luke | |
| | chicken_protein_signalp_peptidesequences.fasta | | Not required. Part of chicken_protein_signalp_annotation.csv |
| GLY_000976 | chicken_proteoform_glycosylation_sites_literature_mining.csv | Luke | |
| GLY_001007 | chicken_protein_site_annotation_uniprotkb.csv | Kate | |
| GLY_000975 | chicken_protein_binary_interactions.csv | Luke | |
| GLY_000974 | chicken_protein_xref_pro.csv | Luke | |
| GLY_000973 | chicken_protein_xref_glyconnect.csv | Luke | |
| GLY_000972 | chicken_protein_glycosylation_motifs.csv | Luke | |
| GLY_000971 | chicken_protein_canonicalsequences.fasta | Luke | |
| GLY_001016 | human_proteoform_citations_glycosylation_sites_diabetes_glycomic.csv | Karina | |
| | chicken_protein_signalp_fullsequences.fasta | | Not required. Part of chicken_protein_signalp_annotation.csv |
| GLY_000970 | chicken_protein_xref_oma.csv | Luke | |
| | chicken_protein_signalp_cleavedsequences.fasta | | Not required. Part of chicken_protein_signalp_annotation.csv |
| GLY_000969 | chicken_protein_xref_panther.csv | Luke | |
| GLY_001011 | chicken_proteoform_citations_glycosylation_sites_oglcnac_mcw.csv | Kate | |
| GLY_000968 | chicken_protein_pathways_reactome.csv | Luke | |
| GLY_001014 | chicken_protein_signalp_annotation.csv | Kate | |
| GLY_001026 | chicken_proteoform_citations_phosphorylation_sites_iptmnet.csv | Kate | |
| GLY_001028 | chicken_protein_citations_reactome.csv | Cyrus | |
| GLY_001029 | chicken_protein_enzyme_annotation_uniprotkb.csv | Cyrus | |
| GLY_001030 | chicken_protein_ptm_annotation_uniprotkb.csv | Cyrus | |
| GLY_001031 | chicken_protein_xref_pfam.csv | Cyrus | |
| GLY_001032 | chicken_proteoform_citations_glycosylation_sites_oglcnac_atlas.csv | Cyrus | |
| GLY_001033 | chicken_proteoform_phosphorylation_sites_iptmnet.csv | Cyrus | |
| GLY_001034 | chicken_protein_ntdata.nt | Cyrus | |
| GLY_001035 | chicken_protein_function_uniprotkb.csv | Cyrus | |
| GLY_001036 | chicken_proteoform_glycosylation_sites_literature_mining_manually_verified.csv | Cyrus | |
| GLY_001018 | chicken_protein_genenames_refseq.csv | Kate | |
| GLY_001037 | chicken_protein_xref_uniprotkb.csv | Cyrus | |
| GLY_001038 | chicken_protein_participants_rhea.csv | Cyrus | |
| GLY_001039 | chicken_protein_sequenceinfo.csv | Cyrus | |
| GLY_000999 | chicken_protein_glycosyltransferase.csv | Urnisha | |
| GLY_001000 | chicken_protein_xref_pdb.csv | Urnisha | |
| GLY_001004 | chicken_protein_xref_interpro.csv | Urnisha | |
| GLY_001005 | chicken_protein_allsequences.fasta | Urnisha | |
| GLY_001040 | chicken_protein_xref_cazy.csv | Cyrus | |
| GLY_001041 | chicken_protein_xref_reactome.csv | Cyrus | |
| GLY_001042 | chicken_protein_xref_chembl.csv | Cyrus | |
| GLY_001045 | human_proteoform_ml_ready_diabetes_glycomic.csv | Karina | |
| GLY_001046 | human_proteoform_ml_ready_pdc_ccrc.csv | Karina | |
| GLY_001044 | yeast_protein_pathways_reactome.csv | Karina | |
| GLY_001043 | yeast_protein_xref_reactome.csv | Karina | |
| GLY_001047 | rat_protein_matrixdb.csv | Karina | |
| GLY_001048 | rat_protein_citations_matrixdb.csv | Karina | |

kmartinez834 commented 6 months ago

Please claim a chunk of these BCOs to create by end of next week (5/17)

katewarner commented 6 months ago

Perhaps it would be good to add another column to the table, where we can add our names to the BCOs we're working on?

kmartinez834 commented 6 months ago

@katewarner you can use the "Creator name" column for those you'll work on, thanks!

katewarner commented 6 months ago

@kmartinez834 Sorry, I could only see the first two columns in the ticket - I'm an idiot :-D

jeet-vora commented 6 months ago

Please claim at least 12 BCOs by Monday 05/13 or they will be automatically assigned to you.

JingyueWu commented 6 months ago

@jeet-vora I accidentally created Chicken Glycosylation Sites (GlyConnect) BCO twice, can you please delete https://biocomputeobject.org/GLY_000981/DRAFT?

jeet-vora commented 6 months ago

@JingyueWu I do not have permission to delete it. @tiwa1154 has done it.

kmartinez834 commented 6 months ago

Make sure BCOs are in line with #1091

kmartinez834 commented 6 months ago

@JingyueWu @CyrusAY can you take a look at your BCOs below? The following are resulting in errors:

$ python3 /software/glygen/check-bco2filename-mapping.py | grep ERROR
NO-BCO,chicken_protein_glycohydrolase.csv,ERROR,in_fs
NO-BCO,chicken_proteoform_glycosylation_sites_literature.csv,ERROR,in_fs
NO-BCO,chicken_protein_submittednames.csv,ERROR,in_fs
NO-BCO,chicken_proteoform_glycosylation_sites_pdb.csv,ERROR,in_fs
NO-BCO,chicken_protein_xref_cdd.csv,ERROR,in_fs
NO-BCO,chicken_protein_citations_refseq.csv,ERROR,in_fs
NO-BCO,chicken_protein_recnames.csv,ERROR,in_fs
NO-BCO,chicken_proteoform_citations_glycosylation_sites_uniprotkb.csv,ERROR,in_fs
NO-BCO,chicken_protein_reactions_reactome.csv,ERROR,in_fs
NO-BCO,chicken_protein_xref_geneid.csv,ERROR,in_fs
NO-BCO,chicken_proteoform_phosphorylation_sites_uniprotkb.csv,ERROR,in_fs
NO-BCO,chicken_protein_reactions_rhea.csv,ERROR,in_fs
NO-BCO,chicken_proteoform_citations_glycosylation_sites_oglcnac_atlas.csv,ERROR,in_fs
NO-BCO,chicken_proteoform_glycosylation_sites_literature_mining_manually_verified.csv,ERROR,in_fs
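
For triage, the report above can be grouped by creator. This is a minimal sketch that only reads the comma-separated output shown here; it does not reproduce check-bco2filename-mapping.py, whose internals are not shown in this ticket. The function names `failing_files` and `group_by_creator` are hypothetical, and the creator mapping is a small subset copied from the table at the top of this ticket.

```python
import csv
import io
from collections import defaultdict

# Creator assignments copied from the ticket's table (subset, for illustration).
CREATORS = {
    "chicken_protein_glycohydrolase.csv": "Jingyue",
    "chicken_protein_xref_cdd.csv": "Jingyue",
    "chicken_proteoform_citations_glycosylation_sites_oglcnac_atlas.csv": "Cyrus",
}


def failing_files(report: str) -> list[str]:
    """Return the filename field of every four-field line whose status is ERROR."""
    names = []
    for row in csv.reader(io.StringIO(report)):
        # Line shape as seen in the output: BCO ID, filename, status, location.
        if len(row) == 4 and row[2] == "ERROR":
            names.append(row[1])
    return names


def group_by_creator(names: list[str], creators: dict[str, str]) -> dict[str, list[str]]:
    """Group filenames by creator; unknown files go under 'unassigned'."""
    grouped = defaultdict(list)
    for name in names:
        grouped[creators.get(name, "unassigned")].append(name)
    return dict(grouped)


report = """NO-BCO,chicken_protein_glycohydrolase.csv,ERROR,in_fs
NO-BCO,chicken_protein_xref_cdd.csv,ERROR,in_fs
NO-BCO,chicken_proteoform_citations_glycosylation_sites_oglcnac_atlas.csv,ERROR,in_fs
"""

for creator, files in group_by_creator(failing_files(report), CREATORS).items():
    print(creator, len(files))
```

Piping the real script's output into `failing_files` would work the same way, provided the four-field line format stays as shown.
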
kmartinez834 commented 6 months ago

@rykahsay we need to fix a few BCOs (above), and I'm adding more details to my ML ready BCOs - do you mind waiting until EOB Monday to create the BCO objects?

CyrusAY commented 6 months ago

The last two BCOs that showed errors were mine.

Fixed chicken_proteoform_citations_glycosylation_sites_oglcnac_atlas.csv

Updating the IO domain of chicken_proteoform_glycosylation_sites_literature_mining_manually_verified.csv (GLY_001036). Which of the following datasets should I include as the input?

image

Update: fixed

kmartinez834 commented 6 months ago

@CyrusAY you can actually leave the input domain blank (or as-is). Robel will override it programmatically. Can you verify that the output domain is correct? Looks like it's currently human instead of chicken
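
A quick way to catch this kind of species mismatch is to compare each output filename against the expected species prefix. The sketch below assumes the BCO is available as JSON laid out per IEEE 2791 (`io_domain` → `output_subdomain` → `uri` → `filename`); whether the GlyGen BCOs populate `filename` this way is an assumption, and `mismatched_outputs` is a hypothetical helper, not part of any GlyGen tooling. The example BCO is made up to mirror the situation described above.

```python
def mismatched_outputs(bco: dict, expected_species: str) -> list[str]:
    """Return output filenames that do not start with '<expected_species>_'."""
    outputs = bco.get("io_domain", {}).get("output_subdomain", [])
    prefix = expected_species + "_"
    bad = []
    for entry in outputs:
        filename = entry.get("uri", {}).get("filename", "")
        if not filename.startswith(prefix):
            bad.append(filename)
    return bad


# Illustrative BCO fragment only; not copied from the real GLY_001036 record.
example_bco = {
    "io_domain": {
        "output_subdomain": [
            {
                "uri": {
                    "filename": "human_proteoform_glycosylation_sites_literature_mining_manually_verified.csv"
                }
            }
        ]
    }
}

print(mismatched_outputs(example_bco, "chicken"))
```

An empty list means every output filename carries the expected species prefix.
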

CyrusAY commented 6 months ago

GLY_000979 | chicken_proteoform_glycosylation_sites_literature.csv should not be included in the chicken dataset. It is based on literature specific to human only.

Usability domain reads:

The dataset provides information on N-glycosylation sites on Chicken proteins. The data has been processed from the supplementary material from 2 publications (1. "Deeb, S. J., Cox, J., Schmidt-Supprian, M., & Mann, M. (2013). N-linked Glycosylation Enrichment for In-depth Cell Surface Proteomics of Diffuse Large B-cell Lymphoma Subtypes. Molecular & Cellular Proteomics, 13(1), 240-251. doi:10.1074/mcp.m113.033977" 2. "Boersema, P. J., Geiger, T., Winiewski, J. R., & Mann, M. (2012). Quantification of the N-glycosylated Secretome by Super-SILAC During Breast Cancer Progression and in Chicken Blood Samples. Molecular & Cellular Proteomics, 12(1), 158-171. doi:10.1074/mcp.m112.023614"). The listed proteins (UniProtKB) accessions are part of the GlyGen UniProtKB canonical list (https://data.glygen.org/GLYDS000001).

Both publications focus on cancer patients (in the second citation, the keyword 'human' was automatically changed to 'chicken').

On data.glygen.org there is also a sarscov1 dataset with an empty description domain, which might be worth checking:

GLY_000612 SARS-CoV1 Glycosylation Sites (Literature) sarscov1_proteoform_glycosylation_sites_literature.csv

CyrusAY commented 6 months ago
$ python3 check-bco2filename-mapping.py | grep ERROR
NO-BCO,chicken_proteoform_glycosylation_sites_literature.csv,ERROR,in_fs
NO-BCO,rat_protein_citations_matrixdb.csv,ERROR,in_fs
NO-BCO,chicken_proteoform_glycosylation_sites_unicarbkb.csv,ERROR,in_fs
NO-BCO,rat_protein_matrixdb.csv,ERROR,in_fs
kmartinez834 commented 6 months ago

I have reviewed all of the new BCOs and created BCOs for rat_protein_matrixdb.csv and rat_protein_citations_matrixdb.csv. The remaining errors (chicken_proteoform_glycosylation_sites_literature.csv and chicken_proteoform_glycosylation_sites_unicarbkb.csv) are due to the presence of these files in unreviewed/; however, the files are empty and were removed from the dataset-masterlist.json file.
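
Leftover files like these can be spotted by listing zero-byte files in a directory that are not in the masterlist. The sketch below is only an illustration: the layout of dataset-masterlist.json is not shown in this ticket, so the function takes a plain set of listed filenames, and `empty_unlisted_files` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def empty_unlisted_files(directory: Path, listed: set[str]) -> list[str]:
    """Return sorted names of zero-byte files in `directory` that are not in `listed`."""
    return sorted(
        path.name
        for path in directory.iterdir()
        if path.is_file() and path.stat().st_size == 0 and path.name not in listed
    )


# Small self-contained demo using a temporary directory in place of unreviewed/.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "chicken_proteoform_glycosylation_sites_unicarbkb.csv").touch()
    (folder / "rat_protein_matrixdb.csv").write_text("placeholder,row\n")
    leftovers = empty_unlisted_files(folder, listed={"rat_protein_matrixdb.csv"})
    print(leftovers)
```

Files reported this way are candidates for removal from the directory, so that the mapping check stops flagging them.
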