glygener / glygen-issues

Repository for public GlyGen tickets
GNU General Public License v3.0

QC datasets #1500

Closed by ReneRanzinger 1 month ago

ReneRanzinger commented 2 months ago

QC of all datasets generated by Robel

Dependencies:

Blocker for:

ubhuiyan commented 1 month ago

QC Check Assignments

| File Name | Assignment | Checked | Notes |
|---|---|---|---|
| arabidopsis_proteoform_citations_glycation_sites_uniprotkb.csv | Cyrus | Y | no issues; is empty |
| arabidopsis_proteoform_glycosylation_sites_glyconnect.csv | Cyrus | Y | no issues |
| arabidopsis_proteoform_citations_glycosylation_sites_glyconnect.csv | Cyrus | Y | no issues |
| arabidopsis_proteoform_glycosylation_sites_oglcnac_atlas.csv | Cyrus | Y | no issues |
| arabidopsis_proteoform_citations_glycosylation_sites_oglcnac_atlas.csv | Cyrus | Y | no issues |
| arabidopsis_proteoform_glycosylation_sites_oglcnac_mcw.csv | Cyrus | Y | no issues |
| arabidopsis_proteoform_citations_glycosylation_sites_oglcnac_mcw.csv | Cyrus | Y | no issues |
| arabidopsis_proteoform_glycosylation_sites_pdb.csv | Cyrus | Y | no issues |
| arabidopsis_proteoform_citations_glycosylation_sites_uniprotkb.csv | Urnisha | Y | No Issues |
| arabidopsis_proteoform_glycosylation_sites_uniprotkb.csv | Urnisha | Y | No Issues |
| arabidopsis_proteoform_phosphorylation_sites_iptmnet.csv | Urnisha | Y | No Issues |
| arabidopsis_proteoform_citations_phosphorylation_sites_iptmnet.csv | Urnisha | Y | No Issues |
| arabidopsis_proteoform_citations_phosphorylation_sites_uniprotkb.csv | Urnisha | Y | No Issues |
| arabidopsis_proteoform_phosphorylation_sites_uniprotkb.csv | Urnisha | Y | No Issues |
| arabidopsis_protein_altnames.csv | Luke | Y | |
| arabidopsis_protein_binary_interactions.csv | Luke | Y | |
| arabidopsis_protein_citations_uniprotkb.csv | Luke | Y | |
| arabidopsis_protein_enzyme_annotation_uniprotkb.csv | Luke | Y | |
| arabidopsis_protein_function_uniprotkb.csv | Luke | Y | |
| arabidopsis_protein_genelocus.csv | Luke | Y | |
| arabidopsis_protein_genenames_refseq.csv | Luke | Y | |
| arabidopsis_protein_genenames_uniprotkb.csv | Luke | Y | |
| arabidopsis_protein_glycohydrolase.csv | Kate | Y | No issues |
| arabidopsis_protein_glycosylation_motifs.csv | Luke | Y | |
| arabidopsis_protein_glycosyltransferase.csv | Kate | Y | No issues |
| arabidopsis_protein_go_annotation.csv | Luke | Y | eco_id and pmid appear to be blank for every entry. |
| arabidopsis_protein_info_refseq.csv | Luke | Y | refseq_protein_summary mostly empty. |
| arabidopsis_protein_info_uniprotkb.csv | Luke | Y | |
| arabidopsis_protein_masterlist.csv | Luke | Y | |
| arabidopsis_protein_ncbi_linkouts.csv | Luke | Y | |
| arabidopsis_protein_participants_rhea.csv | Luke | Y | |
| arabidopsis_protein_pro_annotation.csv | Luke | Y | only 2 entries. Not sure how to see if this is correct but this seems low. |
| arabidopsis_protein_proteinnames_refseq.csv | Luke | Y | |
| arabidopsis_protein_ptm_annotation_uniprotkb.csv | Luke | Y | |
| arabidopsis_protein_reactions_rhea.csv | Luke | Y | |
| arabidopsis_protein_recnames.csv | Luke | Y | |
| arabidopsis_protein_sequenceinfo.csv | Luke | Y | |
| arabidopsis_protein_signalp_annotation.csv | Luke | Y | |
| arabidopsis_protein_site_annotation_uniprotkb.csv | Kate | Y | No issues |
| arabidopsis_protein_submittednames.csv | Kate | Y | No issues |
| arabidopsis_protein_transcriptlocus.csv | Kate | Y | No issues |
| arabidopsis_protein_xref_brenda.csv | Luke | Y | |
| arabidopsis_protein_xref_cazy.csv | Luke | Y | |
| arabidopsis_protein_xref_cdd.csv | Luke | Y | |
| arabidopsis_protein_xref_chembl.csv | Luke | Y | Only 30 entries. |
| arabidopsis_protein_xref_geneid.csv | Luke | Y | |
| arabidopsis_protein_xref_glyconnect.csv | Luke | Y | Only 2 entries |
| arabidopsis_protein_xref_intact.csv | Luke | Y | |
| arabidopsis_protein_xref_interpro.csv | Luke | Y | |
| arabidopsis_protein_xref_kegg.csv | Kate | Y | No issues |
| arabidopsis_protein_xref_oglcnac_atlas.csv | Kate | Y | No issues |
| arabidopsis_protein_xref_oglcnac_mcw.csv | Kate | Y | No issues |
| arabidopsis_protein_xref_oma.csv | Kate | Y | No issues |
| arabidopsis_protein_xref_orthodb.csv | Kate | Y | No issues |
| arabidopsis_protein_xref_panther.csv | Kate | Y | No issues |
| arabidopsis_protein_xref_pdb.csv | Kate | Y | No issues |
| arabidopsis_protein_xref_pfam.csv | Kate | Y | No issues |
| arabidopsis_protein_xref_pro.csv | Kate | Y | No issues |
| arabidopsis_protein_xref_rhea.csv | Kate | Y | No issues |
| arabidopsis_protein_xref_uniprotkb.csv | Kate | Y | No issues |
| arabidopsis_protein_xref_pride.csv | Kate | Y | No issues |
| human_protein_xref_pride.csv | Kate | Y | No issues |
| yeast_protein_xref_pride.csv | Kate | Y | No issues |
| pig_protein_xref_pride.csv | Kate | Y | No issues |
| mouse_protein_xref_pride.csv | Kate | Y | No issues |
| rat_protein_xref_pride.csv | Kate | Y | No issues |
| fruitfly_protein_xref_pride.csv | Kate | Y | No issues |
| human_protein_xref_massive.csv | Kate | Y | No issues |
| human_protein_disease_alliance_genome.csv | Kate | Y | No issues |

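
Several notes in the list above (an empty file, `eco_id` and `pmid` blank for every entry, very low entry counts) are the kind of thing a small script can flag before a manual review. The following is only a rough sketch of such a check, not the QC procedure actually used for this ticket. The function name `summarize_csv`, the sample rows, and every column name other than `eco_id` and `pmid` are made up for illustration.

```python
import csv
import io


def summarize_csv(handle):
    """Return (row_count, sorted names of columns that are blank in every row).

    Note: a file with a header but no data rows reports every column as blank.
    """
    reader = csv.DictReader(handle)
    blank = set(reader.fieldnames or [])
    rows = 0
    for row in reader:
        rows += 1
        # Drop a column from the "blank" set as soon as one non-empty value appears.
        for name in list(blank):
            value = row.get(name)
            if value is not None and value.strip():
                blank.discard(name)
    return rows, sorted(blank)


# Hypothetical sample data; only eco_id and pmid are column names taken from the notes.
SAMPLE = (
    "protein_ac,go_term_id,eco_id,pmid\n"
    "X00001-1,GO:0000001,,\n"
    "X00002-1,GO:0000002,,\n"
)

row_count, blank_columns = summarize_csv(io.StringIO(SAMPLE))
print(row_count, blank_columns)
```

For a real file, the same function can be given a handle from `open(path, newline="")`. A row count of 0 or a non-empty list of blank columns would then be a prompt for a closer look, not a verdict on its own.
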
ubhuiyan commented 1 month ago

@CyrusAY Please perform data QC on the following files I've assigned you. Let me know if you have any questions!

katewarner commented 1 month ago

@Luke-Johnson-5 I've added the new protein datasets that need to be checked and assigned you to half of them. I already checked the old datasets last week.

katewarner commented 1 month ago

@rykahsay Finished Dataset QC and reported the issues I found in the following tickets:

#1571

#1585

#1582

#1583 - I assigned this one to Preethi because the UniProt download needs to be checked

ubhuiyan commented 1 month ago

@rykahsay Finished Dataset QC and reported issues I found in the following tickets:

#1580

No issues found in the glycan datasets; all numbers are positive. Row counts declined in the oglcnac_atlas datasets, but this is due to the xref key badge changes.
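
A row count decline like the one mentioned above could be surfaced by comparing per-file counts between two releases. This is a minimal sketch under assumptions: the function name `row_count_declines` is hypothetical and the counts are invented for illustration, not taken from the actual datasets.

```python
def row_count_declines(previous, current):
    """Return {file_name: (old_count, new_count)} for files whose count dropped.

    Files missing from either release are ignored here.
    """
    declines = {}
    for name, old in previous.items():
        new = current.get(name)
        if new is not None and new < old:
            declines[name] = (old, new)
    return declines


# Invented counts, for illustration only.
previous = {
    "arabidopsis_protein_xref_oglcnac_atlas.csv": 120,
    "arabidopsis_protein_xref_pride.csv": 80,
}
current = {
    "arabidopsis_protein_xref_oglcnac_atlas.csv": 95,
    "arabidopsis_protein_xref_pride.csv": 84,
}

declines = row_count_declines(previous, current)
print(declines)
```

A flagged file is not necessarily a problem: as noted above, a drop can have a known cause, so the output is best read as a list of files to explain.
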

rykahsay commented 1 month ago

I am closing this; it is done.