glygener / glygen-issues

Repository for public GlyGen tickets

Check glycan dataset | 2.5 #1297

Status: Closed (CyrusAY closed this issue 1 week ago)

CyrusAY commented 2 weeks ago

No issue to be reported.

Summary: cyruschauyeung_qc_check_glycan_datasets.csv

|id_count_diff|id_count_new|id_count_old|row_count_diff|row_count_new|row_count_old|field_count_diff|field_count_new|field_count_old|dataset_file_name|status_flags|
|---|---|---|---|---|---|---|---|---|---|---|
|0|364|364|0|364|364|0|2|2|glycan_glytoucan_accession_history.csv|old_dataset|
|0|116|116|0|277|277|0|9|9|glycan_citations_motif.csv|old_dataset|
|0|751|751|0|795|795|0|3|3|glycan_xref_unicarbkb.csv|old_dataset|
|0|603|603|0|8116|8116|0|3|3|glycan_xref_pdb.csv|old_dataset|
|0|7108|7108|0|11276|11276|0|3|3|glycan_xref_glycosciencesde.csv|old_dataset|
|0|3194|3194|0|5239|5239|0|3|3|glycan_xref_cfg.csv|old_dataset|
|0|5879|5879|0|29839|29839|0|3|3|glycan_xref_carbbank.csv|old_dataset|
|0|97|97|0|198|198|0|6|6|glycan_synthesized.csv|old_dataset|
|0|38|38|0|38|38|0|5|5|glycan_ncfg.csv|old_dataset|
|0|38|38|0|38|38|0|9|9|glycan_citations_ncfg.csv|old_dataset|
|0|63|63|0|347|347|0|3|3|glycan_xref_rhea.csv|old_dataset|
|0|32|32|0|61|61|0|3|3|glycan_xref_reactome.csv|old_dataset|
|0|101|101|0|101|101|0|3|3|glycan_xref_gptwiki.csv|old_dataset|
|0|32|32|0|843|843|0|7|7|glycan_pathway_reactome.csv|old_dataset|
|0|126|126|0|138|138|0|3|3|glycan_xref_dictionary.csv|old_dataset|
|0|128|128|0|140|140|0|13|13|glycan_dictionary.csv|old_dataset|
|1|142|141|1|145|144|0|3|3|glycan_xref_glycoepitope.csv|old_dataset; rowcount_change; idcount_change|
|2|678|676|3|807|804|0|3|3|glycan_xref_unicarbdb.csv|old_dataset; rowcount_change; idcount_change|
|2|342|340|9|518|509|0|3|3|glycan_xref_bcsdb.csv|old_dataset; rowcount_change; idcount_change|
|3|79|76|-38|84|122|0|9|9|glycan_citations_biomarkers.csv|old_dataset; rowcount_change; idcount_change|
|3|12|9|3|12|9|0|3|3|glycan_xref_matrixdb.csv|old_dataset; rowcount_change; idcount_change|
|3|79|76|-17|188|205|10|27|17|glycan_biomarkers.csv|old_dataset; structure_change; rowcount_change; idcount_change|
|7|1083|1076|7|1083|1076|0|3|3|glycan_xref_gadr.csv|old_dataset; rowcount_change; idcount_change|
|17|369|352|17|369|352|0|2|2|glycan_type_n_linked_byonic.csv|old_dataset; rowcount_change; idcount_change|
|21|9858|9837|21|9870|9849|0|3|3|glycan_xref_chebi.csv|old_dataset; rowcount_change; idcount_change|
|46|4632|4586|46|4746|4700|0|3|3|glycan_xref_kegg.csv|old_dataset; rowcount_change; idcount_change|
|70|4801|4731|82|5011|4929|0|3|3|glycan_xref_glyconnect.csv|old_dataset; rowcount_change; idcount_change|
|110|6427|6317|110|6427|6317|0|2|2|glycan_sequences_glycam_iupac.csv|old_dataset; rowcount_change; idcount_change|
|112|9602|9490|224|19204|18980|0|3|3|glycan_xref_pubchem.csv|old_dataset; rowcount_change; idcount_change|
|112|9602|9490|112|9602|9490|0|3|3|glycan_sequences_smiles_isomeric.csv|old_dataset; rowcount_change; idcount_change|
|112|9602|9490|112|9602|9490|0|3|3|glycan_sequences_inchi.csv|old_dataset; rowcount_change; idcount_change|
|136|8531|8395|136|8531|8395|0|1|1|glycan_fully_determined.csv|old_dataset; rowcount_change; idcount_change|
|205|5366|5161|11926|136134|124208|0|7|7|glycan_pathway_glycotree.csv|old_dataset; rowcount_change; idcount_change|
|273|3911|3638|420|17592|17172|0|9|9|glycan_citations_glytoucan.csv|old_dataset; rowcount_change; idcount_change|
|-430|1828|2258|-42|5074|5116|0|3|3|glycan_names.csv|old_dataset; rowcount_change; idcount_change|
|873|11306|10433|13458|64180|50722|0|10|10|glycan_species_customized_neuac_neugc.csv|old_dataset; rowcount_change; idcount_change|
|2979|30529|27550|34305|163286|128981|0|10|10|glycan_species.csv|old_dataset; rowcount_change; idcount_change|
|3377|28519|25142|14543|85549|71006|0|12|12|glycan_motif.csv|old_dataset; rowcount_change; idcount_change|
|4132|37783|33651|9771|84353|74582|0|7|7|glycan_classification.csv|old_dataset; rowcount_change; idcount_change|
|4325|40602|36277|4325|40602|36277|0|2|2|glycan_sequences_iupac_extended.csv|old_dataset; rowcount_change; idcount_change|
|4858|43316|38458|4858|43316|38458|0|3|3|glycan_sequences_byonic.csv|old_dataset; rowcount_change; idcount_change|
|5084|48635|43551|5084|48635|43551|0|2|2|glycan_sequences_glycoctxml.csv|old_dataset; rowcount_change; idcount_change|
|5099|45692|40593|5099|45692|40593|0|2|2|glycan_sequences_gwb.csv|old_dataset; rowcount_change; idcount_change|
|5308|10111|4803|5308|10111|4803|0|4|4|glycan_top_authors.csv|old_dataset; rowcount_change; idcount_change|
|5807|53382|47575|5807|53382|47575|0|39|39|glycan_monosaccharide_composition_advanced.csv|old_dataset; rowcount_change; idcount_change|
|5807|53383|47576|5807|53383|47576|0|2|2|glycan_sequences_glycoct.csv|old_dataset; rowcount_change; idcount_change|
|5807|53383|47576|5807|53383|47576|0|2|2|glycan_sequences_wurcs.csv|old_dataset; rowcount_change; idcount_change|
|5807|53383|47576|5807|53383|47576|0|2|2|glycan_glytoucan_linkout.csv|old_dataset; rowcount_change; idcount_change|
|5808|53384|47576|5808|57508|51700|0|12|12|glycan_masterlist.csv|old_dataset; rowcount_change; idcount_change|
|5808|53384|47576|5808|53384|47576|0|14|14|glycan_monosaccharide_composition.csv|old_dataset; rowcount_change; idcount_change|
|5808|53384|47576|5808|53384|47576|0|3|3|glycan_xref_glytoucan.csv|old_dataset; rowcount_change; idcount_change|
|5808|53384|47576|5808|53384|47576|0|4|4|glycan_pubchem_status.csv|old_dataset; rowcount_change; idcount_change|
|5808|53384|47576|46464|427072|380608|0|3|3|glycan_toolsupport.csv|old_dataset; rowcount_change; idcount_change|
|6113|53283|47170|227870|1488616|1260746|0|4|4|glycan_subsumption.csv|old_dataset; rowcount_change; idcount_change|
|6113|53283|47170|6113|53283|47170|0|3|3|glycan_xref_gnome.csv|old_dataset; rowcount_change; idcount_change|
|6117|53382|47265|6117|53382|47265|0|3|3|glycan_xref_glycosmos.csv|old_dataset; rowcount_change; idcount_change|
|6572|53261|46689|19376|158120|138744|0|5|5|glycan_image_details.csv|old_dataset; rowcount_change; idcount_change|
|8336|27456|19120|74496|810319|735823|1|13|12|glycan_enzyme.csv|old_dataset; structure_change; rowcount_change; idcount_change|
|8336|27456|19120|8336|27456|19120|0|3|3|glycan_xref_sandbox.csv|old_dataset; rowcount_change; idcount_change|
|23958|250901|226943|23958|250901|226943|0|3|3|glycan_glytoucanidlist.csv|old_dataset; rowcount_change; idcount_change|
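For readers who want to reproduce this kind of summary, here is a minimal sketch of how the count columns and change flags could be derived from an old and a new release of one dataset. It is not the script that produced the table above. The function name `summarize`, the toy CSV contents, and the rule that a flag is raised whenever the corresponding diff is non-zero are all assumptions. The `old_dataset` flag is left out because the issue does not say what it means.

```python
import csv
import io


def summarize(old_csv: str, new_csv: str, id_field: str) -> dict:
    """Compare two CSV texts and return counts shaped like the summary columns.

    Hypothetical helper: the flag rules below are an assumption, not the
    documented behaviour of the real QC script.
    """

    def load(text):
        reader = csv.DictReader(io.StringIO(text))
        rows = list(reader)
        fields = reader.fieldnames or []
        ids = {row[id_field] for row in rows}  # unique identifiers only
        return ids, rows, fields

    old_ids, old_rows, old_fields = load(old_csv)
    new_ids, new_rows, new_fields = load(new_csv)

    summary = {
        "id_count_diff": len(new_ids) - len(old_ids),
        "id_count_new": len(new_ids),
        "id_count_old": len(old_ids),
        "row_count_diff": len(new_rows) - len(old_rows),
        "row_count_new": len(new_rows),
        "row_count_old": len(old_rows),
        "field_count_diff": len(new_fields) - len(old_fields),
        "field_count_new": len(new_fields),
        "field_count_old": len(old_fields),
    }

    # Assumed rule: raise a flag whenever the matching diff is non-zero.
    flags = []
    if summary["field_count_diff"] != 0:
        flags.append("structure_change")
    if summary["row_count_diff"] != 0:
        flags.append("rowcount_change")
    if summary["id_count_diff"] != 0:
        flags.append("idcount_change")
    summary["status_flags"] = "; ".join(flags)
    return summary


# Toy data, invented for illustration only.
OLD = "glytoucan_ac,name\nG1,a\nG2,b\n"
NEW = "glytoucan_ac,name,extra\nG1,a,x\nG2,b,y\nG3,c,z\nG3,d,w\n"
result = summarize(OLD, NEW, "glytoucan_ac")
print(result["status_flags"])
```

Counting unique IDs separately from rows is what lets a dataset show a larger row change than ID change, as several rows in the table do.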

Datasets manually checked:

1. **glycan_citations_biomarkers.csv**
- "G00053MO"
- "G40099BA"
- "G96747WW"

2. **glycan_biomarkers.csv** - checked
- Fields added: "assessed_biomarker_entity_id", "best_biomarker_role", "biomarker_canonical_id", "component_level_tags", "component_synonyms", "condition", "do_desc", "do_syn", "evidence", "evidence_type", "exposure_agent", "exposure_agent_id", "top_level_tags"
- "best_biomarker_type" is replaced by "best_biomarker_role"; "Literature_evidence" is replaced by "Evidence"; information in "notes" (e.g. GTC ID) was redistributed to other fields, or removed where redundant.
- New GlyTouCan accessions were added but none deleted.

3. **glycan_names.csv** - checked
- id_count_diff: -430; row_count_diff: -42
- Redundant ```"glytoucan_ac"``` entries with the same ```"glycan"``` name were removed.

5. **glycan_enzyme.csv** - checked
- field_count_diff: +1
- added field ```"uniprot_ac"```
- New glytoucan accessions were added but none deleted.
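
The "added but none deleted" statements in the manual checks can be verified with a set comparison of the identifier column. The sketch below is illustrative only: `accession_changes` is a hypothetical helper, the CSV snippets are invented toy data, and the use of `glytoucan_ac` as the identifier column for every dataset is an assumption.

```python
import csv
import io


def accession_changes(old_csv: str, new_csv: str, id_field: str = "glytoucan_ac"):
    """Return (added, removed) accession lists between two CSV texts.

    Hypothetical helper; the default column name is an assumption.
    """

    def ids(text):
        return {row[id_field] for row in csv.DictReader(io.StringIO(text))}

    old_ids, new_ids = ids(old_csv), ids(new_csv)
    added = sorted(new_ids - old_ids)    # present only in the new release
    removed = sorted(old_ids - new_ids)  # present only in the old release
    return added, removed


# Toy data, invented for illustration only.
OLD_RELEASE = "glytoucan_ac,uniprot_ac\nG0001,P1\nG0002,P2\n"
NEW_RELEASE = "glytoucan_ac,uniprot_ac\nG0001,P1\nG0002,P2\nG0003,P3\n"

added, removed = accession_changes(OLD_RELEASE, NEW_RELEASE)
print("added:", added)
print("removed:", removed)
```

An empty `removed` list alongside a non-empty `added` list corresponds to the "added but none deleted" outcome. For a case like glycan_names.csv, where IDs were intentionally dropped, `removed` would list the accessions to review.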