Closed by kmartinez834 1 year ago
@rykahsay --> replace cellline_mapping.csv and tissue_mapping.csv with sample_mapping.csv,
as this contains all cell line and tissue terms.
List of files currently using all three misc files:
*_proteoform_glycosylation_sites_glyconnect.csv
*_proteoform_glycosylation_sites_oglcnac_atlas.csv
*_proteoform_glycosylation_sites_oglcnac_mcw.csv
*_proteoform_glycosylation_sites_uniprotkb.csv
*_proteoform_glycosylation_sites_literature.csv
*_proteoform_glycosylation_sites_gptwiki.csv
*_proteoform_glycosylation_sites_harvard.csv
*_proteoform_glycosylation_sites_literature_mining.csv
*_proteoform_glycosylation_sites_literature_mining_manually_verified.csv
*_proteoform_glycosylation_sites_o_gluc.csv
*_proteoform_glycosylation_sites_pdb.csv
*_proteoform_glycosylation_sites_tyr_o_linked.csv
*_proteoform_glycosylation_sites_unicarbkb.csv
*_proteoform_glycosylation_sites_unicarbkb_glycomics_study.csv
Done. Please check whether this change introduced any problems in these datasets.
In all files with cell line data, the ID format was changed from underscore to colon (e.g. "CVCL_Z425" to "CVCL:Z425").
$ grep "CVCL" ../unreviewed/fruitfly_proteoform_glycosylation_sites_glyconnect.csv
"","","","G29931IJ","O-linked","protein_xref_glyconnect","415","protein_xref_glyconnect","415","415","7227","Drosophila melanogaster","1947","G00031MO","Core 1","O-Linked","1,1,0,0,0,0,0,0,0,0,0,0,0,0","H1N1","HexNAc(1)Hex(1)","383.1428","383.3527","Hex:1 HexNAc:1","G29931IJ","mucosa","UBERON:0000344","67j25D","CVCL:Z425","","","","","","","","","",""
--> Should be like "CVCL_Z425"
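For checking the other datasets listed above, here is a minimal sketch that flags colon-separated cell line IDs and rewrites them to the underscore form. The function names and the regex are my assumptions (the pattern treats an ID as "CVCL", a separator, then an alphanumeric suffix); this is not taken from the actual pipeline code.

```python
import re

# Assumption: a cell line ID is "CVCL", a separator, then an alphanumeric
# suffix. Only the colon-separated form is matched here.
_COLON_CVCL = re.compile(r"\bCVCL:([A-Za-z0-9]+)")


def normalize_cvcl(text: str) -> str:
    """Rewrite colon-separated IDs (CVCL:Z425) to underscore form (CVCL_Z425)."""
    return _COLON_CVCL.sub(r"CVCL_\1", text)


def find_colon_cvcl(lines):
    """Return (line_number, id) pairs for every colon-separated ID found."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        for match in _COLON_CVCL.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

Passing an open file handle for one of the *_proteoform_glycosylation_sites_*.csv files to find_colon_cvcl would report any rows still carrying the colon form, similar to the grep above but limited to the ID pattern.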
Check now
👍
For next release...
Consolidate misc files used for tissue/cell line/disease mapping. We are currently using all of the following:
cellline_mapping.csv
sample_mapping.csv
tissue_mapping.csv
doid2uberonid_mapping_v5.csv
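Before dropping any of these files, it may help to confirm that sample_mapping.csv really covers every term in the others. A rough sketch, assuming the files share an ID column; the column name is a placeholder passed in by the caller, since the real headers are not shown in this thread, and missing_terms is a hypothetical helper:

```python
import csv
import io


def missing_terms(source_csv: str, target_csv: str, column: str) -> set:
    """Return values of `column` found in source_csv but absent from target_csv.

    Both arguments are CSV text with a header row. `column` is a
    hypothetical shared ID column; adjust to the real header names.
    """
    def values(text: str) -> set:
        reader = csv.DictReader(io.StringIO(text))
        return {row[column].strip() for row in reader if row.get(column)}

    return values(source_csv) - values(target_csv)
```

An empty result for cellline_mapping.csv and tissue_mapping.csv against sample_mapping.csv would suggest the two smaller files can be retired safely.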