glygener / glygen-issues

Repository for public GlyGen tickets
GNU General Public License v3.0

Too many rows commented out in xref_info.csv #926

Closed rykahsay closed 9 months ago

rykahsay commented 9 months ago

@kmartinez834 -- I remember you had a task to identify the URL templates that resulted in broken links. I am not sure why so many of them are commented out. For example, I don't know why this row is commented out.

protein_xref_uniprotkb_proteinname,UniProtKB,https://www.uniprot.org/uniprot/%s#names_and_taxonomy,

$ cat generated/misc/xref_info.csv | grep ^"#"

#protein_xref_uniprotkb_proteinname,UniProtKB,https://www.uniprot.org/uniprot/%s#names_and_taxonomy,
#protein_xref_entrez,Entrez,https://www.ncbi.nlm.nih.gov/gene/%s,
#protein_xref_hepatitisconline,HepatitisCOnline,https://www.hepatitisc.uw.edu/page/%s/biology,
#protein_xref_unicarbkb,UniCarbKB,http://unicarbkb.org/proteinsummary/%s/annotated,
#protein_xref_viruspathogenresource,ViPR,https://www.viprbrc.org/brc/home.spg?decorator=flavi_%s,
#glycan_xref_cfg,CFG,http://www.functionalglycomics.org/glycomics/CarbohydrateServlet?pageType=view&view=view&operationType=view&carbId=%s,carbNlink_30602_A|carbOlink_20077_D000|carbOlink_20078_D000
#glycan_xref_glycosciencesde,Glycosciences.de,http://www.glycosciences.de/database/start.php?action=explore_linucsid&linucsid=%s,
#glycan_xref_unicarbkb,UniCarbKB,http://www.unicarbkb.org/structure/%s,
#protein_xref_uberon,Uberon,http://purl.obolibrary.org/obo/UBERON_%s,
#protein_xref_micado,Micado,http://genome.jouy.inra.fr/micado/gene.cgi?species=Bacillus+subtilis&gene=%s,
#glycan_xref_inchi_key,InCHI Key,https://pubchem.ncbi.nlm.nih.gov/compound/%s,
#protein_xref_pirsf,PIRSF,http://pir.georgetown.edu/cgi-bin/ipcSF?id=%s,
#protein_xref_sfld,SFLD,http://sfld.rbvi.ucsf.edu/django/lookup/%s,
#protein_xref_complexportal,Complex Portal,https://www.ebi.ac.uk/complexportal/complex/%s,
#protein_xref_pdbj,PDBj,http://pdbj.org/mine/summary/%s,
#protein_xref_rouge,ROUGE,http://www.kazusa.or.jp/rouge/gfpage/%s,
#protein_xref_go,Gene Ontology,https://www.ebi.ac.uk/QuickGO/term/%s,
#protein_xref_ccds,CCDS,https://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi?REQUEST=CCDS&GO=MainBrowse&DATA=%s,
#protein_xref_bindingdb,BindingDB,http://www.bindingdb.org/uniprot/%s,
#protein_xref_swiss-2dpage,SWISS-2DPAGE,http://world-2dpage.expasy.org/swiss-2dpage/%s,
#protein_xref_pharmgkb,PharmGKB,https://www.pharmgkb.org/gene/%s,
#glycan_xref_subsumption,Subsumption,https://glygen.org/glycan/%s,
#glycan_xref_composition,Composition,https://glygen.org/glycan_detail.html?glytoucan_ac=%s,
#protein_xref_opentargets,Open Targets,https://www.targetvalidation.org/target/%s/associations,
#protein_xref_disgenet,DisGeNet,http://disgenet.org/search?q=%s,
#protein_xref_enzyme,ENZYME,https://enzyme.expasy.org/EC/%s,
#protein_xref_genbank,GenBank,https://www.ncbi.nlm.nih.gov/nuccore/%s,
#protein_xref_glycodb,GlycoDB,http://jcggdb.jp/rcmg/glycodb/%s,
#protein_xref_pride,PRIDE,https://www.ebi.ac.uk/pride/searchSummary.do?queryTypeSelected=identification accession number&identificationAccessionNumber=%s,
#protein_xref_smart,SMART,http://smart.embl.de/smart/do_annotation.pl?DOMAIN=%s,
#protein_xref_esther,ESTHER,http://bioweb.supagro.inra.fr/ESTHER/gene_locus?name=%s&class=Gene_locus,
#protein_xref_embl,EMBL,https://www.ebi.ac.uk/ena/data/view/%s,
#protein_xref_string,STRING,https://string-db.org/network/%s,
#protein_xref_genewiki,Gene Wiki,https://en.wikipedia.org/wiki/%s,
#glycan_xref_glycomedb,GlycomeDB,http://www.glycome-db.org/database/showStructure.action?glycomeId=%s,
#protein_xref_pdbsum,PDBsum,https://www.ebi.ac.uk/pdbsum/%s,
#protein_xref_biocyc,BioCyc,http://biocyc.org/getid?id=%s,
#protein_xref_rebase,REBASE,http://rebase.neb.com/rebase/enz/%s.html,
#protein_xref_biogrid,BioGRID,https://thebiogrid.org/%s,
#protein_xref_drugbank,DrugBank,https://www.drugbank.ca/drugs/%s,
#protein_xref_malacards,MalaCards,http://www.malacards.org/search/eliteGene/%s,
#protein_xref_dip,DIP,http://dip.doe-mbi.ucla.edu/dip/Browse.cgi?ID=%s,
#glycan_xref_glycan_type,Glycan Type,http://bioportal.bioontology.org/ontologies/GLYCO/?p=classes&conceptid=%s&jump_to_nav=true,
#glycan_xref_monosaccharide_residue_name,Monosaccharide Residue Name,https://pubchem.ncbi.nlm.nih.gov/compound/%s,
#protein_xref_unicarbkb_pub,UniCarbKB,http://www.unicarbkb.org/proteinsummary/%s/annotated,
#protein_xref_phosphositeplus,PhosphoSitePlus,https://www.phosphosite.org/uniprotAccAction?id=%s,
#protein_xref_lit_min,Automatic Text Mining,http://biotm.cis.udel.edu/glyco/pmid/%s,
#protein_xref_pmc,PubMed Central,https://www.ncbi.nlm.nih.gov/pmc/articles/%s/,
#protein_xref_phylomedb,PhylomeDB,http://phylomedb.org/?q=search_tree&seqid=%s,
#protein_xref_uniprotkb_pub,UniProtKB,https://www.uniprot.org/uniprot/%s/publications,
#protein_xref_proteomes,UniProtKB,https://www.uniprot.org/proteomes/%s,
#protein_xref_merops,MEROPS,https://www.ebi.ac.uk/merops/cgi-bin/pepsum?mid=%s,
#protein_xref_genedb,GeneDb,http://www.genedb.org/gene/%s,
#protein_xref_glyco,Glyco,https://glytoucan.org/Structures/Glycans/%s,
#protein_xref_ddbj,DDBJ,http://getentry.ddbj.nig.ac.jp/search/get_entry?accnumber=%s,
#protein_xref_genereviews,GeneReviews,https://www.ncbi.nlm.nih.gov/books/NBK1116/?term=%s,
#protein_xref_dictybase,dictyBase,http://dictybase.org/db/cgi-bin/gene_page.pl?primary_id=%s,
#protein_xref_uniprot_isoform,UniProtKB,https://www.uniprot.org/uniprot/%s#%s,
#protein_xref_umls,UMLS,http://example.org/%s,
#protein_xref_tcdb,TCDB,http://www.tcdb.org/search/result.php?tc=%s,
#protein_xref_proteinmodelportal,Protein Model Portal,https://www.proteinmodelportal.org/query/uniprot/%s,
#protein_xref_prints,PRINTS,http://umber.sbs.man.ac.uk/cgi-bin/dbbrowser/sprint/searchprintss.cgi?display_opts=Prints&category=None&queryform=false&prints_accn=%s,
#protein_xref_evolutionarytrace,Evolutionary Trace,http://mammoth.bcm.tmc.edu/cgi-bin/report_maker_ls/uniprotTraceServerResults.pl?identifier=%s,
#protein_xref_signor,Signor,https://signor.uniroma2.it/relation_result.php?id=%s,
#protein_xref_supfam,SUPFAM,http://supfam.org/SUPERFAMILY/cgi-bin/scop.cgi?ipid=%s,
#protein_xref_tigrfams,TIGRFAMs,http://www.jcvi.org/cgi-bin/tigrfams/HmmReportPage.cgi?acc=%s,
#protein_xref_hamap,HAMAP,https://hamap.expasy.org/signature/%s,
#protein_xref_expressionatlas,Expression Atlas,https://www.ebi.ac.uk/gxa/query?geneQuery=%s,
#protein_xref_genetree,GeneTree,http://www.ensemblgenomes.org/id-genetree/%s,
#protein_xref_icd9,ICD-9-CM,http://example.org/%s,
#protein_xref_gene3d,Gene3D,http://www.cathdb.info/superfamily/%s,
#protein_xref_prosite_prorule,PROSITE-PRORULE,https://prosite.expasy.org/rule/%s,
#protein_xref_prosite-prorule,PROSITE-PRORULE,https://prosite.expasy.org/rule/%s,
#protein_xref_prosite,PROSITE,https://prosite.expasy.org/doc/%s,
#protein_xref_modbase,ModBase,http://salilab.org/modbase-cgi/model_search.cgi?searchkw=name&kword=%s,
#protein_xref_ogp,USC-OGP,http://usc_ogp_2ddatabase.cesga.es/cgi-bin/2d/2d.cgi?%s,
#protein_xref_homologene,HomoloGene,https://www.ncbi.nlm.nih.gov/homologene/%s,
#protein_xref_signalink,SignaLink,http://signalink.org/protein/%s,
#protein_xref_prodom,ProDom,http://prodom.prabi.fr/prodom/current/cgi-bin/request.pl?question=SPTR&query=%s,
#protein_xref_genatlas,GenAtlas,http://genatlas.medecine.univ-paris5.fr/fiche.php?symbol=%s,
#protein_xref_genomernai,GenomeRNAi,http://genomernai.org/genedetails/%s,
#protein_xref_sbkb,SBKB,http://sbkb.org/uid/%s/uniprot,
#protein_xref_promex,ProMEX,http://promex.pph.univie.ac.at/promex/?ac=%s,
#protein_xref_proteomicsdb,ProteomicsDB,https://www.proteomicsdb.org/proteomicsdb/#human/proteinDetails/%s,
#protein_xref_eupathdb,EuPathDB,http://www.eupathdb.org/gene/%s,
#protein_xref_inparanoid,InParanoid,http://inparanoid.sbc.su.se/cgi-bin/gene_search.cgi?id=%s,
#protein_xref_cleanex,CleanEX,http://cleanex.vital-it.ch/cgi-bin/get_doc?db=cleanex&format=nice&entry=%s,
#protein_xref_disprot,DisProt,http://www.disprot.org/%s,
#protein_xref_ESP,ESP,https://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=%s,
#protein_xref_dmdm,DMDM,http://bioinf.umbc.edu/dmdm/gene_prot_page.php?search_type=protein&id=%s,
#protein_xref_pir,PIR,http://pir.georgetown.edu/cgi-bin/nbrfget?uid=%s,
#protein_xref_mgi_alliancegenome,MGI,http://www.informatics.jax.org/homology/cluster/key/%s,
#protein_xref_huge,HUGE,http://www.kazusa.or.jp/huge/gfpage/%s,
#protein_xref_swisslipids,SwissLipids,http://www.swisslipids.org/#/entity/%s/,
#protein_xref_hpa,Human Protein Atlas,http://www.proteinatlas.org/tissue_profile.php?antibody_id=%s,
#protein_xref_protonet,ProtoNet,http://www.protonet.cs.huji.ac.il/sp.php?prot=%s,
#protein_xref_mesh,MeSH,http://example.org/%s,
#protein_xref_peptideatlas,Peptide Atlas,https://db.systemsbiology.net/sbeams/cgi/PeptideAtlas/Search?action=GO&search_key=%s,
#protein_xref_panther_gene,PANTHER,http://www.pantherdb.org/genes/gene.do?acc=MOUSE|MGI=MGI=%s|UniProtKB=%s,
#protein_xref_treefam,TreeFam,http://www.treefam.org/family/%s,
#glycan_xref_doi,DOI,https://glygen.org/publication/DOI/%s,
#protein_xref_ko,KEGG Orthology,http://www.genome.jp/dbget-bin/www_bget?ko:%s,
#glycan_xref_unicarbkb_comp,UniCarbKB,http://unicarbkb.org/query,
#protein_xref_gpcrdb,GPCRdb,http://gpcrdb.org/protein/%s/,
#glycan_xref_motif,Glycan Motif,https://glycomotif.glyomics.org/glycomotif/%s,
#glycan_xref_uberon,Glycan UBERON,http://purl.obolibrary.org/obo/%s,
#glycan_xref_cellosaurus,Glycan Cellosaurus,https://web.expasy.org/cellosaurus/%s,
#protein_xref_cell_ontology,Cell Ontology,http://purl.obolibrary.org/obo/CL_%s,
#protein_xref_mesh,Medical Subject Headings,http://id.nlm.nih.gov/mesh/%s,
#protein_xref_ncit,NCI Thesaurus,http://purl.obolibrary.org/obo/NCIT_%s,
#protein_xref_efo,Experimental Factor Ontology,http://www.ebi.ac.uk/efo/EFO_%s,
#protein_xref_brenda,BRENDA Tissue Ontology,http://purl.obolibrary.org/obo/BTO_%s,
#tissue_xref_mesh,Medical Subject Headings,http://id.nlm.nih.gov/mesh/%s,
#tissue_xref_uberon,UBERON,http://purl.obolibrary.org/obo/UBERON_%s,
#tissue_xref_omit,Ontology for MIRNA Target,http://purl.obolibrary.org/obo/OMIT_%s,
#cell_xref_cvcl,Cellosaurus,https://web.expasy.org/cellosaurus/CVCL_%s,
#tissue_xref_cl,Cell Ontology,http://purl.obolibrary.org/obo/CL_%s,
#tissue_xref_bto,BRENDA Tissue Ontology,http://purl.obolibrary.org/obo/BTO_%s,
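For reference, the `grep` listing above can also be reproduced in Python when the rows need to be inspected field by field. This is a minimal sketch, not the project's actual tooling: the function name `split_rows` is hypothetical, and the column layout (key, label, URL template, example IDs) is an assumption inferred from the rows shown in this thread.

```python
import csv

def split_rows(text):
    """Split xref_info.csv text into (active, commented) lists of parsed rows.

    Assumption: a row is "commented out" when its first character is '#'.
    """
    active, commented = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        is_commented = line.startswith("#")
        # Only the leading '#' is removed; a '#' inside a URL is left alone.
        fields = next(csv.reader([line.lstrip("#")]))
        (commented if is_commented else active).append(fields)
    return active, commented

# Sample rows copied from this thread.
sample = """\
protein_xref_entrez,Entrez,https://www.ncbi.nlm.nih.gov/gene/%s,7515|3717|5290
#protein_xref_go,Gene Ontology,https://www.ebi.ac.uk/QuickGO/term/%s,
#glycan_xref_unicarbkb,UniCarbKB,http://www.unicarbkb.org/structure/%s,
"""

active, commented = split_rows(sample)
commented_keys = [row[0] for row in commented]
print(commented_keys)
```

On a real checkout, the same function could be applied to the contents of `generated/misc/xref_info.csv`.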

kmartinez834 commented 9 months ago

@rykahsay I previously commented out the rows that were dead or not used in any datasets.

After checking all the commented-out rows again, I uncommented the following, as they do appear in a dataset; however, some just point to a generic homepage. There were also a couple of errors that are now corrected:

protein_xref_entrez,Entrez,https://www.ncbi.nlm.nih.gov/gene/%s,7515|3717|5290
protein_xref_hepatitisconline,HepatitisCOnline,https://www.hepatitisc.uw.edu/page/%s/biology,hcv
protein_xref_viruspathogenresource,ViPR,https://www.viprbrc.org/brc/home.spg?decorator=flavi_%s,hcv
protein_xref_prosite_prorule,PROSITE-PRORULE,https://prosite.expasy.org/rule/%s,PRU00498|PRU00076|PRU00415
protein_xref_prosite-prorule,PROSITE-PRORULE,https://prosite.expasy.org/rule/%s,PRU00498|PRU00076|PRU00415
glycan_xref_motif,Glycan Motif,https://glycomotif.glyomics.org/glycomotif/%s,GGM.000001|GGM.000081|GGM.000113
protein_xref_brenda,BRENDA Enzymes,https://www.brenda-enzymes.org/enzyme.php?ecno=%s,5.5.1.4|3.4.11.1|3.4.21.98
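The rows above pair a URL template with what appear to be pipe-separated example IDs in the last column. A minimal sketch of how such a row could be expanded into concrete URLs for a link check follows. The helper `expand_template` is hypothetical, and the meaning of the last column is an assumption based only on the rows in this thread. Plain string replacement is used instead of `%` formatting so that any other percent signs in a URL are left untouched.

```python
def expand_template(template, example_ids):
    """Return one URL per example ID, substituting every '%s' in the template.

    Assumption: example_ids is a pipe-separated string such as "7515|3717|5290".
    Templates with two placeholders receive the same ID in both positions,
    which may not be what a two-part template actually expects.
    """
    ids = [i for i in example_ids.split("|") if i]
    return [template.replace("%s", i) for i in ids]

# Row values copied from this thread.
urls = expand_template("https://www.ncbi.nlm.nih.gov/gene/%s", "7515|3717|5290")
for url in urls:
    print(url)

# A row with an empty example column yields nothing to check.
no_urls = expand_template("https://prosite.expasy.org/rule/%s", "")
```

The resulting URLs could then be passed to any HTTP client to see whether they resolve. That step is left out here because a "valid" response (for example, a redirect to a generic homepage) needs a manual judgment.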

rykahsay commented 9 months ago

"not used in any datasets" should not be a reason to comment them out. Please uncomment any of the following if they lead to a valid (not dead) link.

glycan_xref_cfg
glycan_xref_glycan_type
glycan_xref_glycosciencesde
glycan_xref_harvard
glycan_xref_inchi_key
glycan_xref_monosaccharide_residue_name
glycan_xref_unicarbkb
protein_xref_go
protein_xref_unicarbkb
protein_xref_uniprotkb_proteinname
tissue_xref_uberon
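One way to apply a key list like the one above is sketched below: the leading `#` is stripped from rows whose key is in the list, and any requested key that is not in the file at all is reported. The function name `uncomment_keys` is hypothetical, and the sketch does not check whether a link is dead; it only toggles the comment marker.

```python
def uncomment_keys(text, keys):
    """Uncomment rows whose first field is in keys.

    Returns (updated_text, missing), where missing lists requested keys
    that match no row, commented or not.
    """
    wanted = set(keys)
    missing = set(wanted)
    out = []
    for line in text.splitlines():
        stripped = line.lstrip("#")
        key = stripped.split(",", 1)[0]
        if key in wanted:
            missing.discard(key)
            if line.startswith("#"):
                line = stripped  # drop only the leading comment marker
        out.append(line)
    return "\n".join(out) + "\n", sorted(missing)

# Sample rows copied from this thread.
sample = """\
#protein_xref_go,Gene Ontology,https://www.ebi.ac.uk/QuickGO/term/%s,
#tissue_xref_uberon,UBERON,http://purl.obolibrary.org/obo/UBERON_%s,
#protein_xref_umls,UMLS,http://example.org/%s,
"""

updated, missing = uncomment_keys(
    sample, ["protein_xref_go", "tissue_xref_uberon", "glycan_xref_harvard"]
)
print(updated)
print("not found:", missing)
```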

kmartinez834 commented 9 months ago

Done.

--> glycan_xref_harvard does not exist; did you mean glycan_xref_harvardu? That one was not commented out, though.