draeger-lab / ModelPolisher

ModelPolisher accesses the BiGG Models knowledgebase to annotate SBML models.
MIT License

Scraping for invalid identifiers.org URLs #126

Open: Schmoho opened this issue 1 month ago

Schmoho commented 1 month ago

I just went through BioModels, BiGG, and some of our group's models to find identifiers.org URLs that do not use valid namespaces.

I labeled this as a "feature" because it aims to repair broken data rather than annotate and curate valid data.
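A minimal sketch of the scan described above. It assumes each model is available as SBML text with its MIRIAM annotations stored as `rdf:resource` attributes. The function names and the `valid_prefixes` argument are hypothetical, and a real check would load the valid prefixes from the identifiers.org registry instead of passing a hand-written set. For each unknown namespace it keeps the first example URL, which gives the same shape as the `[namespace url]` pairs listed below.

```python
import re
import xml.etree.ElementTree as ET

# Fully qualified name ElementTree uses for the rdf:resource attribute.
RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"

# Captures the namespace of legacy "identifiers.org/<prefix>/<id>" URLs and of
# compact "identifiers.org/<prefix>:<id>" URLs (like the seed.reactions entry).
IDENTIFIERS_URL = re.compile(r"^https?://identifiers\.org/([^/:]+)[/:](.+)$")


def namespace_of(url):
    """Return the namespace part of an identifiers.org URL, or None."""
    match = IDENTIFIERS_URL.match(url)
    return match.group(1) if match else None


def invalid_namespaces(sbml_text, valid_prefixes):
    """Map each namespace not in valid_prefixes to the first URL that uses it.

    valid_prefixes is a hypothetical, caller-supplied set of registry prefixes;
    the comparison is case-insensitive.
    """
    valid = {prefix.lower() for prefix in valid_prefixes}
    examples = {}
    for element in ET.fromstring(sbml_text).iter():
        url = element.get(RDF_RESOURCE)
        namespace = namespace_of(url) if url else None
        if namespace and namespace.lower() not in valid:
            examples.setdefault(namespace, url)  # keep only the first example
    return examples
```

The comparison is case-insensitive because compact URLs such as `https://identifiers.org/CHEBI:33704` may carry the prefix in upper case.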

These are the ones where the repair is obvious:

 ["reactome.compound" "http://identifiers.org/reactome.compound/8851513"]
 ["reactome.reaction" "http://identifiers.org/reactome.reaction/R-SCE-70523"]
 ["biomodels.sbo" "http://identifiers.org/biomodels.sbo/SBO:0000281"]
 ["inchi_key" "http://identifiers.org/inchi_key/BGWGXPAPYGQALX-OEXCPVAWSA-L"]
 ["ncbigi" "http://identifiers.org/ncbigi/16129989"]
 ["obo.bto" "http://identifiers.org/obo.bto/BTO:0000575"]
 ["obo.chebi" "http://identifiers.org/obo.chebi/CHEBI:33704"]
 ["obo.fma" "http://identifiers.org/obo.fma/FMA:74531"]
 ["obo.go" "http://identifiers.org/obo.go/GO:0004674"]
 ["obo.pato" "http://identifiers.org/obo.pato/PATO:0001021"]
 ["obo.psi-mod" "http://identifiers.org/obo.psi-mod/MOD:00890"]
 ["obo.pw" "http://identifiers.org/obo.pw/PW:0000565"]
 ["psi-mod" "http://identifiers.org/psi-mod/MOD:00000"]
 ["psimod" "http://identifiers.org/psimod/MOD:00048"]
 ["seed.reactions" "https://identifiers.org/seed.reactions:rxn01396"]
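For most of the namespaces above, the repair amounts to swapping the prefix. Below is a sketch of that step. `PREFIX_FIXES` is a hypothetical, partial table: its target prefixes are assumptions about the current identifiers.org namespaces and should be checked against the registry. Entries whose fix is less clear-cut (`reactome.compound`, `ncbigi`) are left out. Repaired URLs are rewritten to the `https://identifiers.org/<prefix>/<id>` form, and anything without a known fix returns `None` so it can be reviewed by hand.

```python
import re

# Hypothetical, partial repair table: invalid namespace -> assumed current
# identifiers.org prefix. Verify every target against the registry before use.
PREFIX_FIXES = {
    "biomodels.sbo": "sbo",
    "inchi_key": "inchikey",
    "obo.bto": "bto",
    "obo.chebi": "chebi",
    "obo.fma": "fma",
    "obo.go": "go",
    "obo.pato": "pato",
    "obo.psi-mod": "mod",
    "obo.pw": "pw",
    "psi-mod": "mod",
    "psimod": "mod",
    "reactome.reaction": "reactome",
    "seed.reactions": "seed.reaction",
}

# Matches both "identifiers.org/<prefix>/<id>" and "identifiers.org/<prefix>:<id>".
IDENTIFIERS_URL = re.compile(r"^https?://identifiers\.org/([^/:]+)[/:](.+)$")


def repair(url):
    """Return the URL with its namespace replaced, or None if no fix is known."""
    match = IDENTIFIERS_URL.match(url)
    if match is None:
        return None  # not an identifiers.org URL
    namespace, local_id = match.groups()
    fixed = PREFIX_FIXES.get(namespace)
    if fixed is None:
        return None  # no entry in the table: already valid or needs manual review
    # Normalise to https and the "<prefix>/<id>" form used by most entries.
    return f"https://identifiers.org/{fixed}/{local_id}"
```

For example, `repair("http://identifiers.org/obo.chebi/CHEBI:33704")` gives `https://identifiers.org/chebi/CHEBI:33704`.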

For these, I have no idea how to repair them:

 ["bind" "http://identifiers.org/bind/50058"]
 ["EnsemblGenomes-Gn" "http://identifiers.org/EnsemblGenomes-Gn/ECUMN_4058"]
 ["EnsemblGenomes-Tr" "http://identifiers.org/EnsemblGenomes-Tr/CAP75598"]
 ["omim" "http://identifiers.org/omim/601417"]
 ["PSEUDO" "http://identifiers.org/PSEUDO/CAB14182.3"]
 ["psimi" "http://identifiers.org/psimi/MI:0501"]
 ["refseq_locus_tag" "http://identifiers.org/refseq_locus_tag/EC55989_0350"]
 ["refseq_name" "http://identifiers.org/refseq_name/cls-1"]
 ["refseq_old_locus_tag" "http://identifiers.org/refseq_old_locus_tag/S_0338"]
 ["refseq_orf_id" "http://identifiers.org/refseq_orf_id/sll1027"]
 ["refseq_synonym" "http://identifiers.org/refseq_synonym/CTE-II"]
 ["sabiork" "http://identifiers.org/sabiork/9859"]
 ["unit" "https://identifiers.org/unit/UO:0000040"]
 ["unknown" "http://identifiers.org/unknown/10.1074/jbc.C000664200"]