EBISPOT / ols4

Version 4 of the EMBL-EBI Ontology Lookup Service (OLS)
http://www.ebi.ac.uk/ols4/
Apache License 2.0

Failing ontologies on data load #453

Open serjoshua opened 11 months ago

serjoshua commented 11 months ago

OBO

| Fixed | ID | Status | Download | Parsing | EBI Override | PURL | Notes | Pipeline Error |
|---|---|---|---|---|---|---|---|---|
| | cido | active | | | | http://purl.obolibrary.org/obo/cido.owl | failed import http://purl.obolibrary.org/obo/DrugsNoChEBI_interactions_with_targets.owl | [line: 2, col: 207] {E202} Expecting XML start or end element(s). String data "aeo.obo" not allowed. Maybe there should be an rdf:parseType='Literal' for embedding mixed XML content in RDF. Maybe a striping error. |
| | cto | active | | | | http://purl.obolibrary.org/obo/cto.owl | | [line: 25, col: 32] {E201} Multiple children of property element |
| | cvdo | active | | | | http://purl.obolibrary.org/obo/cvdo.owl | failed import http://purl.obolibrary.org/obo/cvdo/external/doid_import.owl | [line: 2, col: 207] {E202} Expecting XML start or end element(s). String data "aeo.obo" not allowed. Maybe there should be an rdf:parseType='Literal' for embedding mixed XML content in RDF. Maybe a striping error. |
| | upheno | active | | | | http://purl.obolibrary.org/obo/upheno.owl | failed import http://purl.obolibrary.org/obo/upheno/upheno_root_alignments.owl | [line: 1, col: 1 ] Content is not allowed in prolog. |
| | mamo | orphaned | | | | http://purl.obolibrary.org/obo/mamo.owl | | [line: 17, col: 74] {E201} Multiple children of property element |
| | vario | orphaned | | | | http://purl.obolibrary.org/obo/vario.owl | | [line: 1, col: 1 ] Content is not allowed in prolog. |
| | olatdv | inactive | | | | http://purl.obolibrary.org/obo/olatdv.owl | | Not found |
| | pdumdv | inactive | | | | http://purl.obolibrary.org/obo/pdumdv.owl | | Not found |
| | rnao | inactive | | | | http://purl.obolibrary.org/obo/rnao.owl | failed import http://www.obofoundry.org/ro/ro.owl | Not found |
| | dinto | inactive | | | | http://purl.obolibrary.org/obo/dinto.owl | | [line: 13, col: 4 ] {E201} The attributes on this property element, are not permitted with any content; expecting end element tag. |
| | eo | inactive | | | | http://purl.obolibrary.org/obo/eo.owl | | Not found |
| | epo | inactive | | | | http://purl.obolibrary.org/obo/epo.owl | | Not found |
| | ero | inactive | | | | http://purl.obolibrary.org/obo/ero.owl | | [line: 4, col: 27] {E202} Expecting XML start or end element(s). String data "redirecting" not allowed. Maybe there should be an rdf:parseType='Literal' for embedding mixed XML content in RDF. Maybe a striping error. |
| | flu | inactive | | | | http://purl.obolibrary.org/obo/flu.owl | failed import http://purl.obolibrary.org/obo/ido/2010-12-02/ido-main-workaround.owl | Not found |
| | mfo | inactive | | | | http://purl.obolibrary.org/obo/mfo.owl | | [line: 1, col: 3 ] The markup in the document preceding the root element must be well-formed. |
| | mirnao | inactive | | | | http://purl.obolibrary.org/obo/mirnao.owl | | Not found |
| | mo | inactive | | | | http://purl.obolibrary.org/obo/mo.owl | | [line: 2, col: 207] {E202} Expecting XML start or end element(s). String data "aeo.obo" not allowed. Maybe there should be an rdf:parseType='Literal' for embedding mixed XML content in RDF. Maybe a striping error. |
| | nmr | inactive | | | | http://purl.obolibrary.org/obo/nmr.owl | | [line: 2, col: 207] {E202} Expecting XML start or end element(s). String data "aeo.obo" not allowed. Maybe there should be an rdf:parseType='Literal' for embedding mixed XML content in RDF. Maybe a striping error. |
| | ogi | inactive | | | | http://purl.obolibrary.org/obo/ogi.owl | | Not found |
| | sep | inactive | | | | http://purl.obolibrary.org/obo/sep.owl | redirects to http://ontologies.berkeleybop.org/sep.owl | Cannot read field "properties" because "this.ontologyNode" is null |
| | vhog | inactive | | | | http://purl.obolibrary.org/obo/vhog.owl | redirects to file points to http://ontologies.berkeleybop.org/vhog.owl | Cannot read field "properties" because "this.ontologyNode" is null |

EBI OLS Ontologies

| Fixed | ID | Download | Parsing | PURL | Notes | Pipeline Error |
|---|---|---|---|---|---|---|
| | phi | | | file:/nfs/panda/ensembl/production/ensprod/ontologies/phi/PHI.obo | | Not found |
| | atol | | | http://www.atol-ontology.com/public/telechargement/atol.owl | | javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target |
| | eol | | | http://www.atol-ontology.com/public/telechargement/eol.owl | | javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target |
| | lbo | | | http://data.bioontology.org/ontologies/LBO/download?apikey=8b5b7825-538d-40e0-9e9e-5ab9274a9aeb | | Failed to determine the content type: (URI=http://data.bioontology.org/ontologies/LBO/download?apikey=8b5b7825-538d-40e0-9e9e-5ab9274a9aeb : stream=application/octet-stream) |
| | pride | | | https://raw.githubusercontent.com/PRIDE-Utilities/pride-ontology/master/pride_cv.obo | migrate away from OBO | Failed to determine the content type: (URI=https://raw.githubusercontent.com/PRIDE-Utilities/pride-ontology/master/pride_cv.obo : stream=text/plain) |
| | unimod | | | http://www.unimod.org/obo/unimod.obo | migrate away from OBO | Failed to determine the content type: (URI=http://www.unimod.org/obo/unimod.obo : stream=null) |
| | hpath | | | https://raw.githubusercontent.com/Novartis/hpath/master/src/hpath.obo | migrate away from OBO | Failed to determine the content type: (URI=https://raw.githubusercontent.com/Novartis/hpath/master/src/hpath.obo : stream=text/plain) |
| | vido | | | https://raw.githubusercontent.com/infectious-disease-ontology-extensions/ido-virus/master/ontology/vido.owl | | [line: 23, col: 18] {E201} Multiple children of property element |

Original spreadsheet

serjoshua commented 11 months ago

(1) The OLS4 dataloader is an RDF tool and therefore only supports loading RDF files. This means that non-RDF OWL serialisations such as OBO format and OWL/XML are never going to be supported directly (though of course they can be converted prior to loading). For these (very few) cases we can either ask the upstream ontology vendors to provide an RDF/XML file, or possibly outsource conversion to ROBOT.

(2) Though we support all the different RDF serializations, the majority of the ontologies are provided without any content-type or any useful file extension to indicate which serialization format they contain. For example, this ontology from the OLS config is Turtle, but the file extension is owl and the content-type is text/plain. No suggestion of Turtle encoding anywhere.

Even the OBO foundry ontologies do this. If we resolve for example http://purl.obolibrary.org/obo/ro.owl it redirects to https://raw.githubusercontent.com/oborel/obo-relations/master/ro.owl. File extension: .owl, content-type is text/plain. While the file content is RDF/XML, there is nothing to suggest that it isn't, for example, OWL XML, or Turtle, or JSON-LD. We only know how to load it in OLS4 because RDF/XML is the hardcoded default.

Why does this work in Protégé and OLS3? Because OWLAPI literally brute-force loads ontology files by trying every loader until it finds one that works.
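For illustration, a minimal sketch of that try-every-loader idea. This is not OWLAPI's actual code: the two stand-in parsers below only check XML and JSON well-formedness using the Python standard library, and the names `CANDIDATE_PARSERS` and `detect_by_trial` are made up for this example.

```python
import json
import xml.etree.ElementTree as ET


def parse_as_xml(text):
    """Stand-in for an XML-based loader (e.g. RDF/XML); raises on malformed XML."""
    ET.fromstring(text)


def parse_as_json(text):
    """Stand-in for a JSON-based loader (e.g. JSON-LD); raises on malformed JSON."""
    json.loads(text)


# Hypothetical loader list; a real tool would register one entry per serialization.
CANDIDATE_PARSERS = [
    ("xml", parse_as_xml),
    ("json", parse_as_json),
]


def detect_by_trial(text):
    """Try each parser in order and return (name_of_first_success, errors_so_far).

    Returns (None, errors) when every parser rejects the input.
    """
    errors = {}
    for name, parser in CANDIDATE_PARSERS:
        try:
            parser(text)
            return name, errors
        except (ET.ParseError, ValueError) as exc:
            # Remember why this loader failed, then move on to the next one.
            errors[name] = str(exc)
    return None, errors
```

The cost of this approach is that a genuinely broken file fails once per loader, so the collected errors are worth keeping for the final report.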

While we could probably do something similar in OLS4, I think ultimately it is up to the ontology developers to provide correct metadata, if not by content-type then at least by file extension. The whole .owl thing is a mess. If it's RDF/XML it should be .xml and if it's Turtle it should be .ttl. OR if it really wants to be .owl it should be served up with a content-type.

So TL;DR: I think we should continue to read .owl files as whatever content-type is provided, falling back on RDF/XML when none is given.
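A rough sketch of that resolution order, not the actual OLS4 dataloader code. The function name `guess_rdf_format` and both lookup tables are assumptions made for this example. The extension step is an addition based on the "at least by file extension" point above, and `.owl` is deliberately left out of it because it says nothing about the serialization.

```python
import posixpath
from urllib.parse import urlparse

# Registered media types for common RDF serializations (assumed, non-exhaustive list).
CONTENT_TYPES = {
    "application/rdf+xml": "RDF/XML",
    "text/turtle": "Turtle",
    "application/ld+json": "JSON-LD",
    "application/n-triples": "N-Triples",
}

# Extensions that identify a serialization unambiguously; ".owl" is intentionally absent.
EXTENSIONS = {
    ".rdf": "RDF/XML",
    ".ttl": "Turtle",
    ".jsonld": "JSON-LD",
    ".nt": "N-Triples",
}

DEFAULT_FORMAT = "RDF/XML"


def guess_rdf_format(url, content_type=None):
    """Pick a serialization: informative content-type, then extension, then RDF/XML."""
    if content_type:
        # Drop parameters such as "; charset=utf-8" before the lookup.
        media_type = content_type.split(";", 1)[0].strip().lower()
        if media_type in CONTENT_TYPES:
            return CONTENT_TYPES[media_type]
    # text/plain, application/octet-stream and missing headers all end up here.
    extension = posixpath.splitext(urlparse(url).path)[1].lower()
    return EXTENSIONS.get(extension, DEFAULT_FORMAT)
```

With this order, the ro.owl case above (`.owl` served as `text/plain`) resolves to RDF/XML purely through the default, which is exactly why a Turtle file published the same way fails to load.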

Original comment

matentzn commented 11 months ago

I would recommend for the OBO ones:

  1. Ignore all failing "inactive" or "orphaned" ontologies from OBO; don't try to fix them. Just record that they don't parse, that's it. Only do anything about them if someone asks. The fewer ontologies there are, the better.
  2. Make issues on the issue trackers for the active OBO ontologies to get their act together (linking to this issue), then stop trying to fix them (30 minutes of work for all of them).

linikujp commented 1 month ago

@serjoshua What do you recommend we do for the failing ontologies that are .owl files? Resave them as .rdf files?

jamesamcl commented 1 month ago

Yes, or use ROBOT to convert them on the command line: http://robot.obolibrary.org/
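As a sketch of what that conversion might look like when scripted: the helper below only builds the argument list for `robot convert`, using the `--input`, `--output` and `--format` options as I understand them from the ROBOT documentation. The helper name is made up, and the claim that `--format owl` yields RDF/XML is an assumption worth checking against your installed ROBOT version.

```python
import shlex


def robot_convert_command(input_path, output_path, fmt="owl"):
    """Build the argv for a ROBOT conversion (assumes `robot` is on PATH).

    `fmt="owl"` is assumed to select RDF/XML output; verify against the ROBOT docs.
    """
    return [
        "robot", "convert",
        "--input", input_path,
        "--output", output_path,
        "--format", fmt,
    ]


def as_shell(command):
    """Render the argv as a copy-pasteable shell command line."""
    return shlex.join(command)
```

For example, `as_shell(robot_convert_command("pride_cv.obo", "pride_cv.owl"))` gives a command line that can be pasted into a terminal, or the list can be passed to `subprocess.run(..., check=True)`.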