EBISPOT / OLS

Ontology Lookup Service from SPOT at EBI
http://www.ebi.ac.uk/ols
Apache License 2.0

REST API does not link to owl file correctly #388

Closed markmcdowall closed 4 years ago

markmcdowall commented 4 years ago

In the REST API the links to the owl files don't always work.

I get the list of ontologies by:

import json
import requests

# url is the OLS REST endpoint that lists the ontologies (not given in this post)
request_handle = requests.get(url)
json_document = json.loads(request_handle.text)

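If I then send a HEAD request to each ontology's config id, i.e. json_document['_embedded']['ontologies'][i]['config']['id']:

for ontology in json_document['_embedded']['ontologies']:
    valid = requests.head(ontology['config']['id'])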

Then the following URLs either return 404, or have various connection issues:

<class 'requests.exceptions.ConnectionError'>: enm (5.0.1): http://purl.enanomapper.net/onto/enanomapper.owl

<class 'requests.exceptions.ConnectionError'>: hcao (2020-05-22): http://ontology.data.humancellatlas.org/ontologies/hcao

nmrcv - nmrcv (1.1.0): http://nmrML.org/nmrCV
<Response [404]>

<class 'requests.exceptions.ConnectionError'>: scdo (2019-06-26): http://scdontology.h3abionet.org/ontology/scdo.owl

afo - afo (REC/2019/05/10): http://purl.allotrope.org/voc/afo/merged-OLS/REC/2019/05/10
<Response [404]>

# This is the root page rather than the one for the actual owl file
edam - edam (17-07-2019): http://edamontology.org

ppo - ppo (2018-10-26): https://raw.githubusercontent.com/PlantPhenoOntology/ppo/master/ppo.owl
<Response [404]>

<class 'requests.exceptions.ConnectionError'>: sdgio (2018-08-10): http://purl.unep.org/sdg/sdgio.owl

teddy - teddy (rel-2014-04-24): http://identifiers.org/teddy/
<Response [400]>

However, if I instead use json_document['_embedded']['ontologies'][i]['config']['fileLocation'], which does give me the EDAM owl file, other vocabularies fail: it sometimes returns links into the EBI's internal file system (/nfs/panda/ensembl/.../.../PHI.obo) or URLs with schemes that requests cannot handle:

<class 'requests.exceptions.InvalidSchema'>: dicom (None): ftp://medical.nema.org/MEDICAL/Dicom/Resources/Ontology/DCM/dcm.owl

<class 'requests.exceptions.ConnectionError'>: enm (5.0.1): http://purl.enanomapper.net/onto/enanomapper.owl

<class 'requests.exceptions.ConnectionError'>: afo (REC/2019/05/10): http://afo-ols.semanticsfirst.com/ontologies/afo

genepio - genepio (2018-06-15): https://raw.githubusercontent.com/GenEpiO/genepio/master/src/ontology/genepio-merged-cardfix.owl
<Response [404]>

<class 'requests.exceptions.InvalidSchema'>: phi (10-12-2018): file:/nfs/panda/ensembl/production/ensprod/ontologies/phi/PHI.obo

<class 'requests.exceptions.ConnectionError'>: sdgio (2018-08-10): http://purl.unep.org/sdg/sdgio.owl

Neither fileLocation nor the id seems to guarantee getting the source owl file for all ontologies.

jamesamcl commented 4 years ago

You've encountered one of the main problems we have as OLS maintainers: ontology links often go dead. Our own indexer, which runs daily, also hits many of these dead links, so we have to skip over the ones that fail, and those ontologies will not be updated until we update the OLS configuration with the new URLs.

We make a best effort to update the URLs where we can, but with the growing number of ontologies indexed it is a moving target. Therefore, you should do as our indexer does and not assume that all of the URLs in the fileLocation property will resolve correctly.
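As a rough illustration of the "skip what fails" approach described above, here is a minimal sketch. It is not the OLS indexer's actual code; the names head_status and check_links are hypothetical, and it uses only the standard library (urllib) instead of requests so that it has no dependencies. It assumes you have already collected the links into a {name: url} dict.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def head_status(url, timeout=10):
    """Send a HEAD request and return the HTTP status code (hypothetical helper)."""
    if urlsplit(url).scheme not in ("http", "https"):
        # file: and ftp: locations cannot be checked with an HTTP HEAD request.
        raise ValueError(f"not an HTTP(S) URL: {url}")
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urllib reports 4xx/5xx responses as exceptions; return the code instead.
        return error.code


def check_links(links, probe=head_status):
    """Split a {name: url} dict into (reachable, skipped) without raising."""
    reachable, skipped = {}, {}
    for name, url in links.items():
        try:
            status = probe(url)
        except (OSError, ValueError) as error:
            # OSError covers DNS failures, refused connections and timeouts;
            # ValueError covers malformed or non-HTTP URLs.
            skipped[name] = type(error).__name__
            continue
        if status >= 400:
            skipped[name] = f"HTTP {status}"
        else:
            reachable[name] = url
    return reachable, skipped
```

Calling check_links with the links gathered from the API returns the usable URLs plus a dict recording why each of the others was skipped, so one dead link does not abort the whole run. The probe argument exists only so the check can be swapped out, for example for a requests-based one.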

phi is indeed loaded from a file:// URL on our internal NFS, which is not ideal. I will see if there is an http URL we can use instead.
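For anyone who wants to spot entries like phi before requesting them, here is a small sketch that groups fileLocation values by URL scheme. The function name group_by_scheme is hypothetical, and the sketch assumes each ontology record is a dict with a 'config' dict containing 'fileLocation', as in the keys quoted earlier in this thread.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def group_by_scheme(ontologies):
    """Group each record's config.fileLocation by URL scheme (hypothetical helper)."""
    groups = defaultdict(list)
    for ontology in ontologies:
        location = ontology["config"]["fileLocation"]
        # A location with no scheme at all is filed under "(none)".
        scheme = urlsplit(location).scheme or "(none)"
        groups[scheme].append(location)
    return dict(groups)
```

Anything that ends up under a key other than http or https (such as file or ftp) is a location that an HTTP client like requests will reject with InvalidSchema, so it can be reported or skipped up front.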