bigbio / sdrf-pipelines

A repository to convert SDRF proteomics files into pipelines config files
Apache License 2.0

CI is failing, probably due to resource location change #126

Closed: fabianegli closed this issue 2 years ago

fabianegli commented 2 years ago

The CI currently fails because a test tries to retrieve

http://purl.obolibrary.org/obo/NCBITaxon_9606&size=100

and that request fails.

When this URL is entered in a browser, the browser is redirected to http://ontologies.berkeleybop.org/.

fabianegli commented 2 years ago

Dumping this here:

https://www.ebi.ac.uk/spot/zooma/docs/api
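Zooma can also be queried directly over HTTP. The sketch below only builds a request URL and sends nothing. It assumes the `annotate` service at `https://www.ebi.ac.uk/spot/zooma/v2/api/services/annotate` with the `propertyValue` and `filter` query parameters described in the linked docs; `build_zooma_annotate_url` is a hypothetical helper, not part of sdrf-pipelines.

```python
from typing import Sequence
from urllib.parse import urlencode

# Assumed public Zooma endpoint (see the API docs linked above); verify before use.
ZOOMA_ANNOTATE = "https://www.ebi.ac.uk/spot/zooma/v2/api/services/annotate"


def build_zooma_annotate_url(value: str, ontologies: Sequence[str]) -> str:
    """Hypothetical helper: Zooma annotate URL restricted to some ontologies."""
    params = {
        "propertyValue": value,
        # Same "ontologies:[...]" filter syntax as in the session below.
        "filter": "ontologies:[" + ",".join(ontologies) + "]",
    }
    # urlencode percent-encodes ':', '[', ']' and ',' in the filter value.
    return ZOOMA_ANNOTATE + "?" + urlencode(params)
```

Note that OLS identifies the NCBI Taxonomy as `ncbitaxon`, whereas the session below passes `nbcitaxon` in its filter.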

This is the part of the test code that runs through without errors:


In [1]: from sdrf_pipelines.zooma.zooma import Zooma, SlimOlsClient

In [2]: keyword = 'human'

In [3]: client = Zooma()

In [4]: results = client.recommender(keyword, filters="ontologies:[nbcitaxon]")

In [5]: results
Out[5]: 
[{'uri': None,
  'annotatedProperty': {'uri': 'http://rdf.ebi.ac.uk/resource/zooma/69A7F59A19C8525E4C7B04C798596A69',
   'propertyType': 'factor',
   'propertyValue': 'Human'},
  '_links': {'olslinks': [{'href': 'https://ves-oy-be:8080/ols/api/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FNCBITaxon_9606',
     'semanticTag': 'http://purl.obolibrary.org/obo/NCBITaxon_9606'}]},
  'semanticTags': ['http://purl.obolibrary.org/obo/NCBITaxon_9606'],
  'replacedBy': [],
  'replaces': [],
  'derivedFrom': {'uri': 'http://rdf.ebi.ac.uk/resource/zooma/metabolights/8F94CFC1124A59BCEB894A9917CA77C2',
   'annotatedProperty': {'uri': 'http://rdf.ebi.ac.uk/resource/zooma/69A7F59A19C8525E4C7B04C798596A69',
    'propertyType': 'factor',
    'propertyValue': 'Human'},
   '_links': {'olslinks': [{'href': 'http://purl.obolibrary.org/obo/NCBITaxon_9606',
      'semanticTag': 'http://purl.obolibrary.org/obo/NCBITaxon_9606'}]},
   'semanticTags': ['http://purl.obolibrary.org/obo/NCBITaxon_9606'],
   'replacedBy': [],
   'replaces': [],
   'provenance': {'source': {'type': 'DATABASE',
     'name': 'metabolights',
     'uri': 'https://www.ebi.ac.uk/metabolights'},
    'evidence': 'MANUAL_CURATED',
    'accuracy': 'NOT_SPECIFIED',
    'generator': 'https://www.ebi.ac.uk/metabolights',
    'generatedDate': 1649104721000,
    'annotator': 'Zoe May Pendlington',
    'annotationDate': 1538852400000},
   'annotatedBiologicalEntities': [{'uri': 'http://rdf.ebi.ac.uk/resource/zooma/metabolights/1846EDBCECE5A0D6E04AC107EF21F950',
     'name': 'metabo_219',
     'types': ['http://www.w3.org/2002/07/owl#NamedIndividual',
      'http://rdf.ebi.ac.uk/terms/zooma/Target'],
     'studies': [{'uri': 'http://rdf.ebi.ac.uk/resource/zooma/metabolights/A6AA403F9EF597D6BCFAAAFB79571724',
       'accession': 'MTBLS176',
       'types': ['http://www.w3.org/2002/07/owl#NamedIndividual',
        'http://rdf.ebi.ac.uk/terms/zooma/DatabaseEntrySource']}]}]},
  'confidence': 'HIGH',
  'provenance': {'source': {'type': 'DATABASE',
    'name': 'zooma',
    'uri': 'www.ebi.ac.uk/spot/zooma'},
   'evidence': 'ZOOMA_INFERRED_FROM_CURATED',
   'accuracy': None,
   'generator': 'ZOOMA',
   'generatedDate': 1654622176809,
   'annotator': 'ZOOMA',
   'annotationDate': 1654622176809},
  'annotatedBiologicalEntities': []}]

In [6]: ols_terms = client.process_zooma_results(results)

In [7]: ols_terms
Out[7]: 
[{'queryValue': 'Human',
  'confidence': 'HIGH',
  'ols_url': 'https://ves-oy-be:8080/ols/api/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FNCBITaxon_9606'}]

In [8]: ols_client = SlimOlsClient()

This step then fails (simplified for interactive debugging):


In [9]: ols_client.get_term_from_url(ols_terms[0]['ols_url'], ontology="ncbitaxon")
---------------------------------------------------------------------------
gaierror                                  Traceback (most recent call last)
File ~/github/fabianegli/sdrf-pipelines/venv/lib/python3.8/site-packages/urllib3/connection.py:174, in HTTPConnection._new_conn(self)
    173 try:
--> 174     conn = connection.create_connection(
    175         (self._dns_host, self.port), self.timeout, **extra_kw
    176     )
    178 except SocketTimeout:

File ~/github/fabianegli/sdrf-pipelines/venv/lib/python3.8/site-packages/urllib3/util/connection.py:72, in create_connection(address, timeout, source_address, socket_options)
     68     return six.raise_from(
     69         LocationParseError(u"'%s', label empty or too long" % host), None
     70     )
---> 72 for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
     73     af, socktype, proto, canonname, sa = res

File /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/socket.py:918, in getaddrinfo(host, port, family, type, proto, flags)
    917 addrlist = []
--> 918 for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    919     af, socktype, proto, canonname, sa = res

gaierror: [Errno 8] nodename nor servname provided, or not known

During handling of the above exception, another exception occurred:

NewConnectionError                        Traceback (most recent call last)
File ~/github/fabianegli/sdrf-pipelines/venv/lib/python3.8/site-packages/urllib3/connectionpool.py:703, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    702 # Make the request on the httplib connection object.
--> 703 httplib_response = self._make_request(
    704     conn,
    705     method,
    706     url,
    707     timeout=timeout_obj,
    708     body=body,
    709     headers=headers,
    710     chunked=chunked,
    711 )
    713 # If we're going to release the connection in ``finally:``, then
    714 # the response doesn't need to know about the connection. Otherwise
    715 # it will also try to release it and we'll have a double-release
    716 # mess.

File ~/github/fabianegli/sdrf-pipelines/venv/lib/python3.8/site-packages/urllib3/connectionpool.py:386, in HTTPConnectionPool._make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
    385 try:
--> 386     self._validate_conn(conn)
    387 except (SocketTimeout, BaseSSLError) as e:
    388     # Py2 raises this as a BaseSSLError, Py3 raises it as socket timeout.

File ~/github/fabianegli/sdrf-pipelines/venv/lib/python3.8/site-packages/urllib3/connectionpool.py:1040, in HTTPSConnectionPool._validate_conn(self, conn)
   1039 if not getattr(conn, "sock", None):  # AppEngine might not have  `.sock`
-> 1040     conn.connect()
   1042 if not conn.is_verified:

File ~/github/fabianegli/sdrf-pipelines/venv/lib/python3.8/site-packages/urllib3/connection.py:358, in HTTPSConnection.connect(self)
    356 def connect(self):
    357     # Add certificate verification
--> 358     self.sock = conn = self._new_conn()
    359     hostname = self.host

File ~/github/fabianegli/sdrf-pipelines/venv/lib/python3.8/site-packages/urllib3/connection.py:186, in HTTPConnection._new_conn(self)
    185 except SocketError as e:
--> 186     raise NewConnectionError(
    187         self, "Failed to establish a new connection: %s" % e
    188     )
    190 return conn

NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x10c74d6a0>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known

During handling of the above exception, another exception occurred:

MaxRetryError                             Traceback (most recent call last)
File ~/github/fabianegli/sdrf-pipelines/venv/lib/python3.8/site-packages/requests/adapters.py:440, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
    439 if not chunked:
--> 440     resp = conn.urlopen(
    441         method=request.method,
    442         url=url,
    443         body=request.body,
    444         headers=request.headers,
    445         redirect=False,
    446         assert_same_host=False,
    447         preload_content=False,
    448         decode_content=False,
    449         retries=self.max_retries,
    450         timeout=timeout
    451     )
    453 # Send the request.
    454 else:

File ~/github/fabianegli/sdrf-pipelines/venv/lib/python3.8/site-packages/urllib3/connectionpool.py:785, in HTTPConnectionPool.urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
    783     e = ProtocolError("Connection aborted.", e)
--> 785 retries = retries.increment(
    786     method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
    787 )
    788 retries.sleep()

File ~/github/fabianegli/sdrf-pipelines/venv/lib/python3.8/site-packages/urllib3/util/retry.py:592, in Retry.increment(self, method, url, response, error, _pool, _stacktrace)
    591 if new_retry.is_exhausted():
--> 592     raise MaxRetryError(_pool, url, error or ResponseError(cause))
    594 log.debug("Incremented Retry for (url='%s'): %r", url, new_retry)

MaxRetryError: HTTPSConnectionPool(host='ves-oy-be', port=8080): Max retries exceeded with url: /ols/api/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FNCBITaxon_9606&size=100 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x10c74d6a0>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))

During handling of the above exception, another exception occurred:

ConnectionError                           Traceback (most recent call last)
Input In [9], in <cell line: 1>()
----> 1 ols_client.get_term_from_url(ols_terms[0]['ols_url'], ontology="ncbitaxon")

File ~/github/fabianegli/sdrf-pipelines/sdrf_pipelines/zooma/zooma.py:33, in SlimOlsClient.get_term_from_url(url, page_size, ontology)
     25 """
     26 Return a list of terms by ontology
     27 :param url:
   (...)
     30 :return:
     31 """
     32 url += "&" + "size=" + str(page_size)
---> 33 r = requests.get(url)
     34 if r.status_code == 414:
     35     raise HTTPError('URL do not exist in OLS')

File ~/github/fabianegli/sdrf-pipelines/venv/lib/python3.8/site-packages/requests/api.py:75, in get(url, params, **kwargs)
     64 def get(url, params=None, **kwargs):
     65     r"""Sends a GET request.
     66 
     67     :param url: URL for the new :class:`Request` object.
   (...)
     72     :rtype: requests.Response
     73     """
---> 75     return request('get', url, params=params, **kwargs)

File ~/github/fabianegli/sdrf-pipelines/venv/lib/python3.8/site-packages/requests/api.py:61, in request(method, url, **kwargs)
     57 # By using the 'with' statement we are sure the session is closed, thus we
     58 # avoid leaving sockets open which can trigger a ResourceWarning in some
     59 # cases, and look like a memory leak in others.
     60 with sessions.Session() as session:
---> 61     return session.request(method=method, url=url, **kwargs)

File ~/github/fabianegli/sdrf-pipelines/venv/lib/python3.8/site-packages/requests/sessions.py:529, in Session.request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
    524 send_kwargs = {
    525     'timeout': timeout,
    526     'allow_redirects': allow_redirects,
    527 }
    528 send_kwargs.update(settings)
--> 529 resp = self.send(prep, **send_kwargs)
    531 return resp

File ~/github/fabianegli/sdrf-pipelines/venv/lib/python3.8/site-packages/requests/sessions.py:645, in Session.send(self, request, **kwargs)
    642 start = preferred_clock()
    644 # Send the request
--> 645 r = adapter.send(request, **kwargs)
    647 # Total elapsed time of the request (approximately)
    648 elapsed = preferred_clock() - start

File ~/github/fabianegli/sdrf-pipelines/venv/lib/python3.8/site-packages/requests/adapters.py:519, in HTTPAdapter.send(self, request, stream, timeout, verify, cert, proxies)
    515     if isinstance(e.reason, _SSLError):
    516         # This branch is for urllib3 v1.22 and later.
    517         raise SSLError(e, request=request)
--> 519     raise ConnectionError(e, request=request)
    521 except ClosedPoolError as e:
    522     raise ConnectionError(e, request=request)

ConnectionError: HTTPSConnectionPool(host='ves-oy-be', port=8080): Max retries exceeded with url: /ols/api/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FNCBITaxon_9606&size=100 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x10c74d6a0>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'))
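Read from top to bottom, this traceback is one chained failure: the `socket.gaierror` raised by `getaddrinfo` becomes urllib3's `NewConnectionError`, then `MaxRetryError`, and finally `requests`' `ConnectionError`. The host name `ves-oy-be` from the Zooma `olslinks` href cannot be resolved at all, so no connection is ever established. The term IRI is still available in `semanticTags`, so a client could check the host first and otherwise rebuild the OLS URL against a public base. A minimal sketch using only the standard library; `host_resolves` and `ols_term_url` are hypothetical helpers, and `https://www.ebi.ac.uk/ols/api` is an assumed public OLS base that may have changed since.

```python
import socket
from urllib.parse import urlencode, urlsplit

# Assumed public OLS API base; the "/terms?iri=..." path mirrors the Zooma href.
OLS_API = "https://www.ebi.ac.uk/ols/api"


def host_resolves(url: str) -> bool:
    """Hypothetical helper: True if the URL's host name can be resolved."""
    parts = urlsplit(url)
    if not parts.hostname:
        return False  # relative or malformed URL: nothing to resolve
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        socket.getaddrinfo(parts.hostname, port)
    except socket.gaierror:
        # The same root error as at the top of the traceback above.
        return False
    return True


def ols_term_url(semantic_tag: str, base: str = OLS_API) -> str:
    """Hypothetical helper: OLS term-lookup URL for an ontology IRI."""
    # urlencode percent-encodes ':' and '/' in the IRI, as in the Zooma href.
    return base + "/terms?" + urlencode({"iri": semantic_tag})
```

For example, `href if host_resolves(href) else ols_term_url(tag)` would fall back to a public URL when the returned host is unreachable, and a test could call `pytest.skip(...)` instead when `host_resolves` returns `False`.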

Relevant code from zooma.py:

def process_zumma_results(results):

https://github.com/bigbio/sdrf-pipelines/blob/6863c90e12e1a4331d5f12c192f1a7bada7d9403/sdrf_pipelines/zooma/zooma.py#L59-L64
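Judging only from `Out[5]` and `Out[7]` in the session above, the results processing maps each Zooma hit to its annotated property value, its confidence, and the first OLS link. The following is a reconstruction from that output, not the project's code at the linked lines; the name `summarize_zooma_results` and skipping hits without OLS links are assumptions.

```python
from typing import Any, Dict, List


def summarize_zooma_results(results: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical reconstruction of the Out[5] -> Out[7] mapping."""
    terms = []
    for hit in results:
        links = hit.get("_links", {}).get("olslinks", [])
        if not links:
            continue  # assumption: hits without an OLS link are dropped
        terms.append({
            "queryValue": hit["annotatedProperty"]["propertyValue"],
            "confidence": hit["confidence"],
            # Top-level href: in this issue, the unreachable ves-oy-be URL.
            "ols_url": links[0]["href"],
        })
    return terms
```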

class SlimOlsClient:

https://github.com/bigbio/sdrf-pipelines/blob/6863c90e12e1a4331d5f12c192f1a7bada7d9403/sdrf_pipelines/zooma/zooma.py#L17-L39
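The traceback shows that `SlimOlsClient.get_term_from_url` adds the page size with `url += "&" + "size=" + str(page_size)`. That yields a well-formed URL only if `url` already has a query string; applied to a bare IRI such as `http://purl.obolibrary.org/obo/NCBITaxon_9606`, it produces `http://purl.obolibrary.org/obo/NCBITaxon_9606&size=100`, which is likely how the URL quoted at the top of this issue came about. A minimal sketch that merges the parameter into the query string with the standard library instead; `with_query_param` is a hypothetical helper, not the project's fix.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_param(url: str, key: str, value: str) -> str:
    """Hypothetical helper: set key=value in the URL's query string."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    # Decode the existing query (may be empty) into an ordered dict.
    params = dict(parse_qsl(query, keep_blank_values=True))
    params[key] = value  # replaces an existing value instead of duplicating it
    return urlunsplit((scheme, netloc, path, urlencode(params), fragment))
```

Collapsing the parsed query into a dict drops repeated keys, which is fine for `iri`/`size` but not in general. With `requests`, passing `params={"size": page_size}` to `requests.get` has a similar effect, since `requests` appends the encoded `params` to any existing query string.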

fabianegli commented 2 years ago

I reported the issue to the Zooma project: https://github.com/EBISPOT/zooma/issues/99

fabianegli commented 2 years ago

The issue was solved upstream.