DOAJ / harvester

External library that harvests content from third parties and adds it to the DOAJ via its API

Crashes on EPMC error #2

Open emanuil-tolev opened 8 years ago

emanuil-tolev commented 8 years ago

Should catch exceptions maybe?

Traceback (most recent call last):
  File "/home/cloo/harvester/src/harvester/service/runner.py", line 8, in <module>
    workflow.HarvesterWorkflow.process_account(account_id)
  File "/home/cloo/harvester/src/harvester/service/workflow.py", line 25, in process_account
    HarvesterWorkflow.process_issn(account_id, issn)
  File "/home/cloo/harvester/src/harvester/service/workflow.py", line 71, in process_issn
    for article, lhd in p.iterate(issn, lh):
  File "/home/cloo/harvester/src/harvester/service/models/epmc.py", line 45, in iterate
    for record in client.EuropePMC.complex_search_iterator(query, throttle=throttle):   # also throttle paging requests
  File "/home/cloo/harvester/src/harvester/magnificent-octopus/octopus/modules/epmc/client.py", line 100, in iterate
    results = cls.query(query_string, page=page, page_size=page_size)
  File "/home/cloo/harvester/src/harvester/magnificent-octopus/octopus/modules/epmc/client.py", line 124, in query
    raise EuropePMCException(resp)
octopus.modules.epmc.client.EuropePMCException: <Response [500]>

emanuil-tolev commented 8 years ago

requests.exceptions.ConnectionError: HTTPConnectionPool(host='www.ebi.ac.uk', port=80): Max retries exceeded with url: /europepmc/webservices/rest/search/query=ISSN:%221935-2735%22%20OPEN_ACCESS:%22y%22%20UPDATE_DATE:2006-10-31%20sort_date:%22y%22&resulttype=core&format=json&page=1&pageSize=1000 (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7ff6de042090>: Failed to establish a new connection: [Errno -2] Name or service not known',))

EPMC doesn't seem to exist in this case - obviously a temporary problem. Best to catch requests.exceptions.ConnectionError, sleep for 10s, and retry - up until a few hours of trying. Log each retry.
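
A minimal sketch of the catch, sleep, retry and log approach described above. The helper name `retry_call` and its parameters are hypothetical and not part of harvester or octopus. It uses only the standard library; to apply it here one would pass `requests.exceptions.ConnectionError` as `retry_on` and wrap the EPMC request in `func`.

```python
import logging
import time

log = logging.getLogger(__name__)


def retry_call(func, retry_on, wait=10, max_duration=3 * 60 * 60,
               sleep=time.sleep, clock=time.monotonic):
    """Call func() until it succeeds, retrying on the given exception type(s).

    Hypothetical helper for illustration only.
    wait is the fixed pause between attempts (seconds); max_duration is the
    total time budget (seconds), here defaulting to three hours.
    """
    deadline = clock() + max_duration
    attempt = 0
    while True:
        attempt += 1
        try:
            return func()
        except retry_on as e:
            # Give up once another wait would take us past the deadline.
            if clock() + wait > deadline:
                log.error("Giving up after %d attempts: %s", attempt, e)
                raise
            # Log each retry, as suggested in the comment above.
            log.warning("Attempt %d failed (%s); retrying in %ss",
                        attempt, e, wait)
            sleep(wait)
```

The `sleep` and `clock` arguments exist only so the loop can be exercised without real waiting; in normal use the defaults apply.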

richard-jones commented 8 years ago

This should already be using the re-try code in octopus, so it evidently was failing for a while.

What we should do is tweak the retry settings for the app - raise the max wait time, and the number of retry attempts so that any service blips are dealt with.
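
To see why raising both values matters, here is a rough sketch of how a capped exponential back-off adds up. The parameter names and the `factor * 2**n` formula are assumptions for illustration; they are not taken from octopus, whose actual setting names and defaults are in its http config file.

```python
def backoff_delays(max_retries, back_off_factor=1.0, max_back_off=30.0):
    """Wait before each retry: back_off_factor * 2**n, capped at max_back_off.

    Hypothetical model of a retry schedule, not the octopus implementation.
    """
    return [min(back_off_factor * (2 ** n), max_back_off)
            for n in range(max_retries)]


def total_wait(max_retries, back_off_factor=1.0, max_back_off=30.0):
    """Total seconds of outage the schedule can ride out before giving up."""
    return sum(backoff_delays(max_retries, back_off_factor, max_back_off))


# Few retries with a low cap cover only about half a minute of downtime.
short = total_wait(5, back_off_factor=1.0, max_back_off=30.0)

# More retries with a higher cap cover several minutes.
longer = total_wait(10, back_off_factor=1.0, max_back_off=120.0)
```

Under this model, once the cap is reached each extra retry adds only `max_back_off` seconds, so covering an outage of hours needs a larger cap as well as more attempts.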

richard-jones commented 8 years ago

You could override any of these in the local.cfg, or I can update them in config/service.py

https://github.com/richard-jones/magnificent-octopus/blob/9322a508e8dcda958eec79c97939843981d96c5c/octopus/config/http.py