blekhmanlab / rxivist

API providing access to papers and authors scraped from biorxiv.org
https://rxivist.org
GNU Affero General Public License v3.0
59 stars 11 forks

Be more sensitive to crawling errors #227

Closed by rabdill 5 years ago

rabdill commented 5 years ago

If a call keeps failing (checking publication status, for example), the crawler shouldn't just keep hammering the remote service with the same request for every subsequent article:

Refreshing article 31476
Determining publication status for DOI 10.1101/402800.
Error fetching publication data: ('Connection aborted.', OSError(0, 'Error'))
Retrying:
Determining publication status for DOI 10.1101/402800.
Error fetching publication data: ('Connection aborted.', OSError(0, 'Error'))
Giving up on this one for now.

Refreshing article 25634
Determining publication status for DOI 10.1101/105825.
Paper already has publication recorded. Skipping.
Recorded 2 stats for ID 25634

Refreshing article 33182
Determining publication status for DOI 10.1101/425991.
Error fetching publication data: ('Connection aborted.', OSError(0, 'Error'))
Retrying:
Determining publication status for DOI 10.1101/425991.
Error fetching publication data: ('Connection aborted.', OSError(0, 'Error'))
Giving up on this one for now.

rabdill commented 5 years ago

https://github.com/blekhmanlab/rxivist/commit/c25450181a56a37aa7082ac8306c812912125664
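
For illustration only, here is a minimal sketch of the behavior requested in this issue: stop making a particular kind of call once it has failed several times in a row. It is not taken from the commit linked above. The class name `ConsecutiveFailureGuard`, the `max_consecutive_failures` threshold, and the `always_failing_lookup` stand-in are all hypothetical names introduced for the example.

```python
class ConsecutiveFailureGuard:
    """Hypothetical helper: stops issuing calls after too many failures in a row."""

    def __init__(self, max_consecutive_failures=3):
        self.max_consecutive_failures = max_consecutive_failures
        self.consecutive_failures = 0

    @property
    def tripped(self):
        # True once the failure streak has reached the threshold.
        return self.consecutive_failures >= self.max_consecutive_failures

    def call(self, func, *args, **kwargs):
        """Run func unless the guard has tripped.

        Returns func's result on success, or None if the call was
        skipped or raised an exception.
        """
        if self.tripped:
            return None
        try:
            result = func(*args, **kwargs)
        except Exception as exc:
            self.consecutive_failures += 1
            print(f"Error fetching publication data: {exc}")
            return None
        # Any success resets the streak, so isolated errors never trip the guard.
        self.consecutive_failures = 0
        return result


def always_failing_lookup(doi):
    # Hypothetical stand-in for the real publication-status request.
    raise ConnectionError("Connection aborted.")


guard = ConsecutiveFailureGuard(max_consecutive_failures=2)
for doi in ["10.1101/402800", "10.1101/425991", "10.1101/105825"]:
    if guard.tripped:
        print(f"Skipping publication check for DOI {doi}: too many consecutive failures.")
        continue
    guard.call(always_failing_lookup, doi)
```

Counting consecutive failures rather than total failures means a single dropped connection does not disable the check for the rest of the crawl; only a sustained outage does.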