Queens-Hacks / qcumber-scraper

Scrapes SOLUS and generates structured data

Max retries exceeded #14

Closed Graham42 closed 10 years ago

Graham42 commented 10 years ago

I'm not sure if this is just my connection, but I'm trying to run a scrape and I've gotten this error from a couple of threads.

Edit: All threads died except one. Full stderr log here: http://pastebin.com/zfnTkRqZ

Process Process-6:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "main.py", line 106, in run_jobs
    SolusScraper(session, job).start()
  File "/vagrant/qcumber-scraper/scraper.py", line 19, in start
    self.scrape_letters()
  File "/vagrant/qcumber-scraper/scraper.py", line 32, in scrape_letters
    self.scrape_subjects()
  File "/vagrant/qcumber-scraper/scraper.py", line 54, in scrape_subjects
    self.scrape_courses(subject)
  File "/vagrant/qcumber-scraper/scraper.py", line 83, in scrape_courses
    self.session.return_from_course()
  File "/vagrant/qcumber-scraper/navigation.py", line 176, in return_from_course
    self._catalog_post('DERIVED_SAA_CRS_RETURN_PB')
  File "/vagrant/qcumber-scraper/navigation.py", line 255, in _catalog_post
    self._post(self.course_catalog_url, data=extras)
  File "/vagrant/qcumber-scraper/navigation.py", line 241, in _post
    self.latest_response = self.session.post(url, **kwargs)
  File "/home/vagrant/.virtualenvs/scraper/local/lib/python2.7/site-packages/requests/sessions.py", line 403, in post
    return self.request('POST', url, data=data, **kwargs)
  File "/home/vagrant/.virtualenvs/scraper/local/lib/python2.7/site-packages/requests/sessions.py", line 361, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/vagrant/.virtualenvs/scraper/local/lib/python2.7/site-packages/requests/sessions.py", line 464, in send
    r = adapter.send(request, **kwargs)
  File "/home/vagrant/.virtualenvs/scraper/local/lib/python2.7/site-packages/requests/adapters.py", line 356, in send
    raise ConnectionError(e)
ConnectionError: HTTPSConnectionPool(host='saself.ps.queensu.ca', port=443): Max retries exceeded with url: /psc/saself/EMPLOYEE/HRMS/c/SA_LEARNER_SERVICES.SSS_BROWSE_CATLG_P.GBL (Caused by <class 'socket.gaierror'>: [Errno -2] Name or service not known)
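The "Max retries exceeded" wording comes from the connection pooling layer (urllib3) inside requests; the root cause here is a DNS failure (`socket.gaierror: Name or service not known`), i.e. the hostname lookup itself failed. One way to make the scraper more tolerant of transient failures like this is to mount an `HTTPAdapter` with a retry policy on the Session. A rough sketch, not part of qcumber-scraper itself (`make_session` and the parameter values are illustrative):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def make_session(retries=5, backoff=1.0):
    """Build a requests Session that retries failed requests with
    exponential backoff instead of giving up immediately."""
    session = requests.Session()
    retry = Retry(
        total=retries,              # total retry budget per request
        backoff_factor=backoff,     # sleep backoff * 2**(attempt-1) between tries
        status_forcelist=[500, 502, 503, 504],  # also retry server errors
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

This only papers over brief outages; if DNS is down for minutes at a time, the retry budget will still be exhausted eventually.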
mystor commented 10 years ago

In my experience, I've gotten that error when SOLUS is overloaded. I find scraping overnight to be much more reliable.
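Short of waiting for SOLUS to recover, the scraper could also pause and retry when an individual request dies, rather than letting the whole worker process crash. A hypothetical wrapper along those lines (`post_with_retry` is not part of the scraper's navigation code):

```python
import time
import requests

def post_with_retry(session, url, attempts=3, delay=60, **kwargs):
    """POST via the given session, sleeping and retrying on connection
    failures; re-raise only after the last attempt fails."""
    for attempt in range(1, attempts + 1):
        try:
            return session.post(url, **kwargs)
        except requests.ConnectionError:
            if attempt == attempts:
                raise
            time.sleep(delay)
```

Something like this could wrap the `session.post` call in `navigation.py`'s `_post`, so a single overloaded-server hiccup doesn't kill a thread mid-scrape.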

I'm not sure exactly what to suggest other than waiting and trying another time.

Graham42 commented 10 years ago

That makes sense, closing.