Queens-Hacks / qcumber-scraper

Scrapes SOLUS and generates structured data

Python 2 Unicode issues #8

Closed by pR0Ps 10 years ago

pR0Ps commented 10 years ago

Some characters in the scraped data cause the scraper to break when using Python 2 (v2.7.3 specifically).

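The error is a UnicodeEncodeError and can be reliably reproduced by formatting a string with the course name of BIOL 951. As the traceback shows, the scraped text contains the non-ASCII character u'\xa0' (a non-breaking space), and formatting it into a byte-string template makes Python 2 implicitly encode it with the 'ascii' codec.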

Scrape job to reproduce: ScrapeJob(letters="B", deep=False, subject_start=1, subject_end=2, course_start=137)

Details:

Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "./main.py", line 105, in run_jobs
    SolusScraper(session, job).start()
  File "./scraper.py", line 18, in start
    self.scrape_letters()
  File "./scraper.py", line 31, in scrape_letters
    self.scrape_subjects()
  File "./scraper.py", line 51, in scrape_subjects
    self.scrape_courses()
  File "./scraper.py", line 71, in scrape_courses
    logging.info("----Course: {number} - {title}".format(**course_attrs['basic']))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa0' in position 7: ordinal not in range(128)
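
The failure mode above can be illustrated with a minimal sketch. It is not the fix that was applied in the repository, only one common way to avoid the implicit ASCII encode: use a unicode template (the `u` prefix) so the unicode values are never coerced to a byte string. The names `basic`, `ascii_encode_fails` and `format_course` are hypothetical, and the title is placeholder data rather than the real BIOL 951 title; only the u'\xa0' character is taken from the traceback.

```python
# Hypothetical stand-in for course_attrs['basic']. The title is made-up data
# that merely contains the character from the traceback (U+00A0).
basic = {u"number": u"951", u"title": u"Sample\u00a0Title"}


def ascii_encode_fails(text):
    """Return True if encoding `text` as ASCII raises UnicodeEncodeError.

    This is the same encode step that Python 2 performs implicitly when a
    unicode value is formatted into a byte-string template.
    """
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return True
    return False


def format_course(attrs):
    """Build the log line from a unicode template, so the result stays
    unicode and no implicit ASCII encode takes place."""
    return u"----Course: {number} - {title}".format(**attrs)


line = format_course(basic)
```

The `u` prefix is accepted by Python 2.7 and by Python 3.3 and later, so the sketch runs unchanged on both. On Python 3 every string literal is already unicode, which is why the original error appears only under Python 2.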