biglocalnews / warn-scraper

Command-line interface for downloading WARN Act notices of qualified plant closings and mass layoffs from state government websites
https://warn-scraper.readthedocs.io
Apache License 2.0

MO Windows bug #195

Closed ydoc5212 closed 2 years ago

ydoc5212 commented 3 years ago

CLI invocation

(WARN-WpkWbRLu) C:\Users\Cody-DellXPS\WARN>python -m warn.cli -l DEBUG -s MO
2021-07-16 14:14:30,440 - warn.runner - Creating necessary dirs
2021-07-16 14:14:30,542 - warn.runner - Scraping MO
2021-07-16 14:14:31,022 - warn.scrapers.mo - Page status is 200 for https://jobs.mo.gov/warn2021
2021-07-16 14:14:31,250 - warn.scrapers.mo - Page status is 200 for https://jobs.mo.gov/warn2021
2021-07-16 14:14:31,472 - warn.scrapers.mo - Page status is 200 for https://jobs.mo.gov/warn2021
2021-07-16 14:14:31,737 - warn.scrapers.mo - Page status is 200 for https://jobs.mo.gov/content/2020-missouri-warn-notices
2021-07-16 14:14:32,014 - warn.scrapers.mo - Page status is 200 for https://jobs.mo.gov/content/2020-missouri-warn-notices
2021-07-16 14:14:32,312 - warn.scrapers.mo - Page status is 200 for https://jobs.mo.gov/warn2019
2021-07-16 14:14:32,593 - warn.scrapers.mo - Page status is 200 for https://jobs.mo.gov/warn2019
2021-07-16 14:14:33,417 - warn.scrapers.mo - Page status is 200 for https://jobs.mo.gov/warn2018
2021-07-16 14:14:33,995 - warn.scrapers.mo - Page status is 200 for https://jobs.mo.gov/warn2018
2021-07-16 14:14:34,684 - warn.scrapers.mo - Page status is 200 for https://jobs.mo.gov/warn2017
2021-07-16 14:14:34,997 - warn.scrapers.mo - Page status is 200 for https://jobs.mo.gov/warn2017
2021-07-16 14:14:35,218 - warn.scrapers.mo - Page status is 200 for https://jobs.mo.gov/warn2016
2021-07-16 14:14:35,460 - warn.scrapers.mo - Page status is 200 for https://jobs.mo.gov/warn2016
2021-07-16 14:14:35,844 - warn.scrapers.mo - Page status is 200 for https://jobs.mo.gov/warn2015
2021-07-16 14:14:36,257 - warn.scrapers.mo - Page status is 200 for https://jobs.mo.gov/warn2015
2021-07-16 14:14:36,303 - __main__ - ERROR: MO scraper. See traceback in C:\Users\Cody-DellXPS\.warn-scraper\logs\mo_err.log
2021-07-16 14:14:36,304 - __main__ - 1 scraper(s) failed to run: MO

Stack trace

Traceback (most recent call last):
  File "pandas\_libs\parsers.pyx", line 1119, in pandas._libs.parsers.TextReader._convert_tokens
  File "pandas\_libs\parsers.pyx", line 1244, in pandas._libs.parsers.TextReader._convert_with_dtype
  File "pandas\_libs\parsers.pyx", line 1259, in pandas._libs.parsers.TextReader._string_convert
  File "pandas\_libs\parsers.pyx", line 1450, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 10: invalid start byte

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Cody-DellXPS\WARN\warn\cli.py", line 67, in main
    runner.scrape(state)
  File "C:\Users\Cody-DellXPS\WARN\warn\runner.py", line 46, in scrape
    output_csv = state_mod.scrape(self.output_dir, self.working_dir)
  File "C:\Users\Cody-DellXPS\WARN\warn\scrapers\mo.py", line 49, in scrape
    dedupe(output_csv)
  File "C:\Users\Cody-DellXPS\WARN\warn\scrapers\mo.py", line 78, in dedupe
    df = pd.read_csv(output_csv, keep_default_na = False)
  File "C:\Users\Cody-DellXPS\.virtualenvs\WARN-WpkWbRLu\lib\site-packages\pandas\io\parsers.py", line 688, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "C:\Users\Cody-DellXPS\.virtualenvs\WARN-WpkWbRLu\lib\site-packages\pandas\io\parsers.py", line 460, in _read
    data = parser.read(nrows)
  File "C:\Users\Cody-DellXPS\.virtualenvs\WARN-WpkWbRLu\lib\site-packages\pandas\io\parsers.py", line 1198, in read
    ret = self._engine.read(nrows)
  File "C:\Users\Cody-DellXPS\.virtualenvs\WARN-WpkWbRLu\lib\site-packages\pandas\io\parsers.py", line 2157, in read
    data = self._reader.read(nrows)
  File "pandas\_libs\parsers.pyx", line 847, in pandas._libs.parsers.TextReader.read
  File "pandas\_libs\parsers.pyx", line 862, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas\_libs\parsers.pyx", line 941, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 1073, in pandas._libs.parsers.TextReader._convert_column_data
  File "pandas\_libs\parsers.pyx", line 1126, in pandas._libs.parsers.TextReader._convert_tokens
  File "pandas\_libs\parsers.pyx", line 1244, in pandas._libs.parsers.TextReader._convert_with_dtype
  File "pandas\_libs\parsers.pyx", line 1259, in pandas._libs.parsers.TextReader._string_convert
  File "pandas\_libs\parsers.pyx", line 1450, in pandas._libs.parsers._string_box_utf8
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa0 in position 10: invalid start byte

ydoc5212 commented 3 years ago

The attached mo_err.log is the error log referenced in the CLI output above.
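
For anyone reading the traceback: byte 0xa0 is how a non-breaking space (U+00A0) is encoded in Windows code pages such as cp1252, and on its own it is not valid UTF-8. One plausible reading, which is an assumption and not something confirmed in this thread, is that the intermediate CSV was written with the platform default encoding on Windows and then read back as UTF-8 by `pd.read_csv`. The sketch below reproduces that mismatch with the standard library only. The helper names `write_rows` and `read_rows` and the sample row are hypothetical and are not taken from `warn/scrapers/mo.py`.

```python
import csv
import os
import tempfile

NBSP = "\u00a0"  # non-breaking space, common in text scraped from HTML


def write_rows(path, rows, encoding):
    """Write rows to a CSV file using an explicit encoding (hypothetical helper)."""
    with open(path, "w", newline="", encoding=encoding) as f:
        csv.writer(f).writerows(rows)


def read_rows(path, encoding="utf-8"):
    """Read rows back from a CSV file using an explicit encoding (hypothetical helper)."""
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.reader(f))


# Hypothetical sample data containing a trailing non-breaking space.
rows = [["company", "date"], ["Acme Corp", f"2021-07-16{NBSP}"]]

with tempfile.TemporaryDirectory() as tmp:
    # Simulate a file written under a Windows default code page (assumption).
    legacy_path = os.path.join(tmp, "legacy.csv")
    write_rows(legacy_path, rows, encoding="cp1252")
    try:
        read_rows(legacy_path)  # decoded as UTF-8, so the 0xa0 byte is rejected
        legacy_error = None
    except UnicodeDecodeError as exc:
        legacy_error = exc

    # Writing and reading with the same explicit encoding round-trips cleanly.
    utf8_path = os.path.join(tmp, "utf8.csv")
    write_rows(utf8_path, rows, encoding="utf-8")
    round_tripped = read_rows(utf8_path)
```

If this is indeed the cause, passing `encoding="utf-8"` wherever the scraper opens the CSV for writing, and optionally to `pd.read_csv`, should make the behavior the same across platforms.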

palewire commented 2 years ago

This scraper now works fine for me. Since this ticket is over six months old, I'm going to close it. If the error persists, please speak up. We can always reopen the ticket and address it then.