biglocalnews / court-scraper

Scrapers for U.S. county court sites.
ISC License

OdysseySite generating KeyErrors when returning results from site.search #90

Closed justinmayo closed 3 years ago

justinmayo commented 3 years ago

When using OdysseySite to build out my case index files, I occasionally get KeyErrors when parsing the results for some cases. I have run across 15 of these cases so far in Napa, CA and DeKalb, GA. What they all have in common is that the "Style / Defendant" column is blank on the initial search results page. When this happens, all the other data points on the page are off by one. For example, the File Date will have the value that was in the status column. That's why there are KeyErrors: the script is trying to apply a date conversion to the string "Closed" (which was the case status). An example case in DeKalb to check out is 19D93839. Below is the traceback message that shows an example of the date conversion failing because of the missing Style / Defendant data...

stanford@stanford-dj-vbox:~/code/court-scraper-etl$ python odyssey_indexer_errors.py ga_dekalb 2019
Traceback (most recent call last):
  File "odyssey_indexer_errors.py", line 167, in <module>
    main(place_id, year)
  File "odyssey_indexer_errors.py", line 68, in main
    file_date = datetime.strptime(date, "%m/%d/%Y").strftime("%Y-%m-%d")
  File "/home/stanford/.pyenv/versions/3.7.6/lib/python3.7/_strptime.py", line 577, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "/home/stanford/.pyenv/versions/3.7.6/lib/python3.7/_strptime.py", line 359, in _strptime
    (data_string, format))
ValueError: time data 'Administratively Closed' does not match format '%m/%d/%Y'
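For anyone hitting the same traceback in their own indexing script, a guarded date conversion keeps one shifted row from aborting the whole run. This is only a sketch of a caller-side workaround, not part of court-scraper: `to_iso_date` is a hypothetical helper name, and the date format is taken from the `strptime` call in the traceback above.

```python
from datetime import datetime
from typing import Optional


def to_iso_date(value: str, fmt: str = "%m/%d/%Y") -> Optional[str]:
    """Return value as YYYY-MM-DD, or None if it is not a date in fmt.

    Hypothetical helper for the indexing script, not a court-scraper API.
    """
    try:
        return datetime.strptime(value.strip(), fmt).strftime("%Y-%m-%d")
    except ValueError:
        # A non-date string (e.g. a case status that shifted into the
        # File Date slot) lands here instead of raising.
        return None


# A normal date converts; a shifted status value is flagged with None.
print(to_iso_date("03/14/2019"))
print(to_iso_date("Administratively Closed"))
```

Rows that come back as `None` can be logged with their case numbers and revisited, which also gives a handy list of cases to test against.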

zstumgoren commented 3 years ago

@justinmayo Bug fix is applied. You'll need to update the court-scraper library to pull down the changes (easiest path is probably to uninstall and re-install using pip).

justinmayo commented 3 years ago

@zstumgoren Thanks. I will re-install and test it out... But what are you doing working on your week off?!

zstumgoren commented 3 years ago

@justinmayo Heh. Mornings are quiet this week, at least until the tiny humans wake up :) Btw, the bugfix is crafted pretty tightly around the Dekalb issue based on the case ID you mentioned. If it doesn't work in Napa, let me know a few case numbers to test against there and I can take a pass at a more general solution.
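One shape a more general solution could take, sketched here purely for illustration and not the fix that was actually applied: when a results row comes back one cell short, put an empty value back in the "Style / Defendant" slot before pairing cells with headers, then sanity-check the File Date. The column list below is an assumption (the real results page may have more or different columns), and `row_to_dict` and `looks_like_date` are hypothetical names.

```python
from datetime import datetime

# Assumed column order for a search results row; the real page may differ.
HEADERS = ["Case Number", "Style / Defendant", "File Date", "Status"]


def looks_like_date(value, fmt="%m/%d/%Y"):
    """Return True if value parses as a date in the given format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


def row_to_dict(cells, headers=HEADERS, blank_col="Style / Defendant"):
    """Pair row cells with headers, restoring a dropped blank column."""
    cells = list(cells)
    if len(cells) == len(headers) - 1:
        # One cell short: assume the blank column was dropped and
        # re-insert an empty string so later values line up again.
        cells.insert(headers.index(blank_col), "")
    if len(cells) != len(headers):
        raise ValueError(f"expected {len(headers)} cells, got {len(cells)}")
    row = dict(zip(headers, cells))
    if not looks_like_date(row["File Date"]):
        raise ValueError(f"File Date is not a date: {row['File Date']!r}")
    return row


# Made-up row with the Style / Defendant cell missing.
print(row_to_dict(["X-0001", "01/02/2019", "Closed"]))
```

The check on File Date matters because the padding step is a guess: if a different column were the one missing, the row would fail loudly rather than be stored with misaligned values.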

justinmayo commented 3 years ago

@zstumgoren Works in both Dekalb and Napa. Thanks!