CodeForPhilly / pbf-scraping

Project for Philadelphia Bail Fund to scrape new criminal filings from municipal court
https://codeforphilly.github.io/pbf-scraping

Parsing of prelim_hearing_date sometimes getting extra text #67

Closed: notchia closed 3 years ago

notchia commented 3 years ago

Occasionally a string like "07/02/2020 07/02/2020" is found instead of just "07/02/2020" for the preliminary hearing date.

Edit: actually, sometimes it also returns "07/02/2020 07/03/2020", so maybe it's grabbing the next event after the one we want? Or, depending on the regex used, it may be matching multiple event entries.
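A minimal sketch of one possible guard, assuming the date appears in MM/DD/YYYY form and that the first match is the one we want. The helper name `first_date` and the sample strings are hypothetical, not taken from the project's parser:

```python
import re

# Assumed date format: MM/DD/YYYY (hypothetical pattern, not the project's actual regex)
DATE_PATTERN = re.compile(r"\d{2}/\d{2}/\d{4}")


def first_date(text):
    """Return only the first MM/DD/YYYY date found in text, or None if there is none."""
    match = DATE_PATTERN.search(text)
    return match.group(0) if match else None


# Strings like the ones reported in this issue
print(first_date("07/02/2020 07/02/2020"))
print(first_date("07/02/2020 07/03/2020"))
```

Using `search` rather than joining every result of `findall` keeps a single date even when a later event's date sits on the same line. If the extra date actually belongs to a different event, the real fix is probably to narrow the text passed to the regex instead.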

notchia commented 3 years ago

Related enough that it's probably the same issue: the preliminary hearing time regex seems to be capturing both the date and the time, so the "time" column currently contains "date time".
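One way to keep the two columns separate would be named capture groups, sketched below. The time format (`h:mm am/pm`), the pattern, and the helper name `split_date_time` are assumptions for illustration and would need to be checked against the real docket text:

```python
import re

# Assumed layout: "MM/DD/YYYY h:mm am/pm" (hypothetical, not confirmed against the dockets)
DATE_TIME_PATTERN = re.compile(
    r"(?P<date>\d{2}/\d{2}/\d{4})\s+(?P<time>\d{1,2}:\d{2}\s*[ap]m)",
    re.IGNORECASE,
)


def split_date_time(text):
    """Return (date, time) as separate strings, or (None, None) if no match."""
    match = DATE_TIME_PATTERN.search(text)
    if match is None:
        return None, None
    return match.group("date"), match.group("time")


date, time = split_date_time("07/02/2020 4:00 pm")
print(date)
print(time)
```

Pulling each field from its own group means the "time" column only ever receives the time part, even when the regex has to match the date to locate it.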