CodeForPhilly / pbf-scraping

Project for Philadelphia Bail Fund to scrape new criminal filings from municipal court
https://codeforphilly.github.io/pbf-scraping

bail_set_by is wrong when "ENTRIES" section spans multiple pages #76

Closed adamrlinder closed 3 years ago

adamrlinder commented 3 years ago

While reviewing the data, Malik noticed a strangely high number of unique values for bail_set_by, meaning that many "magistrates" appear to have set bail in only one case. Looking at these cases more closely, I've determined that when the ENTRIES section spans two pages, the docket parsing script picks up the defendant as the magistrate instead of the magistrate who actually set the bail, whose name appears before the page break.

I've attached a handful of sample dockets below.

MC-51-CR-0024837-2020.pdf MC-51-CR-0025035-2020.pdf MC-51-CR-0025031-2020.pdf
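
To gauge how many already-scraped rows are affected, a quick check along these lines might help. It is only a sketch, not the project's parser: it assumes each parsed docket is available as a dict, and the `defendant_name` key and the `flag_suspect_bail_set_by` helper are hypothetical names (only `bail_set_by` comes from this issue). It flags rows where `bail_set_by` occurs exactly once in the data and matches the defendant's name after light normalization.

```python
from collections import Counter


def _norm(name):
    """Normalize a name so "Doe, John" and "john doe" compare equal."""
    # Drop commas, lowercase, and sort the tokens so word order is ignored.
    return " ".join(sorted((name or "").replace(",", " ").lower().split()))


def flag_suspect_bail_set_by(rows):
    """Return rows whose bail_set_by is unique in `rows` and equals the defendant.

    `rows` is assumed to be an iterable of dicts with the keys
    "bail_set_by" (from the issue) and "defendant_name" (hypothetical).
    """
    rows = list(rows)
    counts = Counter(_norm(r.get("bail_set_by")) for r in rows)
    suspects = []
    for r in rows:
        key = _norm(r.get("bail_set_by"))
        # A real magistrate normally appears on many dockets; a value that
        # appears once and matches the defendant is almost certainly a misparse.
        if key and counts[key] == 1 and key == _norm(r.get("defendant_name")):
            suspects.append(r)
    return suspects
```

Running this over the existing output should give a list of dockets to re-parse once the page-break handling is fixed. The name matching is deliberately loose, so it may need tightening if defendant names include suffixes or middle initials that the parser handles differently.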