Closed by adamrlinder 3 years ago
The following code determines the number of pages available by counting the page links on the New Criminal Filings website:
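import requests
from bs4 import BeautifulSoup

# Determine page count (PAGE_URL and record_date are defined elsewhere in the script)
source = requests.get(PAGE_URL, params={"search": record_date}).text
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(source, "html.parser")
ul = soup.find_all("ul", {"class": "pagination"})[0]
# Remove the last entry since that's just the link to the next (">>") button
pages = ul.find_all("li", recursive=False)[:-1]
num_pages = len(pages)
end_page = num_pages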
I don't know if it's needed, but it's another option.
Merged in a fix. Closing.
0_parse.py, which scrapes New Criminal Filings and is the entry point of the whole docket downloading workflow, has a hardcoded limit: it scrapes only 3 pages of cases from the website. This means we have missed data over the last several months, so it should be fixed ASAP.
The script needs to be updated to determine how many pages of cases there are and scrape all of them.
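A rough sketch of what that could look like, not the actual change to 0_parse.py: fetch the first page, work out the page count from it, then walk the remaining pages. Both `fetch_page` and `count_pages` are hypothetical helpers passed in by the caller. How the site selects a page (query parameter, path, etc.) isn't stated in this issue, so that detail is left to `fetch_page`.

```python
from typing import Callable, Iterator


def iter_all_pages(
    fetch_page: Callable[[int], str],
    count_pages: Callable[[str], int],
) -> Iterator[str]:
    """Yield the HTML of every results page instead of a fixed first 3.

    fetch_page(n) is a hypothetical helper returning the HTML of page n
    (1-based); count_pages(html) is a hypothetical helper returning the
    number of pages advertised by that HTML's pagination links.
    """
    first = fetch_page(1)
    yield first
    # Treat a missing or empty pagination list as a single page.
    total = max(count_pages(first), 1)
    for page_number in range(2, total + 1):
        yield fetch_page(page_number)
```

The page count is read once from the first page, so this assumes the pagination list shows a link for every page. If the site abbreviates long lists, the count would need to be re-checked on later pages.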