Closed: alteredbritt closed this issue 3 years ago
Here's a script to download a docket, based on https://github.com/CLSPhila/RecordLib/blob/87dcc657e39b5a95a7d62cf168e9d0d96c7c26a2/scripts/download_dockets.py#L36
# install dependencies (run these two lines in a shell)
pip install git+https://github.com/CLSPhila/django-docketsearch.git
pip install django requests lxml aiohttp

# download a docket sheet (Python)
import requests
from ujs_search.services import searchujs

# search by participant name (shown as an example; the result is not used below)
r_search = searchujs.search_by_name("Kathleen", "Kane", court="CP")
# search by docket number; the first result carries the docket sheet URL
r_link = searchujs.search_by_docket("CP-46-CR-0006239-2015")
url = r_link[0]["docket_sheet_url"]
r_pdf = requests.get(url, headers={"User-Agent": "ParsingThing"})
# write the raw PDF bytes to disk
with open('example_docket.pdf', 'wb') as f:
    f.write(r_pdf.content)
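One caveat with the script above: it writes whatever the server returns, so an error page could end up saved with a .pdf extension. Here is a minimal sketch of a guard, assuming the docket sheet URL returns a raw PDF body. `looks_like_pdf` and `save_pdf` are hypothetical helper names, not part of `ujs_search` or `requests`.

```python
from pathlib import Path

# PDF files begin with this header, followed by a version number.
PDF_MAGIC = b"%PDF-"


def looks_like_pdf(content: bytes) -> bool:
    """Return True if the bytes start with the PDF header."""
    return content.startswith(PDF_MAGIC)


def save_pdf(content: bytes, path: str) -> bool:
    """Write content to path only if it looks like a PDF.

    Returns True if the file was written, False if it was skipped.
    """
    if not looks_like_pdf(content):
        return False
    Path(path).write_bytes(content)
    return True
```

In the script above this would replace the final `with open(...)` block, e.g. `save_pdf(r_pdf.content, 'example_docket.pdf')`, and a `False` return could be logged for a retry.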
I think this is closable now that Hruday has a method for doing this
Need: to scrape the dockets from the PA Court docket search site
Requirements:
- A scraping script (preferably Python) to download the PDFs
- Takes the docket numbers produced by the New Criminal Filings scraping script as input to fetch the matching PDFs
- Runs daily
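The requirements above could be wired together roughly as follows. This is a sketch under assumptions, not a finished design. The docket-number pattern is inferred from the single example in this thread (`CP-46-CR-0006239-2015`) and may not cover every court. `fetch_pdf` is a hypothetical callable standing in for whatever performs the lookup and download. `download_daily` and the dated folder layout are illustrative choices.

```python
import re
from datetime import date
from pathlib import Path
from typing import Callable, Iterable, List, Optional

# Assumed shape: court-county-case type-sequence-year (inferred from one example).
DOCKET_RE = re.compile(r"(CP|MC)-\d{2}-[A-Z]{2}-\d{7}-\d{4}")


def is_docket_number(value: str) -> bool:
    """Return True if value matches the assumed docket-number shape."""
    return DOCKET_RE.fullmatch(value.strip()) is not None


def download_daily(
    docket_numbers: Iterable[str],
    fetch_pdf: Callable[[str], bytes],
    out_root: str,
    run_date: Optional[date] = None,
) -> List[Path]:
    """Save one PDF per valid docket number under out_root/YYYY-MM-DD/."""
    day = run_date or date.today()
    out_dir = Path(out_root) / day.isoformat()
    out_dir.mkdir(parents=True, exist_ok=True)

    saved = []
    for raw in docket_numbers:
        number = raw.strip()
        if not is_docket_number(number):
            continue  # ignore malformed input from the upstream scraper
        target = out_dir / f"{number}.pdf"
        if target.exists():
            continue  # already downloaded in this run's folder
        target.write_bytes(fetch_pdf(number))
        saved.append(target)
    return saved
```

Passing the fetch step in as a function keeps the batching logic testable without hitting the court site. The daily run itself would be handled outside this code, for example by a scheduler such as cron.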
Site link: https://ujsportal.pacourts.us/DocketSheets/MC.aspx
Discussion Notes:
- Use a headless browser to mimic a human clicking through the site
- Code for Philly teams doing similar work: PLSE Expungement record parsing
- The dockets will eventually be stored in a data lake for analysis
- Will update with more soon as I review my notes again!