Open nonprofittechy opened 1 year ago
There's a JSON file embedded in this page: https://www.mass.gov/orgs/massachusetts-court-system/locations. We can check it as well, although I think it sometimes differs from what you get when you visit each link manually.
See this for the specific element on the page that might have some data: https://github.com/GBLS/docassemble-MACourts/blob/6a5da00ddefbeec39aa5b1f140921de52d7faf80/docassemble/MACourts/macourts.py#L17
Note: it looks like the structure of this page changed, and it no longer embeds a single JSON file with everything. The list is now paginated, so you need to click "next" to see all of the courts.
Here is the list of data we want to collect from scraping the court list:
Note that https://www.mass.gov/guides/courthouses-by-county seems to have the links to all the court pages on one page. Since we do actually need to visit each court's page to get this info, this is probably a better starting point than the paginated list linked above.
Someone should do a cursory check of the Google spreadsheet I created as a first attempt. I'm not sure what anomalies to look for.
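A rough sketch of how the court links could be pulled out of that guide page once its HTML has been downloaded. This assumes court pages all live under `/locations/` (as the Lawrence District Court page does); `LocationLinkParser` and `extract_location_links` are made-up names, and the real page markup may need a more specific filter.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.mass.gov"


class LocationLinkParser(HTMLParser):
    """Collect links that point at /locations/ pages (assumed URL pattern)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and "/locations/" in href:
            url = urljoin(BASE_URL, href)  # resolve relative links
            if url not in self.links:      # keep first occurrence only
                self.links.append(url)


def extract_location_links(html):
    """Return de-duplicated court page URLs in the order they appear."""
    parser = LocationLinkParser()
    parser.feed(html)
    return parser.links
```

The HTML itself could be fetched with `urllib.request.urlopen` or any HTTP client; that part is left out so the parsing can be checked against a saved copy of the page.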
Artifacts:
Problems:
Notes:
> Court URL on masscourts.gov, if any
Just noticed this. Do we expect some courts to not have pages? Would that be detectable on https://www.mass.gov/guides/courthouses-by-county or does that only list courts that do have pages? If not, I'm not sure how to include those in the csv.
Yeah, there will be a few courts that map onto an existing court (e.g., a session of the housing court): we have a separate entry for them in our database, but no dedicated page was ever created for them on mass.gov. In theory we could link to a page higher up in the hierarchy in that case.
I wouldn't worry about catching that kind of issue through scraping. All we can do is get the data that's on mass.gov, and assume ours includes more in some cases.
Already noticed some of the numbers are missing extensions. All the ones I've seen, though, are in the sidebar, not in the main phone number section (at the top of the page). Do those sidebar numbers matter? Example: row 46, which links to https://www.mass.gov/locations/lawrence-district-court.
No, extensions change. The main phone tree should get people to the right place still.
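Since extensions aren't meaningful here, one option is to strip them before comparing numbers, so a missing extension never shows up as a difference. A minimal sketch, assuming US 10-digit numbers and extensions written as "ext.", "x", or "extension"; `normalize_phone` is a hypothetical helper and the number in the usage note is made up.

```python
import re

# Trailing extension such as "ext. 123", "x12", or "extension 4"
_EXT_RE = re.compile(r"\s*(?:extension|ext\.?|x)\s*\d+\s*$", re.IGNORECASE)


def normalize_phone(raw):
    """Drop any trailing extension and return only the digits of the main number."""
    without_ext = _EXT_RE.sub("", raw.strip())
    digits = re.sub(r"\D", "", without_ext)
    # Treat a leading US country code as optional
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits
```

With this, `normalize_phone("(617) 555-0100 ext. 123")` and `normalize_phone("617-555-0100")` give the same string.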
Note that there are a lot of errors on the webpage (or at least there were ~2 years ago). So this is just to have a way for us to highlight things that look different for manual inspection.
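Something like the following could do the highlighting. It's only a sketch: the dict-of-dicts layout keyed by court name and the `address` / `phone` field names are assumptions, not our actual schema. Courts with no mass.gov page are skipped rather than flagged, since our list is expected to include more entries than the site does.

```python
def find_differences(ours, scraped, fields=("address", "phone")):
    """Return (court, field, our_value, scraped_value) for each mismatch.

    Both arguments map a court name to a dict of field values (assumed layout).
    Nothing is overwritten; the result is just a list for manual review.
    """
    flagged = []
    for name, our_row in ours.items():
        scraped_row = scraped.get(name)
        if scraped_row is None:
            continue  # no page on mass.gov for this court; nothing to compare
        for field in fields:
            our_value = our_row.get(field, "")
            scraped_value = scraped_row.get(field, "")
            # Ignore case and surrounding whitespace when comparing
            if our_value.strip().lower() != scraped_value.strip().lower():
                flagged.append((name, field, our_value, scraped_value))
    return flagged
```

Each tuple could be written out as a row in the spreadsheet for someone to check by hand.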
No rush on this--likely a student can work on it this summer (if we have a summer student) or in the Fall.