GBLS / docassemble-MACourts

List Massachusetts Courts in Docassemble
MIT License

Have a script we can use to do a side-by-side comparison of our JSON court lists with the newest info from masscourts #61

Status: Open. Opened by nonprofittechy 1 year ago.

nonprofittechy commented 1 year ago

Note that there are a lot of errors on the webpage (or at least there were about 2 years ago). So the goal is just a way to highlight entries that look different so we can inspect them manually.
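
A minimal sketch of the side-by-side comparison, assuming both our JSON court list and the scraped data have already been loaded into dicts keyed by court name, with one string per field (the field names used here are hypothetical). It only reports differences; deciding which side is right stays a manual step.

```python
from typing import Dict, List, Tuple

# Hypothetical shape: court name -> {field name -> field value}.
CourtRecord = Dict[str, str]


def normalize(value: str) -> str:
    """Collapse whitespace and case so trivial formatting differences are not flagged."""
    return " ".join(value.split()).lower()


def compare_court_lists(
    ours: Dict[str, CourtRecord], scraped: Dict[str, CourtRecord]
) -> List[Tuple[str, str, str, str]]:
    """Return (court, field, our_value, scraped_value) rows that differ.

    Only courts present in both lists are compared. The output is meant for
    manual inspection, not for deciding automatically which side is correct.
    """
    differences = []
    for name in sorted(ours.keys() & scraped.keys()):
        our_fields = ours[name]
        their_fields = scraped[name]
        # Check every field that appears on either side; a missing field counts as "".
        for field in sorted(our_fields.keys() | their_fields.keys()):
            a = our_fields.get(field, "")
            b = their_fields.get(field, "")
            if normalize(a) != normalize(b):
                differences.append((name, field, a, b))
    return differences
```

Each returned row could become one line of a CSV or spreadsheet for review; whitespace and capitalization differences are ignored on purpose so the list stays short.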

No rush on this; a student could likely work on it this summer (if we have a summer student) or in the fall.

nonprofittechy commented 10 months ago

There's a JSON file embedded in this page: https://www.mass.gov/orgs/massachusetts-court-system/locations. We can check it as well, although I think it sometimes differs from what you get when you visit each link manually.

See this line of macourts.py for the specific element on the page that might contain the data: https://github.com/GBLS/docassemble-MACourts/blob/6a5da00ddefbeec39aa5b1f140921de52d7faf80/docassemble/MACourts/macourts.py#L17

Note: it looks like the structure of this page has changed, and it no longer embeds a single JSON file with everything. The list is paginated, so you need to click Next.

nonprofittechy commented 6 months ago

Here is the list of data we want to collect from scraping the court list:

  1. Court URL on masscourts.gov, if any
  2. Name
  3. Any special notes, e.g., https://www.mass.gov/locations/southeast-housing-court-barnstable-session says not to mail or file anything at this session. May overlap with the description below.
  4. Description (all description text from MassCourts; usually a list of the locations the court covers)
  5. Fax
  6. Phone number of the clerk's office
  7. Any additional phone numbers, if there are several (usually there's only one clerk's office number)
  8. Mailing address
  9. Physical address
  10. ADA coordinators
  11. Operating hours
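
One way to hold these fields is a flat record per court page that can be written straight to CSV for a spreadsheet. This is a sketch only; the class and column names are hypothetical and do not come from the repo's existing JSON schema.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import List


@dataclass
class ScrapedCourt:
    """One row per scraped court page (hypothetical column names)."""
    url: str = ""               # court page URL, if any
    name: str = ""
    notes: str = ""             # special notes, e.g. "do not mail or file here"
    description: str = ""       # full description text (usually locations covered)
    fax: str = ""
    phone: str = ""             # clerk's office phone number
    other_phones: str = ""      # any additional numbers, joined with "; "
    mailing_address: str = ""
    physical_address: str = ""
    ada_coordinators: str = ""
    hours: str = ""             # operating hours


def to_csv(courts: List[ScrapedCourt]) -> str:
    """Serialize scraped courts to CSV text, one column per dataclass field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ScrapedCourt)])
    writer.writeheader()
    for court in courts:
        writer.writerow(asdict(court))
    return buffer.getvalue()
```

Keeping every field as a plain string (empty when missing) makes blanks easy to spot when the CSV is opened as a spreadsheet.
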

plocket commented 6 months ago

Note that https://www.mass.gov/guides/courthouses-by-county seems to have links to all the court pages on a single page. Since we need to visit each court's page to get this info anyway, this is probably better than the paginated list linked above.
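
A rough sketch of collecting the court page links from that guide using only the Python standard library. It assumes, based on the URLs quoted in this thread, that every court page lives under `/locations/` on mass.gov; the guide's real markup has not been checked here, so the filter may need adjusting.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin

GUIDE_URL = "https://www.mass.gov/guides/courthouses-by-county"


class CourtLinkParser(HTMLParser):
    """Collect unique links that look like mass.gov court location pages.

    Assumption: court pages have "/locations/" in their path.
    """

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Resolve relative links and drop #fragments so duplicates collapse.
        absolute = urljoin(self.base_url, href).split("#")[0]
        if "/locations/" in absolute and absolute not in self.links:
            self.links.append(absolute)


def extract_court_links(html: str, base_url: str = GUIDE_URL) -> List[str]:
    """Return court page URLs found in the given HTML, in page order."""
    parser = CourtLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

To run it against the live page, `urllib.request.urlopen(GUIDE_URL).read().decode("utf-8")` could supply the HTML; each returned URL would then be visited to pull the fields listed earlier.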

plocket commented 6 months ago

Someone should do a cursory check of the Google spreadsheet I created as a first attempt. I'm not sure what anomalies to look for.

Artifacts:

Problems:

Notes:

plocket commented 6 months ago

> Court URL on masscourts.gov, if any

Just noticed this. Do we expect some courts not to have pages? Would that be detectable on https://www.mass.gov/guides/courthouses-by-county, or does that page only list courts that have pages? If it's the latter, I'm not sure how to include those courts in the CSV.

nonprofittechy commented 6 months ago

Yeah, there will be a few courts that map onto an existing court (e.g., a session of the housing court). We have a separate entry for them in our database, but mass.gov never created a dedicated page for them. In theory, we could link to a page higher up in the hierarchy in those cases.

I wouldn't worry about catching that kind of issue through scraping. All we can do is get the data that's on mass.gov, and assume ours includes more in some cases.
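
Since our list is expected to contain more courts than mass.gov, a comparison script could report coverage in three buckets instead of treating every unmatched court as an error. A sketch, assuming court names have already been normalized so the same court has the same name on both sides (in practice, names will probably need some manual mapping):

```python
from typing import Dict, Iterable, List


def coverage_report(our_names: Iterable[str], scraped_names: Iterable[str]) -> Dict[str, List[str]]:
    """Split court names into matched / only-ours / only-scraped buckets.

    "only_ours" is expected to be non-empty (e.g. sessions with no dedicated
    mass.gov page), so it is informational rather than a list of errors.
    "only_scraped" is more interesting: a mass.gov page we may be missing.
    """
    ours = set(our_names)
    scraped = set(scraped_names)
    return {
        "matched": sorted(ours & scraped),
        "only_ours": sorted(ours - scraped),
        "only_scraped": sorted(scraped - ours),
    }
```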

plocket commented 6 months ago

I've already noticed that some of the numbers are missing extensions. All the ones I've seen, though, are in the sidebar, not in the main phone number section at the top of the page. Do those sidebar numbers matter? Example: row 46, which links to https://www.mass.gov/locations/lawrence-district-court.

nonprofittechy commented 6 months ago

No; extensions change. The main phone tree should still get people to the right place.
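
Given that, a comparison could ignore extensions entirely when checking phone numbers. A minimal sketch, assuming numbers are stored as free-text strings and that extensions appear as a trailing marker such as `ext.`, `extension`, or `x` followed by digits:

```python
import re

# Trailing extension such as "ext. 204", "extension 12", or "x7" (case-insensitive).
_EXTENSION = re.compile(r"\s*[,;]?\s*(?:ext\.?|extension|x)\s*\d+\s*$", re.IGNORECASE)


def phone_without_extension(raw: str) -> str:
    """Return only the digits of a phone number, dropping any trailing extension."""
    base = _EXTENSION.sub("", raw)
    digits = re.sub(r"\D", "", base)
    # Drop a leading US country code so "+1 555..." and "555..." compare equal.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


def same_phone(a: str, b: str) -> bool:
    """True when two numbers match after removing formatting and extensions."""
    return phone_without_extension(a) == phone_without_extension(b)
```

With this, a sidebar number that lacks an extension would no longer show up as a difference from the same main number with one.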