datamade / chi-councilmatic

:eyes: keep tabs on Chicago city council
https://chicago.councilmatic.org/
MIT License

switch from Legistar to chicityclerkelms.chicago.gov #352

Closed derekeder closed 10 months ago

derekeder commented 1 year ago

The Chicago City Clerk is launching a new legislative management system to replace Legistar: https://chicityclerkelms.chicago.gov/

Some time later today (June 16), chicago.legistar.com will stop working and we will need to update our scrapers to pull data from the new API. Documentation is here: https://api.chicityclerkelms.chicago.gov/

I've been told the new system will have all the same data as the old one. As of this writing, data is updated through June 2, 2023.

fgregg commented 1 year ago

taking some notes here:

general

  1. string values often have trailing white space
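
Until the trailing whitespace is fixed upstream, a scraper could normalize strings as it reads each response. This is a minimal sketch, assuming responses are ordinary JSON decoded into dicts and lists; `strip_strings` and the sample record are hypothetical and not part of the API. Note that `str.strip()` also removes leading whitespace.

```python
def strip_strings(value):
    """Recursively strip surrounding whitespace from every string in a JSON-like value."""
    if isinstance(value, str):
        return value.strip()
    if isinstance(value, list):
        return [strip_strings(item) for item in value]
    if isinstance(value, dict):
        return {key: strip_strings(item) for key, item in value.items()}
    # numbers, booleans and None pass through unchanged
    return value


# hypothetical record, only to show the behaviour
record = {"title": "Zoning Reclassification  ", "sponsors": [" Smith ", None]}
cleaned = strip_strings(record)
```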

events

  1. in the agenda items, there is no flag that indicates whether an item has votes associated with it. this means i have to check a separate url for every agenda item. the API should offer a flag.
  2. the location field of events is being used to capture the status of events (e.g. cancelled). this should be moved to a separate field
  3. the videoLink attribute sometimes has more than one item. this should be an array, not a whitespace-delimited string.
  4. location is sometimes an empty string. it probably shouldn't be
  5. the transcriptLink is always an empty string. should be dropped.
  6. would be helpful if location was a controlled vocabulary
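
As a client-side workaround for the videoLink note above, the string can be split on whitespace into a list. A minimal sketch, assuming the value is either missing, empty, or one or more URLs separated by whitespace; `split_video_links` and the example URLs are hypothetical.

```python
def split_video_links(video_link):
    """Turn a whitespace-delimited videoLink string into a list of URLs."""
    if not video_link:
        # covers both None and the empty string
        return []
    # str.split() with no argument splits on any run of whitespace
    return video_link.split()


links = split_video_links("https://example.com/a  https://example.com/b ")
```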

people

  1. address, phone, ... are ward office, address2, phone2, ... are city hall office. these should have better names
  2. missing who the chair and vice chairs of committees are!
  3. at most one website is ever listed, so site2 can be dropped

bills

  1. Voice votes are not easy to identify
  2. Results of votes are not easy to identify
  3. In actions, the actionByName is wrong when it's a referral. it's giving the name of the recipient of the referral, not the referrer. should add another field.
  4. Would be good to actually capture motions not just actions
  5. would be good to indicate the nature of the relation for related bills
  6. sometimes matter title and record number are missing, probably shouldn't be.
  7. sometimes an actionName in an action is an empty string
  8. sometimes the date of an action is missing
  9. matterCategory should be an array not a pipe delimited string
  10. Many matters say they are related to "CL2012-149" but are clearly not
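
For the matterCategory note above, a scraper can split the pipe-delimited string itself until the API returns an array. A minimal sketch; `split_categories` is a hypothetical helper, and the second category in the example is invented for illustration.

```python
def split_categories(matter_category):
    """Split a pipe-delimited matterCategory string into a clean list."""
    if not matter_category:
        return []
    # strip each part (string values often carry trailing whitespace) and drop empties
    return [part.strip() for part in matter_category.split("|") if part.strip()]


categories = split_categories("ZONING RECLASSIFICATIONS | ORDINANCE ")
```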

fgregg commented 1 year ago

the body doing the action is often wrong:

https://chicityclerkelms.chicago.gov/Matter/?matterId=6FBDB317-5C09-EE11-8F6D-001DD809B965 https://chicityclerkelms.chicago.gov/Matter/?matterId=359F7376-F509-EE11-8F6D-001DD809B965

(related matters is also broken on that one)

https://chicityclerkelms.chicago.gov/Matter/?matterId=90303FC2-000A-EE11-8F6D-001DD809B578 https://chicityclerkelms.chicago.gov/Matter/?matterId=A5F30658-E30A-EE11-8F6D-001DD809B965

...

stevevance commented 1 year ago

It appears that the title field in Matters is not a filterable field. Filtering on it would be useful to me because I only want to extract matters pertaining to certain categories, e.g. "zoning reclassification" for zoning map amendments (rezonings).

[screenshot of API documentation]

It would also be nice if matterCategory was a filterable field, since the Clerk appears to use "matterCategory": "ZONING RECLASSIFICATIONS" consistently.


For my purposes, however, using the full-text search and quoting "zoning reclassification" should capture at least this category of matters. (City Council is consistent in using that phrase in the title for rezonings, but City Council is not consistent for a lot of other categories that I try to scrape.)

curl -X GET "https://api.chicityclerkelms.chicago.gov/matter?search=%22zoning%20reclassification%22" -H  "accept: application/json; charset=utf-8"
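
The same request URL can be built in Python, with the category check done client-side since matterCategory is not filterable. This is a minimal sketch: `phrase_search_url` and `filter_by_category` are hypothetical helpers, and the assumption that matters arrive as dicts with a pipe-delimited `matterCategory` key comes from the notes in this thread, not from the API documentation.

```python
from urllib.parse import quote

BASE_URL = "https://api.chicityclerkelms.chicago.gov/matter"


def phrase_search_url(phrase):
    """Build a full-text search URL that wraps the phrase in double quotes."""
    quoted = quote(f'"{phrase}"', safe="")
    return f"{BASE_URL}?search={quoted}"


def filter_by_category(matters, category):
    """Keep matters whose matterCategory contains the given category."""
    wanted = category.strip().upper()
    kept = []
    for matter in matters:
        raw = matter.get("matterCategory") or ""
        parts = [part.strip().upper() for part in raw.split("|")]
        if wanted in parts:
            kept.append(matter)
    return kept


url = phrase_search_url("zoning reclassification")
```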

fgregg commented 1 year ago

around 700 bills have an action where the org name is missing. missing_action_org.txt

fgregg commented 1 year ago

https://api.chicityclerkelms.chicago.gov/matter/recordNumber=

this endpoint sometimes needs a whitespace character after the record number, and sometimes not.

https://api.chicityclerkelms.chicago.gov/matter/recordNumber/SO2023-0002817

doesn't work but

https://api.chicityclerkelms.chicago.gov/matter/recordNumber/SO2023-0002817%20

does
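
One way to cope with this is to try both forms of the URL and use whichever one responds. A minimal sketch that only builds the candidate URLs; `record_number_urls` is a hypothetical helper, and making the requests and deciding what counts as "works" is left to the caller.

```python
from urllib.parse import quote

RECORD_URL = "https://api.chicityclerkelms.chicago.gov/matter/recordNumber/"


def record_number_urls(record_number):
    """Return lookup URLs to try in order: bare, then with a trailing space."""
    bare = record_number.strip()
    return [
        RECORD_URL + quote(bare, safe=""),
        # the trailing space is percent-encoded as %20
        RECORD_URL + quote(bare + " ", safe=""),
    ]


candidates = record_number_urls("SO2023-0002817")
```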

fgregg commented 1 year ago

the new system has cases where there are multiple distinct entries for the same bill

https://chicityclerkelms.chicago.gov/Matter/?matterId=60659CA3-1F11-EE11-8F6C-001DD8094692 https://chicityclerkelms.chicago.gov/Matter/?matterId=27585215-4305-EE11-8F6D-001DD806EC60

https://chicityclerkelms.chicago.gov/Matter/?matterId=78E9D319-4D14-EE11-8F6D-001DD806F88B https://chicityclerkelms.chicago.gov/Matter/?matterId=D47CEE41-4E14-EE11-8F6D-001DD806F9D9
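
Duplicates like these could be flagged during a scrape by grouping matters on their record number. A minimal sketch, assuming each matter is a dict with `recordNumber` and `matterId` keys (names inferred from the URLs in this thread, not confirmed against the API); `find_duplicates` and the sample data are hypothetical.

```python
from collections import defaultdict


def find_duplicates(matters):
    """Map each record number that appears more than once to its matter ids."""
    by_record = defaultdict(list)
    for matter in matters:
        # strip because string values often carry trailing whitespace
        record_number = (matter.get("recordNumber") or "").strip()
        if record_number:
            by_record[record_number].append(matter["matterId"])
    return {rn: ids for rn, ids in by_record.items() if len(ids) > 1}


# invented sample data, only to show the grouping
sample = [
    {"recordNumber": "O2023-1 ", "matterId": "A"},
    {"recordNumber": "O2023-1", "matterId": "B"},
    {"recordNumber": "O2023-2", "matterId": "C"},
    {"recordNumber": None, "matterId": "D"},
]
duplicates = find_duplicates(sample)
```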

derekeder commented 10 months ago

@fgregg we've been switched over for a while now. should we close and track data problems in separate issues?

https://notes.chicago.councilmatic.org/blog/using-data-from-new-system.html

fgregg commented 10 months ago

sure!

