City-Bureau / city-scrapers

Scrape, standardize and share public meetings from local government websites
https://cityscrapers.org
MIT License

Allow for any number of municipalities #164

Closed: jacobroufa closed this issue 5 years ago

jacobroufa commented 7 years ago

This is intended to be more of a discussion, hopefully resulting in a number of better-defined issues. An "epic," if you're familiar with Agile methodology.

Per discussions at the first Design Day, one team's goal was to plan out a Small Municipality Toolkit. To that end, I believe some of what we talked about with @diaholliday and @eads was that City Bureau and ProPublica Illinois would be happy to host this infrastructure for municipalities in IL that wish to start their own Documenters programs. I think this gives us an appropriate starting place: we can design this infrastructure expansion to be inclusive by state, and ideally other states would find similar organizational sponsors with aligned goals.

Keep in mind that developing the toolkit itself is not the goal here. However, to support organizing data by municipality within a given state, we need to save data in a more appropriate way, whether by namespacing datasets or by creating new tables per municipality. I have not been involved in developing this aggregator to the extent required, nor do I have enough Python experience to suggest modifications of this scale. I am hoping, however, that the way this is scaled will inform my development of the toolkit's front end. (E.g., would I be able to take the for instance and turn it into a URL slug, for a simpler routing scheme?)
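As a rough illustration of the URL-slug question, here is a minimal sketch, assuming municipalities are identified by plain name strings such as "Rockford". The `slugify` and `municipality_path` helpers and the `/<state>/<municipality>/` route shape are hypothetical, not part of city-scrapers.

```python
import re
import unicodedata


def slugify(name: str) -> str:
    """Hypothetical helper: build a lowercase ASCII slug from a municipality name."""
    # Decompose accented characters and drop the non-ASCII marks
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into a single hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_name.lower())
    # Trim hyphens left over from leading/trailing spaces or punctuation
    return slug.strip("-")


def municipality_path(state: str, municipality: str) -> str:
    """Hypothetical routing scheme: /<state>/<municipality-slug>/"""
    return f"/{state.lower()}/{slugify(municipality)}/"


print(municipality_path("IL", "Rockford"))  # /il/rockford/
print(municipality_path("IL", "Oak Park"))  # /il/oak-park/
```

A slug like this is stable as long as the stored municipality name is, which is one argument for keeping that name as explicit data rather than deriving it on the fly.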

So is this enough information, in the form of a request from me, to get started? If not, how can I refine my request? @eads, can you take it from here? I'm going to assign you, since you've been the primary architect so far, but everyone who has thoughts about scaling, please chime in!

I know there has been some discussion in Slack about this effort recently, but capturing it in an issue seemed better than trusting the scroll to preserve details. So, please restate your thoughts in this issue! :)

jim commented 7 years ago

I think a first step would be to add a municipality field to the event data. We can usually infer this from the meeting address, but a dedicated field holding this value would be more foolproof. It would make it simple to filter by municipality for various output purposes, including potentially pushing the Rockford events to a different Airtable account.
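A minimal sketch of what an explicit municipality field buys us, assuming a simplified, hypothetical `Event` record (real city-scrapers items carry more fields, and the field names here are illustrative only). The filtered list is what could then be routed to a separate output, such as a different Airtable account; that export step is not shown.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List


@dataclass
class Event:
    # Hypothetical, simplified event record for illustration only
    title: str
    start: datetime
    address: str
    municipality: str  # stored explicitly instead of parsed out of the address


def by_municipality(events: Iterable[Event], municipality: str) -> List[Event]:
    """Keep only the events for one municipality (case-insensitive match)."""
    target = municipality.casefold()
    return [e for e in events if e.municipality.casefold() == target]


# Placeholder data, not real meetings
events = [
    Event("City Council", datetime(2017, 9, 5, 18, 0), "123 Example St, Rockford, IL", "Rockford"),
    Event("Board of Education", datetime(2017, 9, 6, 17, 30), "456 Sample Ave, Chicago, IL", "Chicago"),
]

rockford_events = by_municipality(events, "rockford")
print([e.title for e in rockford_events])  # ['City Council']
```

Filtering on a stored value avoids depending on address formats, which vary between agencies and would make inference fragile.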

I think that we're willing to host all of the spiders on the same machine for the near future in order to simplify development, support, etc. As the number increases, we might have to reevaluate this, but let's keep things simple for now.

pjsier commented 5 years ago

The https://github.com/City-Bureau/city-scrapers-template repo, along with namespacing the scraper names, is an attempt to do this. This issue hasn't been active in a while, so I'll close it unless this seems insufficient.
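To illustrate why namespaced scraper names help, here is a minimal sketch, assuming each spider name begins with a municipality prefix followed by an underscore (for example `chi_`). The spider names and the `group_by_namespace` helper are hypothetical and not taken from the repositories.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_namespace(spider_names: Iterable[str]) -> Dict[str, List[str]]:
    """Group spider names by the prefix before the first underscore."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in spider_names:
        # "chi_city_council" -> namespace "chi"; a name with no "_" becomes its own namespace
        namespace, _, _ = name.partition("_")
        groups[namespace].append(name)
    return dict(groups)


names = ["chi_city_council", "chi_board_of_education", "rockford_city_council"]
print(group_by_namespace(names))
# {'chi': ['chi_city_council', 'chi_board_of_education'], 'rockford': ['rockford_city_council']}
```

With a consistent prefix, per-municipality outputs can be selected from the spider name alone, without a separate lookup table.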