codeforsanjose / city-agenda-scraper


Pull PDFs of city agendas and staff reports #5

Closed: xconnieex closed this issue 3 years ago

xconnieex commented 3 years ago

We've been partnering with Stanford's Big Local News group to design the agenda scrapers in their GitHub repo: https://github.com/biglocalnews/civic-scraper. However, their code needs a little tweaking before we can use it to pull in staff reports in addition to agendas. These staff reports will form the basis of what we want to analyze with our NLP tool.

This issue requires forking their repo for CivicPlus scraping and extending it to also pull in staff reports. We need a large pool of data to train our NLP script, so it would be good to go broad with this. For now, any scraped documents can be added to our Google Drive until we find a better place to store them.
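To make the "agendas plus staff reports" idea concrete, here is a rough sketch of one way to pick both kinds of documents out of a meeting page. It does not use civic-scraper's API. The names `LinkParser`, `classify`, and `find_documents` are hypothetical, and the rule of matching on link text ("agenda", "staff report") is an assumption that real pages will likely need adjusted.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collect (href, link text) pairs for every <a href=...> on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Anchors without an href leave _href as None and are ignored.
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def classify(link_text):
    """Guess the document type from link text (assumed naming convention)."""
    lowered = link_text.lower()
    if "staff report" in lowered:
        return "staff_report"
    if "agenda" in lowered:
        return "agenda"
    return None


def find_documents(html, base_url):
    """Return agenda and staff report links found in html, with absolute URLs."""
    parser = LinkParser()
    parser.feed(html)
    docs = []
    for href, text in parser.links:
        kind = classify(text)
        if kind is not None:
            docs.append({"type": kind, "title": text, "url": urljoin(base_url, href)})
    return docs
```

The returned list could then be handed to whatever download step we settle on, with the `type` field deciding which folder each file lands in.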