codeforsanjose / city-agenda-scraper


Further build out data analysis steps #37

Open xconnieex opened 3 years ago

xconnieex commented 3 years ago

Currently doing tf-idf.

I have earlier code in the Text-analysis folder on GitHub, as well as some code based on Anju's Colab code that does some text summarization and text modeling, but it needs refinement. One dependency is how we read and clean up the initial text from the PDF.
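For anyone unfamiliar with the tf-idf step, here is a minimal from-scratch sketch in plain Python. It is not the code in the Text-analysis folder; the function names (`tokenize`, `tf_idf`, `top_terms`) and the sample snippets are made up for illustration, and the tokenizer is a crude stand-in for whatever PDF cleanup we settle on. It uses the plain weighting tf × log(N / df), so a term that appears in every document scores zero.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic tokens (placeholder for real PDF cleanup)."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs):
    """Return one {term: score} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: count each term once per document.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero on empty text
        scores.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return scores


def top_terms(score, k=3):
    """Highest-scoring terms, ties broken alphabetically."""
    return sorted(score, key=lambda t: (-score[t], t))[:k]


# Hypothetical agenda-item snippets, not real Legistar data.
docs = [
    "Council approves the housing budget memorandum",
    "Addendum to the parking ordinance",
    "Decision on the housing ordinance appeal",
]
scores = tf_idf(docs)
for s in scores:
    print(top_terms(s))
```

In practice a library vectorizer such as scikit-learn's `TfidfVectorizer` would likely replace this; it applies smoothing and normalization by default, so its numbers will differ from this sketch.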

swotai commented 3 years ago

If we try to clarify what we are aiming to do (Anju's term: what's the "ask"):

Given a PDF file of a memorandum, addendum, or decision, we want to summarize it into the following (think of these as additional data columns that we could add to the Legistar table for each agenda item #):