codeforsanjose / city-agenda-scraper


General PDF text extraction and cleanup #34

Open swotai opened 3 years ago

swotai commented 3 years ago

This ticket is for setting up an organized way to read files from GDrive, extract text from the PDFs, and clean the result so it can be fed to the keyword extraction and summarization pipelines.
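As a starting point for the cleanup step, here is a minimal sketch using only the standard library. It assumes the text has already been pulled out of the PDF as one string per document. The function name `clean_extracted_text` and the specific rules (rejoin hyphenated line breaks, drop page-number lines, collapse whitespace) are illustrative assumptions, not something already in the repo.

```python
import re


def clean_extracted_text(raw: str) -> str:
    """Normalize text extracted from a PDF (hypothetical helper, not in the repo)."""
    # Rejoin words hyphenated across a line break: "Coun-\ncil" -> "Council".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)

    # Drop lines that hold only a page number, e.g. "3" or "Page 3 of 12".
    page_number = re.compile(r"\s*(page\s+)?\d+(\s+of\s+\d+)?\s*", re.IGNORECASE)
    lines = [line for line in text.split("\n") if not page_number.fullmatch(line)]

    # Treat blank lines as paragraph breaks; collapse other whitespace to one space.
    paragraphs = re.split(r"\n\s*\n", "\n".join(lines))
    cleaned = [re.sub(r"\s+", " ", p).strip() for p in paragraphs]
    return "\n\n".join(p for p in cleaned if p)


if __name__ == "__main__":
    sample = "City Coun-\ncil  Agenda\nItem 1\n\nPage 2 of 5\n\nPublic   comment\n"
    print(clean_extracted_text(sample))
```

One caveat: the hyphen rule also merges genuinely hyphenated words that happen to break at a line end ("re-\nelect" becomes "reelect"), so it may need a dictionary check if that turns out to matter for keyword extraction.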

The notebook has code for keyword extraction.

TODOs: