Data is now in PDFs

gadenbuie / covid19-florida

Florida COVID19 Data parsed from Florida DOH Dashboard and PDF reports

https://covid19-florida.garrickadenbuie.com/

Closed: gadenbuie closed this issue 4 years ago.

gadenbuie commented 4 years ago

So now they're reporting data in PDFs with unique URLs. I added a step to the scraper that archives the PDFs. The URLs have formats like this:

https://floridadisaster.org/globalassets/covid19/covid-19-data---daily-report-2020-03-19-0954.pdf

Formats like that make it hard to know the URL without scraping the page where the link is posted.

I'm looking into pulling the tables out of the PDF, but ugh.
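Here is a minimal Python sketch of that link-discovery step. It assumes the page hosting the link is plain HTML with ordinary `<a href>` tags. It also assumes every report follows the `covid-19-data---daily-report-YYYY-MM-DD-HHMM.pdf` pattern, which is inferred from the single example URL above. The issue doesn't name the landing page, so `base_url` is a placeholder. `find_report_pdfs` and `report_timestamp` are hypothetical helpers, not the project's actual scraper, which this thread doesn't show.

```python
import re
from datetime import datetime
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed naming pattern, inferred from one example URL; the real scheme may vary.
REPORT_RE = re.compile(
    r"covid-19-data---daily-report-(\d{4}-\d{2}-\d{2})-(\d{4})\.pdf$"
)


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_report_pdfs(html, base_url):
    """Return absolute URLs of daily-report PDFs linked from `html` (hypothetical helper)."""
    parser = LinkCollector()
    parser.feed(html)
    # Resolve relative links such as "/globalassets/..." against the page URL.
    urls = (urljoin(base_url, href) for href in parser.hrefs)
    return [u for u in urls if REPORT_RE.search(u)]


def report_timestamp(url):
    """Parse the date and HHMM stamp embedded in the file name (hypothetical helper)."""
    m = REPORT_RE.search(url)
    if m is None:
        raise ValueError(f"not a daily-report URL: {url}")
    return datetime.strptime(m.group(1) + m.group(2), "%Y-%m-%d%H%M")
```

The stamp parsed from the file name could serve as a stable key for archived copies. That way, an archiving step would only need to download PDFs it hasn't already saved.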

gadenbuie commented 4 years ago

PDF table extraction is now working. It generally still needs some manual review, but extraction now runs as part of the automatic updates.

If anyone notices errors in the tables extracted from the PDF files, please open a new issue. Thanks!
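A few automated sanity checks could narrow the manual review mentioned above by flagging suspicious tables before a person looks at them. The sketch below is hypothetical and makes two assumptions. First, each extracted table reduces to rows of a county name plus a case-count cell. Second, the PDF also reports an overall total to compare against. `ExtractedRow` and `review_flags` are illustrative names only. The thread doesn't describe the actual table layout or review process.

```python
import re
from dataclasses import dataclass


@dataclass
class ExtractedRow:
    county: str     # hypothetical column: county name as extracted
    cases_raw: str  # hypothetical column: case-count cell text as extracted


def review_flags(rows, reported_total):
    """Return a list of problems that should trigger manual review (empty = looks OK)."""
    flags = []
    total = 0
    seen = set()
    for i, row in enumerate(rows):
        county = row.county.strip()
        if not county:
            flags.append(f"row {i}: empty county name")
        elif county in seen:
            flags.append(f"row {i}: duplicate county {county!r}")
        seen.add(county)
        # Drop thousands separators; anything left must be plain ASCII digits.
        text = row.cases_raw.replace(",", "").strip()
        if not re.fullmatch(r"[0-9]+", text):
            flags.append(f"row {i}: non-numeric case count {row.cases_raw!r}")
            continue
        total += int(text)
    # Mis-split or dropped rows usually show up as a mismatch with the reported total.
    if total != reported_total:
        flags.append(f"sum of rows ({total}) != reported total ({reported_total})")
    return flags
```

Checks like these don't replace a human look at the tables. They only make it cheaper to decide which updates need one.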