crccheck / atx-bandc

Scrape Austin, TX Boards and Commissions into RSS feeds
https://bandc.crccheck.com/
BSD 3-Clause "New" or "Revised" License

fix: Handle PDFTextExtractionNotAllowed pdfs #42

Closed crccheck closed 4 years ago

crccheck commented 4 years ago

These PDFs are extractable, but pdfminer refuses to extract them and raises PDFTextExtractionNotAllowed. See https://github.com/pdfminer/pdfminer.six/issues/350

This forks pdfminer's high-level extract_text function to work around that. I could have combined _get_pdf_page_count with it, but then I wouldn't be able to delete this code in the future if pdfminer implements a fix.

Part of #38