Sinar / go-pardocs

Tools to process published Parliament documents (PDFs only) into more accessible form. Spiritual successor of https://github.com/leowmjw/parliamentMY-QA-blast
GNU Affero General Public License v3.0

Plan for latest Bukan Lisan fails with GetPlainText ERROR: %!w(string=) #51

Closed kaerumy closed 3 years ago

kaerumy commented 3 years ago

Running plan -BL

**WILL IGNORE!!!! *****
interp   dup
2020/12/23 14:13:56  GetPlainText ERROR: %!w(string=)

split.yml is not written out

Source file: https://parlimen.gov.my/jawapan-bukan-lisan-dr.html?uweb=dr&

kaerumy commented 3 years ago

The PDF file was corrupted. It is always best to try to fix structural problems in a downloaded PDF with a utility such as pdftk, Ghostscript, or pdftocairo before processing it.

Example of using pdftocairo to fix:

pdftocairo -pdf original.pdf fixed.pdf
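
For several downloaded files, the same repair step could be wrapped in a small Go helper. This is only a sketch, not part of go-pardocs: the function names `repairedPath` and `repairPDF` and the `-fixed` output naming are hypothetical, and it assumes `pdftocairo` is installed and on the PATH.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"strings"
)

// repairedPath returns the output path for a repaired copy of src by
// inserting "-fixed" before the file extension (hypothetical naming scheme).
func repairedPath(src string) string {
	ext := filepath.Ext(src)
	return strings.TrimSuffix(src, ext) + "-fixed" + ext
}

// repairPDF rewrites src through `pdftocairo -pdf src dst`, the same command
// shown above, and returns the path of the rewritten file.
// Assumes pdftocairo is available on the PATH.
func repairPDF(src string) (string, error) {
	dst := repairedPath(src)
	out, err := exec.Command("pdftocairo", "-pdf", src, dst).CombinedOutput()
	if err != nil {
		// err is a real error value here, so %w wraps it correctly.
		return "", fmt.Errorf("pdftocairo failed for %s: %w (output: %s)", src, err, out)
	}
	return dst, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: repairpdf FILE.pdf [FILE.pdf ...]")
		os.Exit(2)
	}
	for _, src := range os.Args[1:] {
		dst, err := repairPDF(src)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			continue
		}
		fmt.Println("wrote", dst)
	}
}
```

The repaired copy can then be fed to plan in place of the original download. As a side note, `%!w(string=)` in the log above is what Go's fmt package prints when the `%w` verb is given an operand that is not an error (here an empty string), which is why the message does not show the underlying cause.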