PDF text extraction - Improving accuracy for gazette documents

Text extraction from the PDFs is not always 100% accurate. The gazette documents are always laid out in two columns of text, and when the columns are too close to each other, words or sentences from one column can get mixed up with those from the other.
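To make the failure mode concrete, here is a minimal sketch in plain Python. It is not the project's parser: the `(x, y, text)` word tuples, the coordinates, the sample words, and the function names `read_naive` and `read_by_column` are all assumptions made up for illustration. It shows how reading words line by line across the full page width interleaves the two columns, and how splitting at the page midpoint first is one possible way to keep them apart.

```python
from typing import List, Tuple

# Hypothetical word record: (x, y, text), with y growing down the page.
Word = Tuple[float, float, str]


def read_naive(words: List[Word]) -> str:
    """Read words top to bottom, left to right, ignoring columns."""
    ordered = sorted(words, key=lambda w: (w[1], w[0]))
    return " ".join(w[2] for w in ordered)


def read_by_column(words: List[Word], page_width: float) -> str:
    """Split words at the page midpoint, then read each column in turn."""
    mid = page_width / 2
    left = [w for w in words if w[0] < mid]
    right = [w for w in words if w[0] >= mid]
    return read_naive(left) + " " + read_naive(right)


# Made-up sample: two short lines in each of two columns.
WORDS: List[Word] = [
    (50, 100, "The"), (90, 100, "Minister"),
    (320, 100, "Athens,"), (380, 100, "5"),
    (50, 115, "of"), (75, 115, "Finance"),
    (320, 115, "May"), (360, 115, "2018"),
]

print(read_naive(WORDS))           # columns interleaved
print(read_by_column(WORDS, 600))  # left column first, then right column
```

A fixed midpoint is a simplification; a real page may need the column boundary detected from the gap between the columns.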

For that reason, I have created unit tests in pdf_parsers_tests.py that check the accuracy of the names extracted from signatures, with the goal of 100% accurate data extraction for signatures.
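A test of this kind could look roughly like the sketch below. It does not reproduce the contents of pdf_parsers_tests.py: `extract_signature_names` is a hypothetical stand-in that treats every all-caps line as a name, and the sample text and names are invented. In the real tests the function under test would be the project's own parser, and the input would be text extracted from an actual gazette PDF.

```python
import re
import unittest
from typing import List


def extract_signature_names(text: str) -> List[str]:
    """Hypothetical stand-in: return lines made of two or more all-caps words."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if re.fullmatch(r"[A-Z]+(?: [A-Z]+)+", line):
            names.append(line)
    return names


class SignatureNamesTest(unittest.TestCase):
    def test_names_match_expected(self):
        # Invented sample of extracted signature text.
        text = "The Ministers\nJOHN SMITH\nof Finance\nMARY ANN JONES\n"
        expected = ["JOHN SMITH", "MARY ANN JONES"]
        self.assertEqual(extract_signature_names(text), expected)
```

Saved as a module, this can be run with `python -m unittest`. Comparing against a hand-checked list of expected names per document means any regression in column handling shows up as a failing test.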

arisp8 / gazette-analysis
