codeforamerica / pdfhook

A Python web application for converting PDF forms into PDF-filling APIs
https://pdfhook.herokuapp.com
MIT License

WIP: More tests for pdftk functionality #42

Closed bengolder closed 8 years ago

bengolder commented 8 years ago

Changes:

bengolder commented 8 years ago

I've just added a simpler format for expressing the fields and their information. You can still get the more detailed field information using PDFTKWrapper.get_full_form_field_data(file_path). The simpler format looks like this:

[{'name': 'Check Box1',
  'options': ['Off', 'Yes'],
  'type': 'button',
  'value': 'Yes'},
 {'name': 'Check Box2',
  'options': ['Off', 'Yes'],
  'type': 'button',
  'value': 'Yes'},
 {'name': 'Check Box3',
  'options': ['Off', 'Yes'],
  'type': 'button',
  'value': 'Yes'},
 {'name': 'Check Box4',
  'options': ['Jalapeño', 'Off'],
  'type': 'button',
  'value': 'Jalapeño'},
 {'name': 'Check Box5',
  'options': ['Off', 'Yes'],
  'type': 'button',
  'value': 'Yes'},
 {'name': 'Check Box6',
  'options': ['Off', 'Yes'],
  'type': 'button',
  'value': 'Yes'},
 {'name': 'Dropdown8', 'type': 'choice', 'value': 'unless you are a cheese'},
 {'name': 'Dropdown9',
  'options': ['apple',
              'apricot',
              'banana',
              'cranberry',
              'date',
              'fig',
              'grape',
              'lime',
              'mango',
              'orange',
              'peach',
              'raspberry',
              'tamarind'],
  'type': 'choice',
  'value': 'fig'},
 {'name': 'Group7',
  'options': ['Choice1', 'Choice2', 'Choice3', 'Off'],
  'type': 'button',
  'value': 'Choice1'},
 {'name': 'Group8',
  'options': ['Choice4', 'Choice5', 'Off'],
  'type': 'button',
  'value': 'Choice4'},
 {'name': 'Group9',
  'options': ['0', '1', '2', 'Choice1', 'Choice2', 'Off'],
  'type': 'button',
  'value': '1'},
 {'name': 'List Box10',
  'options': ['Bruces',
              'Court Scene – Multiple Murderer',
              'Musical Mice',
              'Scott of the Antarctic',
              'Scott of the Sahara',
              'The Battle of Pearl Harbor',
              'The Olympic Hide and Seek Final',
              'The Visitors',
              'Buying an Ant'],
  'type': 'choice',
  'value': 'Buying an Ant'},
 {'name': 'List Box11', 'type': 'choice', 'value': 'lawn'},
 {'name': 'Multi line text',
  'type': 'text',
  'value': 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec a '
           'diam lectus. Sed sit amet ipsum mauris. Maecenas congue ligula ac '
           'quam viverra nec consectetur ante hendrerit. Donec et mollis '
           'dolor.'},
 {'name': 'MötleyCrüe', 'type': 'text', 'value': 'Just another text field'},
 {'name': 'Text field with spaces in name',
  'type': 'text',
  'value': 'What is going on here???'},
 {'name': 'Text12', 'type': 'text', 'value': '¡Ojalá!'}]
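
As a minimal sketch of how this simpler format might be consumed: the list can be indexed by field name, and a proposed value checked against a field's 'options' when it has any. The helper names index_fields and is_allowed_value are hypothetical and are not part of PDFTKWrapper; the sample entries are copied from the output above.

```python
def index_fields(fields):
    """Map each field name to its spec dict (hypothetical helper)."""
    return {field['name']: field for field in fields}


def is_allowed_value(field, value):
    """Return True if value is acceptable for this field spec.

    Assumption: a field with no 'options' key accepts any value.
    """
    options = field.get('options')
    return options is None or value in options


# A few entries copied from the simplified field list.
fields = [
    {'name': 'Check Box4', 'options': ['Jalapeño', 'Off'],
     'type': 'button', 'value': 'Jalapeño'},
    {'name': 'Group8', 'options': ['Choice4', 'Choice5', 'Off'],
     'type': 'button', 'value': 'Choice4'},
    {'name': 'Text12', 'type': 'text', 'value': '¡Ojalá!'},
]
by_name = index_fields(fields)
```

With this, is_allowed_value(by_name['Group8'], 'Choice5') passes, while 'Choice1' is rejected for Group8 because it only belongs to Group7 and Group9.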

bengolder commented 8 years ago

pdftk seems to behave differently depending on the environment and version. The tests pass locally on my computer. This will take some further debugging.
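
One way to start narrowing this down might be to print the pdftk version banner at the top of the test run, so the local and Travis logs can be compared. This is only a sketch: pdftk_version_banner is a hypothetical helper, not something in pdftk_wrapper.py, and it assumes the binary is on the PATH under the name pdftk.

```python
import subprocess


def pdftk_version_banner(binary='pdftk'):
    """Return the first non-empty line printed by `<binary> --version`.

    Returns None if the binary cannot be run (for example, not installed).
    """
    try:
        result = subprocess.run(
            [binary, '--version'],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    for line in result.stdout.decode('utf-8', errors='replace').splitlines():
        if line.strip():
            return line.strip()
    return None


# Printed once so it shows up in both local and CI test output.
print('pdftk version:', pdftk_version_banner())
```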

bengolder commented 8 years ago

With list boxes and dropdowns, I am getting the following errors on my local machine:

======================================================================
ERROR: test_fill_dropdown (tests.integration.test_pdftk.TestFields)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/bgolder/projects/pdf/pdfhook/tests/integration/test_pdftk.py", line 111, in test_fill_dropdown
    filled_pdf = pdftk.fill_pdf(path, sample_answers)
  File "/Users/bgolder/projects/pdf/pdfhook/src/pdftk_wrapper.py", line 255, in fill_pdf
    pdf_path, patched_fdf_str)
  File "/Users/bgolder/projects/pdf/pdfhook/src/pdftk_wrapper.py", line 245, in _load_patched_fdf_into_pdf
    'output', tmp_pdf_path
  File "/Users/bgolder/projects/pdf/pdfhook/src/pdftk_wrapper.py", line 83, in run_command
    raise PdftkError(err.decode('utf-8'))
src.pdftk_wrapper.PdftkError: Unhandled Java Exception in create_output():
java.lang.ArrayIndexOutOfBoundsException: 0
   at pdftk.com.lowagie.text.pdf.DocumentFont.fillEncoding(pdftk)
   at pdftk.com.lowagie.text.pdf.DocumentFont.doType1TT(pdftk)
   at pdftk.com.lowagie.text.pdf.DocumentFont.<init>(pdftk)
   at pdftk.com.lowagie.text.pdf.AcroFields.getAppearance(pdftk)
   at pdftk.com.lowagie.text.pdf.AcroFields.setField(pdftk)
   at pdftk.com.lowagie.text.pdf.AcroFields.setFields(pdftk)

======================================================================
ERROR: test_fill_listbox (tests.integration.test_pdftk.TestFields)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/bgolder/projects/pdf/pdfhook/tests/integration/test_pdftk.py", line 99, in test_fill_listbox
    filled_pdf = pdftk.fill_pdf(path, sample_answers)
  File "/Users/bgolder/projects/pdf/pdfhook/src/pdftk_wrapper.py", line 255, in fill_pdf
    pdf_path, patched_fdf_str)
  File "/Users/bgolder/projects/pdf/pdfhook/src/pdftk_wrapper.py", line 245, in _load_patched_fdf_into_pdf
    'output', tmp_pdf_path
  File "/Users/bgolder/projects/pdf/pdfhook/src/pdftk_wrapper.py", line 83, in run_command
    raise PdftkError(err.decode('utf-8'))
src.pdftk_wrapper.PdftkError: Unhandled Java Exception in create_output():
java.lang.ArrayIndexOutOfBoundsException: 0
   at pdftk.com.lowagie.text.pdf.DocumentFont.fillEncoding(pdftk)
   at pdftk.com.lowagie.text.pdf.DocumentFont.doType1TT(pdftk)
   at pdftk.com.lowagie.text.pdf.DocumentFont.<init>(pdftk)
   at pdftk.com.lowagie.text.pdf.AcroFields.getAppearance(pdftk)
   at pdftk.com.lowagie.text.pdf.AcroFields.setField(pdftk)
   at pdftk.com.lowagie.text.pdf.AcroFields.setFields(pdftk)

Travis CI does not seem to encounter these same errors, despite running the same tests with the same input pdfs and answer data. I'm considering removing the listbox and dropdown tests for the time being, and trying to address these errors on a separate branch.

bengolder commented 8 years ago

It looks like Text fields are also behaving differently on Travis. I broke off a separate PR & branch to address the Choice field type in #44. The Text field test seems to fail on Travis because the filled pdf generated there differs in content from the filled pdf I generated on my local machine. A less fragile test could solve this: instead of comparing the full bytestring contents of the file, we could check only that generating the filled pdf raises no errors, and then compare key contents within the bytestring of the pdf.

======================================================================
FAIL: test_fill_text (tests.integration.test_pdftk.TestFields)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/codeforamerica/pdfhook/tests/integration/test_pdftk.py", line 101, in test_fill_text
    self.assertEqual(filled_pdf, filled_sample)
AssertionError: b'%PD[1302845 chars]j \n21 0 obj \n<<\n/FormType 1\n/Subtype /Form[66704 chars]OF\n' != b'%PD[1302845 chars]j \n20 0 obj \n<<\n/Filter /FlateDecode\n/Leng[66213 chars]OF\n'

bengolder commented 8 years ago

The test for filling the text field now searches the filled pdf to ensure that the answers are inside of it, rather than trying to completely match the binary content of a sample filled pdf.
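
A rough sketch of that kind of check, not the actual test code: missing_answers is a hypothetical helper, and it assumes the answers are stored uncompressed in the pdf as plain literal strings. Values that need PDF escaping (parentheses, backslashes) or a UTF-16 encoding would need extra handling.

```python
def missing_answers(pdf_bytes, answers, encoding='latin-1'):
    """Return the field names whose non-empty answers are not in pdf_bytes.

    Assumption: each answer appears verbatim in the file once encoded
    with `encoding`. Answers that cannot be encoded count as missing.
    """
    missing = []
    for name, value in answers.items():
        if not value:
            continue  # empty answers leave nothing to search for
        try:
            needle = value.encode(encoding)
        except UnicodeEncodeError:
            missing.append(name)
            continue
        if needle not in pdf_bytes:
            missing.append(name)
    return missing


# Stand-in bytes for a filled pdf; a real test would use fill_pdf output.
sample_pdf = b'%PDF-1.4\n<< /T (First Name) /V (Berry) >>\n%%EOF\n'
sample_answers = {'First Name': 'Berry', 'Last Name': 'Manatee', 'MI': ''}
```

A test could then assert that missing_answers(filled_pdf, sample_answers) is empty, which does not depend on object numbering or compression differences between pdftk builds.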

bengolder commented 8 years ago

Blank pdf form created in Acrobat (screenshot, 2016-03-14 at 9:01:42 pm)

Extracted form field specs

[{'name': 'Address City', 'type': 'text'},
 {'name': 'Address State', 'type': 'text'},
 {'name': 'Address Street', 'type': 'text'},
 {'name': 'Address Zip', 'type': 'text'},
 {'name': 'Arrested outside SF',
  'options': ['No', 'Off', 'Yes'],
  'type': 'button'},
 {'name': 'Cell phone number', 'type': 'text'},
 {'name': 'Charged with a crime',
  'options': ['No', 'Off', 'Yes'],
  'type': 'button'},
 {'name': 'Date', 'type': 'text'},
 {'name': 'Date of Birth', 'type': 'text'},
 {'name': 'Dates arrested outside SF', 'type': 'text'},
 {'name': 'Drivers License', 'type': 'text'},
 {'name': 'Email Address', 'type': 'text'},
 {'name': 'Employed', 'options': ['No', 'Off', 'Yes'], 'type': 'button'},
 {'name': 'First Name', 'type': 'text'},
 {'name': 'Home phone number', 'type': 'text'},
 {'name': 'How did you hear about the Clean Slate Program', 'type': 'text'},
 {'name': 'If probation where and when?', 'type': 'text'},
 {'name': 'Last Name', 'type': 'text'},
 {'name': 'MI', 'type': 'text'},
 {'name': 'May we leave voicemail',
  'options': ['No', 'Off', 'Yes'],
  'type': 'button'},
 {'name': 'May we send mail here',
  'options': ['No', 'Off', 'Yes'],
  'type': 'button'},
 {'name': 'Monthly expenses', 'type': 'text'},
 {'name': 'On probation or parole',
  'options': ['No', 'Off', 'Yes'],
  'type': 'button'},
 {'name': 'Other phone number', 'type': 'text'},
 {'name': 'Serving a sentence',
  'options': ['No', 'Off', 'Yes'],
  'type': 'button'},
 {'name': 'Social Security Number', 'type': 'text'},
 {'name': 'US Citizen', 'options': ['No', 'Off', 'Yes'], 'type': 'button'},
 {'name': 'What is your monthly income', 'type': 'text'},
 {'name': 'Work phone number', 'type': 'text'}]

Answer submission

{
            'Address City': 'Little Town',
            'Address State': 'CA',
            'Address Street': '111 Main Street',
            'Address Zip': '01092',
            'Arrested outside SF': 'No',
            'Cell phone number': '999-999-9999',
            'Charged with a crime': 'No',
            'Date': '09/09/2016',
            'Date of Birth': '09/09/9999',
            'Dates arrested outside SF': '',
            'Drivers License': 'D9999999',
            'Email Address': 'berry.happy.manatee@gmail.com',
            'Employed': 'No',
            'First Name': 'Berry',
            'Home phone number': '',
            'How did you hear about the Clean Slate Program':
                'From a wonderful friend',
            'If probation where and when?': '',
            'Last Name': 'Manatee',
            'MI': 'H',
            'May we leave voicemail': 'Yes',
            'May we send mail here': 'Yes',
            'Monthly expenses': '1000',
            'On probation or parole': 'No',
            'Other phone number': '',
            'Serving a sentence': 'No',
            'Social Security Number': '999-99-9999',
            'US Citizen': 'Yes',
            'What is your monthly income': '0',
            'Work phone number': '',
        }
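
Before filling, it may be worth checking that a submission's keys line up with the extracted field specs. The following is a sketch under that assumption; compare_keys is a hypothetical helper, not part of pdfhook, and the sample data is a small subset of the specs and answers above.

```python
def compare_keys(field_specs, answers):
    """Return (missing, unexpected) field names, each as a sorted list.

    missing: names in the specs that the answers do not provide.
    unexpected: answer keys that match no extracted field name.
    """
    expected = {spec['name'] for spec in field_specs}
    provided = set(answers)
    return sorted(expected - provided), sorted(provided - expected)


# Subset of the extracted specs and the submission shown above.
specs = [
    {'name': 'First Name', 'type': 'text'},
    {'name': 'Last Name', 'type': 'text'},
    {'name': 'US Citizen', 'options': ['No', 'Off', 'Yes'], 'type': 'button'},
]
submission = {'First Name': 'Berry', 'Last Name': 'Manatee', 'Nickname': 'B'}
missing, unexpected = compare_keys(specs, submission)
```

Here missing would contain 'US Citizen' and unexpected would contain 'Nickname'; for the full submission above both lists should come back empty.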

Filled pdf (screenshot, 2016-03-14 at 9:01:50 pm)