bengolder closed this 8 years ago
Now that tests are in place for parsing the form, I tried running it on a file with a broader set of fields and more variations.
It worked waayy better than I expected. It looks like we are a very short distance away from supporting many more field types! Check out the field data that was extracted:
https://gist.github.com/bengolder/5ddaa3fb553743b6ab70
Here is the PDF used: field_type_survey.pdf
I've written down some observations about how it handles fields here: https://github.com/codeforamerica/pdfhook/issues/28
In terms of handling Unicode, this is an interesting case. Note how "Jalapeño" was parsed differently in the FDF than in the field data dump:
```python
'Check Box4': {'FieldFlags': '0',
               'FieldJustification': 'Left',
               'FieldName': 'Check Box4',
               'FieldStateOption': ['Jalapeño', 'Off'],
               'FieldType': 'Button',
               'FieldValue': 'Jalapeño',
               'fdf': {'escaped_name': 'Check Box4',
                       'name': 'Check Box4',
                       'name_span': (68, 78),
                       'value_template': '/Jalape#f1o',
                       'value_template_span': (52, 63)}},
```
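The FDF value `/Jalape#f1o` is a PDF name object, in which `#` followed by two hex digits stands for one raw byte; byte 0xF1 is "ñ" in Latin-1. A minimal sketch for converting between the two forms, assuming Latin-1 is the byte encoding pdftk used for this name (`decode_pdf_name` and `encode_pdf_name` are hypothetical helpers, not part of pdfhook):

```python
import re

# Bytes that may not appear literally in a PDF name: the delimiters,
# '#' itself, and anything outside printable ASCII (0x21-0x7E).
_NAME_DELIMITERS = set(b"()<>[]{}/%#")


def decode_pdf_name(token: str, encoding: str = "latin-1") -> str:
    """Turn a name token such as '/Jalape#f1o' back into text."""
    body = token[1:] if token.startswith("/") else token
    # Each '#xx' sequence stands for one raw byte with hex value xx.
    raw = re.sub(rb"#([0-9A-Fa-f]{2})",
                 lambda m: bytes([int(m.group(1), 16)]),
                 body.encode("ascii"))
    # The byte encoding is an assumption, so it is left as a parameter.
    return raw.decode(encoding)


def encode_pdf_name(text: str, encoding: str = "latin-1") -> str:
    """Turn text into a PDF name token, hex-escaping bytes as '#xx'."""
    parts = []
    for byte in text.encode(encoding):
        if byte < 0x21 or byte > 0x7E or byte in _NAME_DELIMITERS:
            parts.append(f"#{byte:02x}")
        else:
            parts.append(chr(byte))
    return "/" + "".join(parts)
```

With this, `decode_pdf_name("/Jalape#f1o")` gives back `"Jalapeño"`, and `encode_pdf_name("Jalapeño")` reproduces the `value_template` shown in the dump, which is what a check box value would need when writing an FDF.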
Though it had trouble with "Jalapeño", it seems to handle Unicode better elsewhere:
```python
'MötleyCrüe': {'FieldFlags': '0',
               'FieldJustification': 'Left',
               'FieldName': 'MötleyCrüe',
               'FieldType': 'Text',
               'FieldValue': 'Just another text field',
               'fdf': {'escaped_name': 'MötleyCrüe',
                       'name': 'MötleyCrüe',
                       'name_span': (358, 368),
                       'value_template': '(Just another text field)',
                       'value_template_span': (328, 353)}},
'Text12': {'FieldFlags': '0',
           'FieldJustification': 'Left',
           'FieldName': 'Text12',
           'FieldType': 'Text',
           'FieldValue': '¡Ojalá!',
           'fdf': {'escaped_name': 'Text12',
                   'name': 'Text12',
                   'name_span': (573, 579),
                   'value_template': '(¡Ojalá!)',
                   'value_template_span': (559, 568)}}
```
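For these text fields the FDF value is a literal string in parentheses, and `value_template_span` matches its length in characters (353 - 328 = 25, the length of `(Just another text field)`). Assuming the spans are character offsets into the decoded FDF text, a filled FDF could be produced by splicing new values in at those spans. A sketch with hypothetical helpers `escape_pdf_string` and `fill_fdf`; it handles only literal-string values (not name values like the check box above) and leaves character-encoding questions out of scope:

```python
def escape_pdf_string(value: str) -> str:
    """Wrap value as a PDF literal string, escaping backslash and parens."""
    # Escape backslashes first so the later escapes are not doubled.
    escaped = value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def fill_fdf(fdf_text: str, fields: dict, values: dict) -> str:
    """Hypothetical: splice new text values into an FDF template.

    Assumes each field's fdf['value_template_span'] is a (start, end)
    character offset into fdf_text, as in the dumps above.
    """
    edits = []
    for name, value in values.items():
        start, end = fields[name]["fdf"]["value_template_span"]
        edits.append((start, end, escape_pdf_string(value)))
    # Apply edits from the end of the text backwards so that replacing one
    # span does not shift the offsets of spans that come before it.
    for start, end, replacement in sorted(edits, reverse=True):
        fdf_text = fdf_text[:start] + replacement + fdf_text[end:]
    return fdf_text
```

Working back to front is the main design choice here: it lets every span recorded during parsing stay valid without recomputing offsets after each substitution.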
Check out this example of a set of radio options:
```python
'Group9': {'FieldFlags': '49152',
           'FieldJustification': 'Left',
           'FieldName': 'Group9',
           'FieldStateOption': ['0', '1', '2', 'Choice1', 'Choice2', 'Off'],
           'FieldType': 'Button',
           'FieldValue': '1',
           'fdf': {'escaped_name': 'Group9',
                   'name': 'Group9',
                   'name_span': (279, 285),
                   'value_template': '(Choice1)',
                   'value_template_span': (265, 274)}},
```
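`FieldFlags` is a bit field, and these values line up with the field-flag tables in the PDF specification (ISO 32000-1), which number bits from 1. For buttons, bit 15 is NoToggleToOff, bit 16 is Radio and bit 17 is Pushbutton, so 49152 = 16384 + 32768 marks `Group9` as a radio group, while the `0` on `Check Box4` leaves it a plain check box. A sketch of decoding them (the function names are hypothetical):

```python
# Field-flag bits from the PDF specification (ISO 32000-1), numbered from 1.
COMMON_FLAGS = {1: "ReadOnly", 2: "Required", 3: "NoExport"}
BUTTON_FLAGS = {15: "NoToggleToOff", 16: "Radio", 17: "Pushbutton",
                26: "RadiosInUnison"}


def decode_button_flags(field_flags: str) -> list[str]:
    """Return the names of the flags set in a button's FieldFlags value."""
    value = int(field_flags)
    table = {**COMMON_FLAGS, **BUTTON_FLAGS}
    # Bit n (1-based) corresponds to the integer value 1 << (n - 1).
    return [name for bit, name in sorted(table.items())
            if value & (1 << (bit - 1))]


def button_kind(field_flags: str) -> str:
    """Classify a Button field as a pushbutton, radio group or check box."""
    flags = decode_button_flags(field_flags)
    if "Pushbutton" in flags:
        return "pushbutton"
    if "Radio" in flags:
        return "radio"
    return "checkbox"
```

Here `decode_button_flags("49152")` returns `["NoToggleToOff", "Radio"]`, which is enough to tell radio groups apart from check boxes even though pdftk reports both as `FieldType: Button`.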
And this example of a dropdown:
```python
'Dropdown9': {'FieldFlags': '393216',
              'FieldJustification': 'Left',
              'FieldName': 'Dropdown9',
              'FieldStateOption': ['apple',
                                   'apricot',
                                   'banana',
                                   'cranberry',
                                   'date',
                                   'fig',
                                   'grape',
                                   'lime',
                                   'mango',
                                   'orange',
                                   'peach',
                                   'raspberry',
                                   'tamarind'],
              'FieldType': 'Choice',
              'FieldValue': 'fig',
              'FieldValueDefault': 'apple',
              'fdf': {'escaped_name': 'Dropdown9',
                      'name': 'Dropdown9',
                      'name_span': (131, 140),
                      'value_template': '(fig)',
                      'value_template_span': (121, 126)}},
```
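The dropdown's flags decode the same way using the spec's choice-field table: 393216 = 131072 + 262144, i.e. bit 18 (Combo) and bit 19 (Edit). Per the spec, Edit gives a combo box an editable text box and only applies when Combo is set, so a value outside `FieldStateOption` should only be acceptable when both bits are on. A sketch of a validation check built on that reading; `is_allowed_choice` is a hypothetical helper, not something pdfhook provides:

```python
# Choice-field flag bits from the PDF specification (ISO 32000-1), numbered from 1.
CHOICE_FLAGS = {18: "Combo", 19: "Edit", 20: "Sort", 22: "MultiSelect",
                23: "DoNotSpellCheck", 27: "CommitOnSelChange"}


def decode_choice_flags(field_flags: str) -> set[str]:
    """Return the names of the flags set in a choice field's FieldFlags."""
    value = int(field_flags)
    return {name for bit, name in CHOICE_FLAGS.items()
            if value & (1 << (bit - 1))}


def is_allowed_choice(field: dict, value: str) -> bool:
    """Hypothetical check: may `value` be written to this choice field?"""
    if value in field.get("FieldStateOption", []):
        return True
    # Free text is only possible in an editable combo box; the spec says
    # Edit is meaningful only when Combo is also set.
    return {"Combo", "Edit"} <= decode_choice_flags(field["FieldFlags"])
```

For `Dropdown9` this accepts `fig`, and because Edit is set it would also accept a value that is not in the list; a plain list box (FieldFlags `0`) would reject it.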
Merging. There are more tests to come, and tasks.py will eventually be deprecated.
This branch is a work in progress.
Changes:
pdftk commands
`make test.unit` command