All Your Forms Are Belong To Us

alanjosephwilliams commented 10 years ago

BLUF:

In the course of our work on Clean, we have started thinking more generically about the process of taking key information about an individual through a web form and then using that data to populate one or more existing paper/PDF forms. The idea: if we map the location of the "first name" field on all of these forms, data could be submitted once and written to many forms using PDFtk.
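A minimal sketch of the "submit once, write to many forms" idea, assuming each PDF is fillable and that someone has hand-built a map from a common key to that form's own field name. The file names, field names, and `FORM_FIELD_MAPS` are hypothetical. The code only builds the FDF text and the `pdftk ... fill_form ... output ...` argument list; it does not run PDFtk, and it assumes ASCII values.

```python
def escape_fdf(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_fdf(values: dict[str, str]) -> str:
    """Render field name/value pairs as a minimal FDF document."""
    fields = "\n".join(
        f"<< /T ({escape_fdf(name)}) /V ({escape_fdf(value)}) >>"
        for name, value in values.items()
    )
    return (
        "%FDF-1.2\n"
        "1 0 obj\n"
        "<< /FDF << /Fields [\n"
        f"{fields}\n"
        "] >> >>\n"
        "endobj\n"
        "trailer\n"
        "<< /Root 1 0 R >>\n"
        "%%EOF\n"
    )


# Hypothetical per-form maps: our common key -> that form's own PDF field name.
FORM_FIELD_MAPS = {
    "form_a.pdf": {"first_name": "FirstName", "last_name": "LastName"},
    "form_b.pdf": {"first_name": "applicant_first", "last_name": "applicant_last"},
}


def values_for_form(person: dict[str, str], field_map: dict[str, str]) -> dict[str, str]:
    """Translate common keys into one form's field names, skipping missing data."""
    return {pdf_name: person[key] for key, pdf_name in field_map.items() if key in person}


def pdftk_command(form: str, fdf_path: str, output: str) -> list[str]:
    """Argument list for PDFtk's fill_form operation (not executed here)."""
    return ["pdftk", form, "fill_form", fdf_path, "output", output]
```

To fill every form, you would loop over `FORM_FIELD_MAPS`, write `build_fdf(values_for_form(person, field_map))` to a temporary file, and hand `pdftk_command(...)` to `subprocess.run`.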

So, how about we collect all the PDF forms that one could potentially need to fill out with that personal information? In other words, let's try to collect all government PDF/paper forms available on the internet. Maybe we could start with a single state, like California. Let's learn whether obvious taxonomies for forms already exist, or whether we could craft some lightweight categories.

This idea is inspired by the spirit and tactical approach of OpenAddresses. OpenAddresses collects address data in any format as long as it has a stable URL on the web. The URL is housed within a JSON blob containing other metadata such as its origins, the relevant location, and the type of data. Anybody can submit a link to OpenAddresses. The project leads then write handlers to convert the diverse data into a common format and schema for use.
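To make the analogy concrete, here is a rough sketch of what a submitted form record might look like. This is not OpenAddresses' actual schema; every key name, the `REQUIRED_KEYS` set, and the example URL are assumptions made up for illustration.

```python
import json

# Hypothetical minimum metadata for one submitted form link.
REQUIRED_KEYS = {"url", "jurisdiction", "agency", "format"}


def validate_source(record: dict) -> list[str]:
    """Return a list of problems with a submitted record (empty means OK)."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - record.keys())]
    url = record.get("url", "")
    if url and not url.startswith(("http://", "https://")):
        problems.append("url must be an http(s) link")
    return problems


# Placeholder record; the URL and values are invented.
example = json.loads("""
{
  "url": "https://example.com/forms/sample-application.pdf",
  "jurisdiction": "us/ca",
  "agency": "Example Department",
  "format": "pdf",
  "fillable": true
}
""")
```

`validate_source(example)` returns an empty list, while a record with no `url` would come back with `"missing key: url"`.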

A bit of a h/t to @daguar on this idea. I've asked him to come in and edit the body of this idea to flesh out the approach a bit more. @lippytak was involved too; he might also have color to add.

allyourbase

davidrleonard commented 9 years ago

This is one of those ideas that are equally mad and intriguing. I would love to see an MVP of this.

rcackermanCC commented 9 years ago

@alanjosephwilliams I can help spelunk! You have a repo or dropbox folder for this?

daguar commented 9 years ago

Code for DC's "District Housing" project ( https://github.com/codefordc/districthousing ) takes a similar approach to this, specifically for Section 8-eligible housing applications. cc @jrunningen @jposi who seem to be active in that project.

greggish commented 9 years ago

cc @mlouie

jrunningen commented 9 years ago

I can give a synopsis of how the District Housing Rails app does it.

We have a database schema that models the information. The structure of District Housing's web form very closely matches the structure of this schema.

We have a standard naming convention for PDF field names. If you're editing a PDF with Acrobat, and name the fillable fields according to this standard, then District Housing can compute the information that belongs in that field from the contents of the database.

The code doing the translation from the database to PDF can be found in the #value_for_field methods of our various models. It's generally a giant case statement that uses string and regex matching to find the right answer, or delegate it to another model. For example, see the Person model.
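District Housing itself is a Rails app, so the following is only a Python sketch of the pattern described above, not the project's code. The field names (`FirstName`, `AddressCity`, and so on) are hypothetical and are not taken from the real naming convention.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Address:
    street: str = ""
    city: str = ""

    def value_for_field(self, name: str) -> str:
        # Exact-match lookup for the address's own sub-fields.
        return {"Street": self.street, "City": self.city}.get(name, "")


@dataclass
class Person:
    first_name: str = ""
    last_name: str = ""
    address: Address = field(default_factory=Address)

    def value_for_field(self, name: str) -> str:
        """Compute the value for a PDF field from its (standardized) name."""
        if name == "FirstName":
            return self.first_name
        if name == "LastName":
            return self.last_name
        if name == "FullName":
            return f"{self.first_name} {self.last_name}".strip()
        # Regex match: anything named Address<Something> is delegated.
        match = re.fullmatch(r"Address(\w+)", name)
        if match:
            return self.address.value_for_field(match.group(1))
        return ""  # Unknown fields are left blank.
```

Filling a PDF then comes down to calling `person.value_for_field(name)` for every field name found in the document.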

fureigh commented 9 years ago

The District Housing standard naming convention for PDF field names fills me with hope for a better future! The field names in PDFs I've seen have generally been pretty bizarre/arbitrary.

If we're trying to collect a (likely enormous) existing set of PDFs for analysis, how about writing a scraper that grabs all fillable PDFs from .gov sites and maybe also records their field names?
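A rough sketch of the two pieces such a scraper would need, using only the standard library: finding PDF links in a page, and parsing field listings in the style printed by `pdftk form.pdf dump_data_fields`. Fetching pages and running PDFtk are left out, and the class and function names are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkFinder(HTMLParser):
    """Collect absolute URLs of links that point at .pdf files."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Ignore any query string when checking the extension.
        if href.lower().split("?")[0].endswith(".pdf"):
            self.links.append(urljoin(self.base_url, href))


def parse_field_dump(dump: str) -> list[dict[str, str]]:
    """Parse '---'-separated 'Key: value' records into one dict per field.

    Repeated keys within a record (e.g. several options) keep the last value.
    """
    fields: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for line in dump.splitlines():
        if line.strip() == "---":
            if current:
                fields.append(current)
            current = {}
        elif ": " in line:
            key, value = line.split(": ", 1)
            current[key] = value
    if current:
        fields.append(current)
    return fields
```

A PDF whose dump yields no records has no fillable fields, which is a cheap way to separate fillable forms from flat ones.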

slooker commented 9 years ago

We did something like this for an MVP of Parks and Rec in the Greater Las Vegas area (because we have 4 parks and rec departments). Sadly, most of the forms weren't editable, so we converted the PDFs to images, mapped where each field needed to be filled in, and used ImageMagick to add the text in the right places. We then let people print the forms out, since the agencies required them printed, not submitted online.

It's definitely more work per form, but doesn't require any assistance from agencies that may not be willing, and doesn't limit you to only editable pdfs.
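Roughly, the image approach could look like the sketch below. It is not the code we used; the coordinates and field keys are hypothetical. It only builds an ImageMagick `convert` argument list using `-annotate` with a `+x+y` offset, and does not run it.

```python
def annotate_command(
    page_image: str,
    output: str,
    placements: dict[str, tuple[int, int]],
    values: dict[str, str],
    pointsize: int = 12,
) -> list[str]:
    """Build a `convert` call that draws each value at its mapped pixel offset."""
    args = ["convert", page_image, "-fill", "black", "-pointsize", str(pointsize)]
    for key, (x, y) in placements.items():
        text = values.get(key)
        if text:  # Skip fields the person left blank.
            args += ["-annotate", f"+{x}+{y}", text]
    args.append(output)
    return args


# Hypothetical hand-measured positions for one page of one form.
PAGE_1 = {"first_name": (120, 340), "last_name": (420, 340)}
```

The per-form work is all in producing maps like `PAGE_1`; the drawing step itself is the same for every form.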

Note: this is a suggestion in addition to the current ones, not in place of them.

bengolder commented 8 years ago

@daguar and I have been discussing a related MVP that would be useful in a current CfA project.

It would be a redeployable web app that would enable the following set of actions:

You send a filled PDF form to a POST URL for the app.
- The app would save the PDF form, scan it for interactive form fields, and use the value entered in each text field as the key for that same field.
- The app would then create a URL specific to this PDF form that takes POST requests.
You can then send POST data to the form-specific URL, and the app would reply with a filled copy of the PDF.
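The core of that flow is the key mapping, which can be sketched without any web framework or PDF library. This assumes the field names and sample values have already been extracted from the uploaded PDF; the function names and the example field names are hypothetical.

```python
def key_map_from_template(template_fields: dict[str, str]) -> dict[str, str]:
    """Map each value typed into the uploaded template to its PDF field name.

    `template_fields` is {pdf_field_name: value_found_in_the_filled_template}.
    Fields left empty in the template get no key and cannot be filled later.
    """
    return {
        value.strip(): name
        for name, value in template_fields.items()
        if value.strip()
    }


def fill_values(key_map: dict[str, str], post_data: dict[str, object]) -> dict[str, str]:
    """Translate POSTed key/value pairs into {pdf_field_name: value}."""
    return {key_map[key]: str(value) for key, value in post_data.items() if key in key_map}
```

For example, if the uploaded template had `first_name` typed into a field the PDF calls `Text1`, then POSTing `{"first_name": "Ada"}` would yield `{"Text1": "Ada"}`, ready to be written into a copy of the PDF.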

daguar commented 5 years ago

https://www.pdffiller.com/ did this

bengolder commented 5 years ago

@daguar your comment just reminded me that I ended up making this as a little open source experiment: https://pdfhook.herokuapp.com

codeforamerica / project-ideas

All Your Forms Are Belong To Us #44
