danfickle / openhtmltopdf

An HTML to PDF library for the JVM. Based on Flying Saucer and Apache PDF-BOX 2. With SVG image support. Now also with accessible PDF support (WCAG, Section 508, PDF/UA)!
https://danfickle.github.io/pdf-templates/index.html

Add support for fillable form elements (basic features) #24

Closed ngns closed 8 years ago

ngns commented 8 years ago

The title of this issue has been extended with "basic features", as more advanced features and elements (like drop-downs or the definition of text content types) are covered in a separate issue.

As in the current Flying Saucer version, input elements should not only be rendered but also created as AcroForm elements in the resulting PDF (note that the Flying Saucer implementation has some limitations, e.g. radio buttons are not yet implemented and CSS styles are still a bit quirky). In a first implementation, the following elements should be considered:

For all the above elements, basic styling for borders (width, color), background-color, inner padding to content and font settings should be applied (for checkboxes and radio buttons, the dot/check sign is rendered in the font color, so font settings matter there too).

ngns commented 8 years ago

To provide some "easier" styling of the radio and checkbox elements in the PDF (there are some "flat" and "3d" defaults), special CSS styles may be considered, e.g. --pdfbox-form-style: embossed; or --pdfbox-form-style: flat;. Community or maintainer feedback on this suggestion is appreciated :-) !

scoldwell commented 8 years ago

@danfickle do you have a time frame in which this will be implemented? Also I see you tagged the issue with pdfbox-work-needed. Is there a pdfbox jira issue I can keep track of regarding this work?

danfickle commented 8 years ago

In the PDF specification, form fields can have an explicit appearance or a flag that tells the viewer to generate its own appearance. Ideally we would generate our own appearance in line with the CSS. Unfortunately, PDF-BOX does not yet have a way to draw to appearance streams. However, work is well advanced and is covered by PDF-BOX issue 3353.

I was planning on waiting until this landed before working on this issue, but if you can handle a viewer-specified appearance I can start on it now.

Also see http://www.debenu.com/kb/appearance-streams-pdf-form-fields/ @scoldwell

scoldwell commented 8 years ago

Just so I understand, is it currently not possible to have this html:

<input type="text" name="fieldname" />

Translate into something like this?

http://svn.apache.org/viewvc/pdfbox/tags/2.0.2/examples/src/main/java/org/apache/pdfbox/examples/interactive/form/CreateSimpleForm.java?view=markup

danfickle commented 8 years ago

Yes, I remembered it as more complicated than it is. Thanks for finding an example.

danfickle commented 8 years ago

We can't use custom fonts in text fields until complete font embedding is fixed in PDF-BOX.

danfickle commented 8 years ago

OK, support for text fields (text, textarea, password) and submit inputs or buttons has landed.

Limitations:

Attributes supported:

Styles supported:

Examples:

<html>
<head>
<style>
input { color: orange; font-family: monospace; }
textarea { color: red; font-family: monospace;  }
button { color: blue; font-family: monospace; border-radius: 8px; background-color: yellow; }
</style>
</head>
<body>
<div>
<form action="http://localhost/form.php" method="POST">
 <input type="text" value="One" name="test1"   readonly="" title="Hello there!"/>
 <input type="text" value="Two" name="test2" max-length="4" />
 <input type="text" value="Three" name="test3" required=""/>
  <input type="text" value="Four" name="test4"/>
 <div><textarea name="text-area-test" rows="30" cols="50">This is a test!</textarea></div> 
 <input type="submit" value="GO!"/>
 <button type="submit">SUBMIT</button>
 <input type="password" name="notvery" value="secret"/>
</form>
</div>
</body>
</html>

Results in: form-screenshot

Note that Acrobat masks the password field, while Mac Preview does not. Also note that you can have multiple forms in a document and it will behave correctly.

When submitted with Acrobat, the form sends this to the server: test1=One&test2=Two&test3=Three&test4=Four&text-area-test=This+is+a+test%21&notvery=secret
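For anyone handling the submission on a JVM server, here is a minimal sketch of decoding a body like the one above into name/value pairs. It assumes the viewer posts application/x-www-form-urlencoded data in UTF-8, as the sample suggests. FormBodyDecoder is a hypothetical helper written for illustration; it is not part of openhtmltopdf or PDF-BOX.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only (not an openhtmltopdf or PDF-BOX class).
final class FormBodyDecoder {

    // Splits a URL-encoded form body into decoded name/value pairs,
    // keeping the order in which the fields were submitted.
    // Assumption: each name appears once; a repeated name overwrites the earlier value.
    static Map<String, String> decode(String body) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (body == null || body.isEmpty()) {
            return fields;
        }
        for (String pair : body.split("&")) {
            int eq = pair.indexOf('=');
            String rawName = eq < 0 ? pair : pair.substring(0, eq);
            String rawValue = eq < 0 ? "" : pair.substring(eq + 1);
            // URLDecoder turns '+' into a space and resolves %XX escapes.
            fields.put(URLDecoder.decode(rawName, StandardCharsets.UTF_8),
                       URLDecoder.decode(rawValue, StandardCharsets.UTF_8));
        }
        return fields;
    }

    public static void main(String[] args) {
        String body = "test1=One&test2=Two&test3=Three&test4=Four"
                + "&text-area-test=This+is+a+test%21&notvery=secret";
        decode(body).forEach((name, value) -> System.out.println(name + " -> " + value));
    }
}
```

In practice a servlet container or web framework does this decoding already; the sketch only shows what the raw body contains.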

Next up: Select fields.

danfickle commented 8 years ago

OK, select (without the multiple attribute) and reset controls are implemented. After implementing multiple select, I'll work on checkboxes and radio boxes. Does anyone have a strong opinion on how these should look? Should we break the browser standard and use the text color and font size for icons? Or should we just copy the (unchangeable via CSS) styling of Chrome?

Any feedback welcome.

ngns commented 8 years ago

Checkboxes and Radiobutton groups are kind of a b**ch in the Acrobat format. There is not a consistent and proper way of designing these:

For the sake of a balanced compromise between flexibility, compatibility and implementation effort, I would opt for sticking to what Adobe Acrobat allows natively (the "static" Chrome way is not at all usable if you want at least some possibility to make your PDF look nice or stick to some company's CI/CD). We may need to introduce some custom CSS styles, though (e.g. something like --pdf-checkbox-type: cross; --pdf-checkbox-background: #CCC; --pdf-checkbox-border: thin;), in combination with the standard properties for width and height.

What do you think?

danfickle commented 8 years ago

Thanks for the info. It really helped as none of this is mentioned in the PDF specification. As you say, checkboxes seem to be more than a bit of a mess. Here is what I've learned:

In conclusion, I think we need to implement both a caption (for Acrobat) and an appearance stream in the PDF to be safe. Since there are many similar icons in Zapf Dingbats, I need to know the exact indices of the icons, which brings me to my plea:

@ngns - Would you be generous enough to upload a PDF here (drag it on to the comment box to upload) with all the styles you want supported? Alternatively, you could examine it with a text editor and post all the /CA entries.

A /CA entry will look something like this, with the value in the parentheses being what we need:

<< /CA (0) >>
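If searching by hand in a text editor is tedious, the same lookup could be scripted. This is a rough sketch that scans PDF content, already read into a string, for entries shaped like the one above. It assumes the dictionaries are stored uncompressed (entries inside compressed object streams will not be found) and that the caption contains no escaped parentheses. CaptionEntryScanner is a hypothetical name, not an existing tool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class CaptionEntryScanner {

    // Matches "/CA", optional whitespace, then a literal string in parentheses.
    // Assumption: the string has no nested or escaped parentheses.
    private static final Pattern CA_ENTRY = Pattern.compile("/CA\\s*\\(([^)]*)\\)");

    // Returns the text inside the parentheses of every /CA entry, in order of appearance.
    static List<String> findCaptions(String pdfText) {
        List<String> captions = new ArrayList<>();
        Matcher matcher = CA_ENTRY.matcher(pdfText);
        while (matcher.find()) {
            captions.add(matcher.group(1));
        }
        return captions;
    }

    public static void main(String[] args) {
        // Sample input modelled on the entry shown above.
        String sample = "<< /CA (0) >>";
        System.out.println(findCaptions(sample));
    }
}
```

To run it against a real file, the bytes would need to be read with a single-byte charset such as ISO-8859-1 so that binary sections do not break decoding.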

Thanks again.

http://stackoverflow.com/questions/15479855/pdf-appearance-streams-checkbox-not-shown-correctly-after-focus-lost

http://developers.itextpdf.com/question/why-does-itext-enter-cross-symbol-when-checktype-style-check-mark

danfickle commented 8 years ago

<html>
<head></head>
<body>
<form method="post" action="http://localhost/form.php">
 <div>
 <input type="checkbox" name="checker" value="1" checked="" style="-fs-checkbox-style: check;" />
 <input type="checkbox" name="checker" value="2" checked="" style="-fs-checkbox-style: square;" />
 <input type="checkbox" name="checker" value="3" checked="" style="-fs-checkbox-style: diamond;" />
 <input type="checkbox" name="checker" value="4" checked="" style="-fs-checkbox-style: star;" />
 <input type="checkbox" name="checker" value="5" checked="" style="-fs-checkbox-style: circle;" />
 <input type="checkbox" name="checker" value="6" checked="" style="-fs-checkbox-style: cross;" />
</div>
</form>
</body>
</html>

will now produce: checkbox-screenshot

However, due to PDF-BOX 3298 I had to upgrade to PDF-BOX 2.0.3-SNAPSHOT to get checkbox styles working, so we'll have to wait for a PDF-BOX release before releasing ourselves. Custom fonts for text controls are also working with 2.0.3.

Next up are radio buttons and hidden inputs.

scoldwell commented 8 years ago

@danfickle PDF-BOX 2.0.3 has now been released.

danfickle commented 8 years ago

@scoldwell - Thanks for the notification, about to upgrade to 2.0.3 and do a release now.

I just landed radio controls after a frustrating couple of weeks trying to track down a bug that I now think is in Acrobat Reader rather than my code.

If you have:

 <input type="radio" name="rad" value="one"/>
 <input type="radio" name="rad" value="two"/>
 <input type="radio" name="rad" value="three"/>
 <input type="radio" name="rad" value="four"/>

The first two, when submitted, will have the correct value, such as rad=one. However, three and four will instead return the zero-based index, such as rad=3. A workaround is to always use a zero-based index for values:

 <input type="radio" name="rad" value="0"/>
 <input type="radio" name="rad" value="1"/>
 <input type="radio" name="rad" value="2"/>
 <input type="radio" name="rad" value="3"/>
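With index values in the markup, the server has to translate the submitted index back into something meaningful. A minimal sketch of that translation, assuming the server keeps the option names in the same order as the radio inputs; RadioIndexMapper and the option list are hypothetical and only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper for illustration only.
final class RadioIndexMapper {

    // Maps a submitted zero-based index (e.g. "2") to the option at that position.
    // Returns an empty Optional if the value is not a number or is out of range.
    static Optional<String> optionFor(String submitted, List<String> options) {
        if (submitted == null) {
            return Optional.empty();
        }
        try {
            int index = Integer.parseInt(submitted.trim());
            if (index >= 0 && index < options.size()) {
                return Optional.of(options.get(index));
            }
        } catch (NumberFormatException e) {
            // Not an index; fall through to the empty result.
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Same order as the radio inputs in the markup.
        List<String> options = Arrays.asList("one", "two", "three", "four");
        System.out.println(optionFor("2", options).orElse("(unknown)"));
    }
}
```

Keeping the mapping on the server side means the PDF only ever carries indices, so the result does not depend on which of the two behaviours the viewer shows.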

Another problem is that the default value is not honored in Acrobat Reader, but it is in Mac Preview. For example, using the below markup, Acrobat will have all buttons empty.

<input type="radio" name="rad-two" value="0" checked=""/>
<input type="radio" name="rad-two" value="1"/>

The only form controls left are hidden, and they'll have to wait for the next release.

scoldwell commented 8 years ago

@danfickle I'm finally getting around to testing this with our templates (we were in the middle of a huge platform upgrade). I'm running into an issue with the name of the text fields in the html not being preserved in the PDF. For example:

<p><input name="signature.employee" type="text" size="200" /></p>

is coming out in the PDF as "OpenHTMLCtrl1"

Looking at the code, I see you're setting the mapping name of PDField to what's in the html, but using "OpenHTMLCtrl" + i to set the partial name. The problem with this is:

We use org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm.getField(String) to locate a field by its name. This fails because the PDField mapping name is not used for this lookup.

scoldwell commented 8 years ago

Opened #42 and #41 for the above problem as well as an enhancement request.

danfickle commented 8 years ago

Please see #42 for new restrictions on naming form controls.

I think this just about finishes form support. The only other thing we could do is add support for the other input controls (color, date, etc.) introduced in HTML5. It would be trivial to alias them as text controls, but I'm not sure fake support is the way to go.

If anyone has any suggestions please make them, otherwise I'll close this issue shortly.

Thanks.