baruchel / txt2pdf

Text to PDF converter with Unicode support
MIT License

Added form feed character handling. #7

Closed by srollyson 6 years ago

srollyson commented 6 years ago

I had some text file input with embedded form feed characters, so I added code to handle them, plus an option to configure how they're handled. Setting the option to 0 means every "\f" starts a new page; setting it to a number higher than the calculated lines per page means the "\f" characters are ignored.

Either way, the "\f" characters are removed from the output, so the PDF won't include unrenderable blocks where they would have been.
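
The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: the function name `paginate` and the parameter `break_threshold` are hypothetical, and the sketch assumes the option acts as a minimum number of lines that must already be on the current page before a form feed forces a break.

```python
FORM_FEED = "\x0c"  # the single control character, also written "\f" in Python


def paginate(text, lines_per_page=60, break_threshold=0):
    """Split text into pages (lists of lines), honouring form feeds.

    Assumption: a form feed starts a new page only when the current page
    already holds at least `break_threshold` lines. Form feeds are always
    stripped from the output, whether or not they caused a break.
    """
    pages = [[]]
    for raw in text.split("\n"):
        has_form_feed = FORM_FEED in raw
        line = raw.replace(FORM_FEED, "")
        if has_form_feed:
            if len(pages[-1]) >= break_threshold:
                pages.append([])
            if not line:
                # The line held nothing but the form feed: emit no blank line.
                continue
        if len(pages[-1]) >= lines_per_page:
            # Ordinary page overflow.
            pages.append([])
        pages[-1].append(line)
    return pages


text = "a\nb\n\x0c\nc"
print(paginate(text, break_threshold=0))   # [['a', 'b'], ['c']]
print(paginate(text, break_threshold=61))  # [['a', 'b', 'c']]
```

With a threshold of 0 every form feed breaks the page; with a threshold above `lines_per_page` no page can ever hold enough lines to satisfy it, so form feeds are dropped without any effect on pagination.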

baruchel commented 6 years ago

Hi @srollyson, thank you for your commit. Wouldn't it be better to set the default to some very high value (like 1000 or whatever) in order to actually disable this feature by default? I can imagine cases where the user doesn't want this behaviour at all (for instance, if I edit Markdown and want to embed a snippet of LaTeX code containing a command beginning with \f).

srollyson commented 6 years ago

Hey @baruchel, this actually removes the character 0xC (form feed) and inserts a new page for each instance it finds. If someone has the literal string "\f" in their text input, it won't be affected by this code.

For example, this text input would be included as-is in the PDF output regardless of the value of that option:

Here is a line of text.
\f
Here is another line of text.
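
The distinction can be checked directly in Python. The snippet below is only an illustration (the variable names are made up for it): a backslash followed by "f" in a text file is two ordinary characters, while the form feed is the single control character 0xC, which Python source code happens to spell as the escape "\f".

```python
literal = r"\f"    # what a user types in a text file: backslash, then 'f'
form_feed = "\f"   # Python escape for the single control character 0x0C

print(len(literal), [hex(ord(c)) for c in literal])  # 2 ['0x5c', '0x66']
print(len(form_feed), hex(ord(form_feed)))           # 1 0xc

sample = "Here is a line of text.\n" + literal + "\nHere is another line of text."
# The sample contains no form feed, so form feed handling leaves it alone.
print(form_feed in sample)  # False
```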

The use case is actually output from xml2rfc. Here's an example input file to show what's going on: rfc7511.txt

The above example file has form feeds every 56 lines, while the default media, "A4", along with the other defaults, gives 60 lines per page.
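
To see why that mismatch matters, here is a rough calculation of where each 56-line logical page would start if the form feeds were simply ignored. It assumes, for simplicity, that the form feed itself takes up no line in the output; the names are hypothetical.

```python
FEED_INTERVAL = 56   # lines between form feeds in the input
LINES_PER_PAGE = 60  # lines that fit on one PDF page with the defaults


def landing_position(k):
    """Return (pdf_page, row), both 0-based, where logical page k begins."""
    start = FEED_INTERVAL * k
    return start // LINES_PER_PAGE, start % LINES_PER_PAGE


for k in range(1, 4):
    print(k, landing_position(k))
# 1 (0, 56)
# 2 (1, 52)
# 3 (2, 48)
```

Each logical page starts four rows earlier than the previous one, so the input's own page headers and footers drift through the middle of the PDF pages instead of staying at the top and bottom.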

Here's what the output looks like prior to the pull request: before_pull_request.pdf

And the output after the pull request: after_pull_request.pdf

baruchel commented 6 years ago

Understood! I'll merge.