chinapandaman / PyPDFForm

:fire: The Python library for PDF forms.
https://chinapandaman.github.io/PyPDFForm/
MIT License

Two Bugs with Text Reflowing #521

Closed: ebardelli closed this issue 6 months ago

ebardelli commented 7 months ago

Hi,

I noticed two bugs with how the text reflowing is working in the latest release.

I attached a minimal example to reproduce the issue:

example.pdf output.pdf

You can run those with this code:

from PyPDFForm import PdfWrapper

filled = PdfWrapper("example.pdf").fill(
    {
        "Text1": "Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit qui in ea voluptate velit esse quam nihil molestiae consequatur, vel illum qui dolorem eum fugiat quo voluptas nulla pariatur?",
        "Text2": "Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. NEMO ENIM IPSAM VOLUPTATEM QUIA VOLUPTAS SIT ASPERNATUR AUT ODIT AUT FUGIT, SED QUIA CONSEQUUNTUR MAGNI DOLORES EOS QUI RATIONE VOLUPTATEM SEQUI NESCIUNT. NEQUE PORRO QUISQUAM EST, QUI DOLOREM IPSUM QUIA DOLOR SIT AMET, CONSECTETUR, ADIPISCI VELIT, SED QUIA NON NUMQUAM EIUS MODI TEMPORA INCIDUNT UT LABORE ET DOLORE MAGNAM ALIQUAM QUAERAT VOLUPTATEM. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit qui in ea voluptate velit esse quam nihil molestiae consequatur, vel illum qui dolorem eum fugiat quo voluptas nulla pariatur?",
        "Text3": "Sed ut perspiciatis unde omnis iste natus error sit voluptatem accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae ab illo inventore veritatis et quasi architecto beatae vitae dicta sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed quia non numquam eius modi tempora incidunt ut labore et dolore magnam aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex ea commodi consequatur? Quis autem vel eum iure reprehenderit qui in ea voluptate velit esse quam nihil molestiae consequatur, vel illum qui dolorem eum fugiat quo voluptas nulla pariatur?",
    },
)

with open("output.pdf", "wb+") as output:
    output.write(filled.read())

Issue 1: Mixing uppercase sentences with lowercase sentences makes the text reflow utility not work as expected.

Issue 2: The text box size sometimes conflicts with the decisions that the text reflower made, breaking up words in the middle or forcing new line breaks randomly in the text. This is difficult to explain, but the third box shows what is happening. The word at the end of the second sentence should be "ab" but "a" and "b" are split across two lines. Similarly, the "?" on the fourth-to-last line should remain with the word before it, and there shouldn't be a line break after it.
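To make the expected behavior in both issues concrete, here is a minimal sketch of a greedy, width-aware word wrap. It is not PyPDFForm's reflow code. The names `wrap_by_width` and `toy_char_width` are hypothetical, and the per-character widths are made-up numbers that only mimic the fact that capitals are usually wider than lowercase letters in proportional fonts. Whether glyph width is actually what causes Issue 1 is an assumption, not something confirmed in this thread.

```python
def wrap_by_width(text, max_width, char_width):
    """Greedy wrap that breaks only at spaces and measures lines by width.

    char_width is a function returning the width of a single character;
    a real implementation would read this from the font metrics.
    """
    def measure(s):
        return sum(char_width(c) for c in s)

    lines, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if current and measure(candidate) > max_width:
            # The word does not fit: close the line and move the whole
            # word (punctuation included) to the next line.
            lines.append(current)
            current = word
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def toy_char_width(c):
    # Hypothetical metric: capitals are wider than everything else.
    return 0.7 if c.isupper() else 0.5


lower = "eaque ipsa quae ab illo inventore"
lower_lines = wrap_by_width(lower, 10, toy_char_width)
upper_lines = wrap_by_width(lower.upper(), 10, toy_char_width)
```

Because a line is only ever closed between words, "ab" can never be split into "a" and "b", and a trailing "?" stays attached to its word. Because lines are measured by width rather than by character count, the uppercase text simply wraps onto more lines instead of overflowing the box.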

chinapandaman commented 7 months ago

Thanks for bringing this up. I may not have time to look at this issue right away, but I'm currently working on something that may resolve it indirectly. Let me get back to you later.

chinapandaman commented 7 months ago

Hey, sorry for a rather late response. I have been quite busy.

With v1.4.13, I have added (or I should really say added back) FormWrapper to the library. Although it won't directly address your issues with PdfWrapper, it does allow you to fill a form in place as if you were filling it manually. I tried the form you linked in this thread and it came out pretty good.

Here is the doc for it. Give it a try and let me know what you think. Also, if you still want me to look into these issues in PdfWrapper, I'm more than happy to. It just won't happen very soon because I have been quite busy with life and stuff. Also, if you could tell me approximately at which version this started breaking, that would help me a lot in triaging this.

ebardelli commented 7 months ago

Thank you for the response. No worries about the timeline. I'm still using an old version from December that works. I can also take a look at the code and submit a pull request. I have some time next week to work on this.

The FormWrapper idea is interesting. Do you know if it would be possible to flatten the pdfs after using it?

chinapandaman commented 6 months ago

In v1.4.14 you can flatten a PDF form when filling it using FormWrapper, described in that same doc I linked you earlier.
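For readers following along, a minimal sketch of what filling in place with FormWrapper looks like, based on the library's documentation for these releases. The helper name `fill_in_place` is hypothetical, and the file names and field names come from the example earlier in this thread. It assumes v1.4.14 or later for the `flatten` keyword; later releases may differ.

```python
def fill_in_place(template_path, values, output_path, flatten=False):
    """Fill the form fields of a PDF in place and write the result."""
    # Imported inside the function so this sketch can be loaded even
    # where PyPDFForm is not installed.
    from PyPDFForm import FormWrapper

    # flatten=True asks FormWrapper to make the filled form non-editable.
    filled = FormWrapper(template_path).fill(values, flatten=flatten)

    with open(output_path, "wb") as output:
        output.write(filled.read())
```

It could be called as `fill_in_place("example.pdf", {"Text1": "some text"}, "output.pdf", flatten=True)`.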

I also took a look at the release history and it looks like the only two releases in December were v1.3.4 and v1.3.5. I will try to start taking a look at the diffs between now and these two releases later this week.

ebardelli commented 6 months ago

I see.

When I think about flattening a PDF, I think of a PDF with no form fields and with the text embedded directly into the file, which is the behavior of PdfWrapper.

What FormWrapper does is make the form non-editable (possibly by switching the edit bit in the PDF? I haven't looked at the code). This is similar to other Python packages (e.g., pypdf and pdftools), but then it's difficult to go from the form to a flat PDF file in pure Python.
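For context on the "edit bit": in the PDF specification, each form field carries an integer of field flags (`/Ff`), and bit position 1 of that integer marks the field as read-only. The sketch below only illustrates the bit arithmetic. The helper names are hypothetical, and it does not claim that FormWrapper is implemented this way.

```python
# Field flag bits from the PDF specification (bit positions are 1-based).
READ_ONLY = 1 << 0    # bit position 1: the user may not change the value
MULTILINE = 1 << 12   # bit position 13: text field may span several lines


def set_read_only(field_flags):
    """Return the flags with the read-only bit switched on."""
    return field_flags | READ_ONLY


def is_read_only(field_flags):
    """Report whether the read-only bit is set."""
    return bool(field_flags & READ_ONLY)


flags = set_read_only(MULTILINE)
```

A field locked this way is still a form field: the widget and its value remain in the file, which is why it differs from a flat PDF where the text is drawn directly onto the page.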

ebardelli commented 6 months ago

I noticed that the new release seems to fix these bugs. Thank you!!!

chinapandaman commented 6 months ago

Haven’t released yet. Seems like you rebased your fork? Glad it worked out :D

ebardelli commented 6 months ago

I did. I was going to work on it this morning and then I realized that the commits were just working :)

Thank you again!