Closed andi-dev closed 1 year ago
Hmm... I concur that file is rather strange.
As for your comments:
Yes, the validation code for the main form object now contains provisions to remove invalid objects from the field tree. So that is the expected behaviour.
The `#find_root_fields!` method, as the bang indicates, modifies the form itself by setting `self[:Fields]`. Calling this method should not be necessary in normal usage, neither for an existing file nor for one created by HexaPDF. If called, it will go through all pages and collect the fields referenced by all widget annotations.
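To make the "go through all pages and collect the fields" step concrete, here is a minimal sketch in plain Ruby. It is not HexaPDF's implementation: pages, annotations and fields are modelled as ordinary hashes, and `collect_root_fields` is a hypothetical helper name used only for illustration.

```ruby
# Simplified stand-in for a PDF object graph: plain hashes instead of
# HexaPDF objects. All names here are illustrative assumptions.
def collect_root_fields(pages)
  roots = []
  pages.each do |page|
    (page[:Annots] || []).each do |annot|
      # Skip missing entries and annotations that are not widgets.
      next unless annot && annot[:Subtype] == :Widget

      # Walk up the /Parent chain until the top-level field is reached.
      field = annot
      field = field[:Parent] while field[:Parent]

      # Record each top-level field only once (compared by identity).
      roots << field unless roots.any? { |seen| seen.equal?(field) }
    end
  end
  roots
end

name_field = {T: "name"}
widget_a = {Subtype: :Widget, Parent: name_field}
widget_b = {Subtype: :Widget, Parent: name_field}
check_box = {Subtype: :Widget, T: "agree"} # widget merged with its field
link = {Subtype: :Link}

pages = [{Annots: [widget_a, link, nil]}, {Annots: [widget_b, check_box]}]
ROOT_FIELD_NAMES = collect_root_fields(pages).map { |field| field[:T] }
puts ROOT_FIELD_NAMES.inspect
```

Because the result is rebuilt from the widgets that pages actually reference, entries that never resolve to an object cannot end up in the collected list.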
It's good to know that the issue can be resolved, let's see what I find.
Okay, so running `hexapdf info --check test3.pdf` shows some problems and looking e.g. at file position 477531 you can see `0 0 R` as the value of a dictionary key. The PDF spec says regarding indirect object references in 7.3.10 "The object identifier shall consist of two parts: A positive integer object number. ...", so this is clearly invalid.
There was a recent change where handling of invalid references was corrected. So those errors might not show up in older versions of HexaPDF or different errors might show up.
I inspected a few other error positions and they all show the same problem with `0 0 R`.
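For anyone who wants to look for such references without going through the error positions one by one, a rough sketch in plain Ruby follows. It is only a heuristic and an assumption on my part, not what `hexapdf info --check` does: it scans the raw bytes with a regular expression, so it can also match inside strings or uncompressed streams and misses anything stored in compressed object streams. `zero_reference_offsets` is a hypothetical helper name.

```ruby
# Returns the byte offsets of all indirect references whose object number
# is 0 (e.g. "0 0 R"). Purely textual scan, no PDF parsing involved.
def zero_reference_offsets(data)
  offsets = []
  # (?<![0-9.])     the object number must not be the tail of a longer number
  # (?![A-Za-z0-9]) "R" must be a complete token (so e.g. "RG" is not matched)
  data.b.scan(/(?<![0-9.])0+\s+\d+\s+R(?![A-Za-z0-9])/) do
    offsets << Regexp.last_match.begin(0)
  end
  offsets
end

sample = "<< /Type /Annot /Parent 0 0 R /P 10 0 R /AP 12 0 R >>"
SAMPLE_OFFSETS = zero_reference_offsets(sample)
puts SAMPLE_OFFSETS.inspect
```

For a real file the input would be something like `File.binread("test3.pdf")`, and the reported offsets can then be inspected in a hex viewer or editor.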
So this is clearly something invalid but leads in this case to some object not being parsed at all, i.e. to a much bigger problem. I think the best way forward would be to treat references with an object number of 0 as null values. I tested this out and the fields don't disappear anymore.
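The idea of treating such references as null can be sketched as follows. This is a simplified model in plain Ruby and not the actual change in HexaPDF: `Reference`, `resolve` and `deref_dictionary` are hypothetical names, and the object table is just a hash keyed by object and generation number. It relies on two rules from the PDF spec: a reference to an undefined object is treated as a reference to the null object, and a dictionary entry whose value is null is equivalent to an absent entry.

```ruby
# Hypothetical stand-in for an indirect reference like "12 0 R".
Reference = Struct.new(:oid, :gen)

# Resolves a reference against a table of objects keyed by [oid, gen].
# References with object number 0 are treated as the null object (nil),
# just like references to objects that do not exist.
def resolve(objects, ref)
  return nil if ref.oid.zero?
  objects.fetch([ref.oid, ref.gen], nil)
end

# Dereferences all values of a dictionary and drops entries that turn out
# to be null, since a null value is equivalent to a missing key.
def deref_dictionary(objects, dict)
  dict.each_with_object({}) do |(key, value), result|
    value = resolve(objects, value) if value.is_a?(Reference)
    result[key] = value unless value.nil?
  end
end

objects = {[12, 0] => {Subtype: :Widget}}
dict = {Parent: Reference.new(0, 0), AP: Reference.new(12, 0), T: "agree"}
DEREFERENCED = deref_dictionary(objects, dict)
puts DEREFERENCED.inspect
```

With this approach the rest of the dictionary stays usable and only the invalid entry is dropped, instead of the whole object failing to parse.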
Btw. the second file you sent exhibits the same problem with `0 0 R` but only in one location, not in multiple like with `test3.pdf`.
The change fixing the problem is live on the devel branch.
As for the problem with filled out text not showing up: I think this is related to the fonts that are used in some of the form fields because they are subset fonts, i.e. not containing all characters or all the mappings needed to actually create a visual representation of some Unicode text. Some of those form fields, e.g. the three on the right side of "Änderungsstichtag ab", are actually filled out but nothing is shown in Okular and Evince.
If you add `doc.acro_form.create_appearances(force: true)` all field appearances are recreated and in case of subset fonts which are not supported by HexaPDF the fallback fonts are used. And then the text shows up (making changes in Okular is still not possible because it used the subset font for this).
So this is nothing that HexaPDF itself can really fix.
Nice, I can confirm the issue is fixed :)
Hi Thomas,
okay, I have an odd one this time. I will send you the corresponding pdf in a moment via email.
When I open the file with hexapdf and `write` it again, a ticked checkbox disappears, and apparently the form field / widget as well, as the check box is no longer clickable. I don't understand what's wrong, but here are a couple of things I noticed:
When I call `acro_form.validate` and output the messages, `Invalid object in AcroForm field hierarchy` appears twelve times. If I interpret the code correctly, this error is auto-corrected by simply skipping those "fields".
The "fields" (returned by `root_fields`) in this case are `nil` values, but I don't really understand how these values end up in `self[:Fields]`.
The method `find_root_fields!` doesn't seem to be called automatically. If I call it manually, the value of `:Fields` / `root_fields` changes: instead of references it directly contains annotations, similar to when I call `self[:Fields].to_a`, only it no longer contains `nil` values.
(However, simply calling `find_root_fields!` before writing out the document doesn't solve the issue.)
I am very curious if you have any idea what's happening :)