Open az143 opened 10 months ago
Thank you for the report. I will take a look at it over the next day or two, and at least confirm what you've found and whether it was any sort of usage error. I see you have attached some .pl
code to demonstrate the problem (I find it clearer to suffix the program with .txt
for uploading, rather than replacing .pl
with .txt
-- I almost went and asked you to supply the .pl code).
I can confirm that it's not working correctly (i.e., a bug). After a careful reading of the code and some experimenting, the first thing I want to do is update the POD to show how to do multi-level (nested) bookmark lists, where $parent
is given in the outline()
call. It does appear to work; it's just undocumented. Then I want to figure out how to place a new bookmark at some other place than the end of the list ($prev
value given is apparently supposed to do this), which could be very useful if you're updating a bookmark list and want to insert a new bookmark in-between two existing ones. I may have to extend the code to handle inserting a new bookmark (Outline) before the existing first one. Then I have to confirm that open
and close
of a bookmark does the right thing for a Count
of descendants.
Once all that is working, I will have to look at what's going on in the simple update case you gave. I can already see that the Outlines
root is not updated when you add more top-level Outline
objects, so that's something to look at.
@vadim-160102, does this one sound related to your ssimms/pdfapi2/issues/83 (appended below:)? I have gotten stuck on this one, and hope you might have some insights!
I totally forgot about this.
use strict; use warnings; use feature 'say'; use utf8; use Test::More; use PDF::API2 2.047;
my $pdf;
########################################################
$pdf = PDF::API2-> new; $pdf-> page; $pdf-> outline-> outline for 1 .. 2; $pdf-> outline-> last-> delete; is $pdf-> outline-> count, 1, 'looks like we successfully deleted the last item...';
$pdf = PDF::API2-> from_string( $pdf-> to_string ); is $pdf-> outline-> count, 1, '... (save, re-open and check to be sure)';
########################################################
$pdf = PDF::API2-> new; $pdf-> page; $pdf-> outline-> outline-> title( 'bookmark' ); $pdf = PDF::API2-> from_string( $pdf-> to_string );
my $title = eval { $pdf-> outline-> first-> title }; is $@, '', 'we can examine bookmarks in existing file';
$pdf-> outline-> count; # fix for test #3
$title = $pdf-> outline-> first-> title; is $title, 'bookmark', 'yes we can';
########################################################
my $chinese = '中國的'; $pdf = PDF::API2-> new; $pdf-> page; $pdf-> outline-> outline-> title( $chinese ); $pdf = PDF::API2-> from_string( $pdf-> to_string );
$pdf-> outline-> count; # fix for test #3
is join( ' ', unpack 'U', $pdf-> outline-> first-> title ), join( ' ', unpack 'U', $chinese ), '...and read interesting characters';
########################################################
$pdf = PDF::API2-> new; $pdf-> page; $pdf-> outline-> outline-> title( $_ ) for qw/ a B c D e F /; $pdf = PDF::API2-> from_string( $pdf-> to_string );
$pdf-> outline-> count; # fix for test #3
my $bm = $pdf-> outline-> first; while ( $bm ) { my $next = $bm-> next; $bm-> delete if $bm-> title =~ /\p{Lu}/; $bm = $next } $pdf = PDF::API2-> from_string( $pdf-> to_string );
is $pdf-> outline-> count, 3, 'we can modify outline in existing file and save changes';
########################################################
done_testing;
The output:
ok 1 - looks like we successfully deleted the last item... not ok 2 - ... (save, re-open and check to be sure)
not ok 3 - we can examine bookmarks in existing file
ok 4 - yes we can not ok 5 - ...and read interesting characters
not ok 6 - we can modify outline in existing file and save changes
1..6
2 files to patch:
--- PDF/API2_orig.pm Sat May 18 21:19:41 2024 +++ PDF/API2.pm Sat Jun 22 11:28:39 2024 @@ -833,6 +833,7 @@ bless $obj, 'PDF::API2::Outlines'; $obj->{' api'} = $self; weaken $obj->{' api'};
else {
delete $self->{'Count'}
} return $count; }
sub _load_children { my $self = shift(); my $item = $self->{'First'};
return unless $item;
$item->realise();
bless $item, PACKAGE;
push @{$self->{' children'}}, $item;
while ($item->next()) {
$item = $item->next();
while ($item) { $item->realise(); bless $item, PACKAGE;
$item->{' api'} = $self->{' api'}; push @{$self->{' children'}}, $item;
}
return $self; -}
$item = $item->next()
}
}
=head3 first
@@ -124,6 +122,9 @@ if (defined $self->{' children'} and defined $self->{' children'}->[0]) { $self->{'First'} = $self->{' children'}->[0]; }
@@ -140,6 +141,9 @@ if (defined $self->{' children'} and defined $self->{' children'}->[-1]) { $self->{'Last'} = $self->{' children'}->[-1]; }
@@ -154,7 +158,7 @@
sub parent { my $self = shift();
@@ -167,8 +171,11 @@ =cut
sub prev {
@@ -181,8 +188,11 @@ =cut
sub next {
@@ -208,6 +218,7 @@ $self->{' api'}->{'pdf'}->new_obj($child); }
@@ -268,6 +279,7 @@ $item = $item->next(); push @{$self->{' children'}}, $item; }
@@ -286,12 +298,20 @@
my $prev = $self->prev();
my $next = $self->next();
$prev->next($next) if defined $prev;
$next->prev($prev) if defined $next;
if (defined $prev and defined $next) {
$prev->next($next);
$next->prev($prev);
}
else {
delete $prev->{'Next'} if defined $prev;
delete $next->{'Prev'} if defined $next;
}
my $siblings = $self->parent->{' children'}; @$siblings = grep { $_ ne $self } @$siblings; delete $self->parent->{' children'} unless $self->parent->has_children();
$self->{' api'}{'pdf'}->out_obj($self->parent);
return; } @@ -326,6 +346,7 @@
my $is_open = shift(); $self->{' closed'} = (not $is_open);
$self->{' api'}{'pdf'}->out_obj($self);
return $self; } @@ -362,12 +383,17 @@
unless (@_) { return unless $self->{'Title'};
return $self->{'Title'}->val();
my $s = $self->{'Title'}->val();
if ( $s =~ s/^\x{fe}\x{ff}// ) {
$s = Encode::decode( 'UTF-16BE', $s )
}
return $s; }
my $text = shift(); $self->{'Title'} = PDFStr($text);
$self->{' api'}{'pdf'}->out_obj($self); return $self; }
@@ -403,6 +429,7 @@ $self->{'Dest'} = PDFStr($destination); }
@@ -443,6 +470,7 @@ $self->{'A'}->{'S'} = PDFName('URI'); $self->{'A'}->{'URI'} = PDFStr($uri);
@@ -465,6 +493,7 @@ $self->{'A'}->{'S'} = PDFName('Launch'); $self->{'A'}->{'F'} = PDFStr($file);
@@ -511,6 +540,7 @@
$self->{'A'}->{'D'} = _destination(PDFNum($page_number), $location, @args);
Lines marked '`fix for test #3`' in the script are no longer required and can be deleted, and:
ok 1 - looks like we successfully deleted the last item... ok 2 - ... (save, re-open and check to be sure) ok 3 - we can examine bookmarks in existing file ok 4 - yes we can ok 5 - ...and read interesting characters ok 6 - we can modify outline in existing file and save changes 1..6
(and `outline.t` passes OK)
I understand the PDF::API2::Outline::count()
does not "count" descendants/children, but its use is OK for these tests. Further,
it may seem that tests 1-2 are superfluous and their case is covered in later tests, but that is not exactly true. The later tests
deal only with accessing outline in existing (i.e. opened with PDF::API2
) files. The 1st test also produces invalid PDF syntax, i.e.
broken outline tree, in freshly generated file. The same happens when deleting the first outline item, but then the "phantom"
bookmark is somehow "invisible" i.e. ignored by Reader, PDF::API2
, etc. The new first/last item still have their prev/next attribute
set and pointing at phantom (deleted) item. (Perhaps "Outline tree" has unnecessary redundancy and is never checked in full by
readers/browsers?) As to "why would anyone want to generate a fresh tree and immediately delete the first/last item" -- I don't
know.
Test #3
fails because objects aren't blessed into correct class. Effectively, the PDF::API2::Outline::count()
will read the entire
tree and re-bless (therefore test #2
did not result in fatal error); this call is injected in the above script to enable tests 4-6; and is
later used in the patch, seemingly as a no-op.
I'll further comment on changes if necessary. Some were discovered only later when fixing and testing for most obvious ones.
Maybe I missed something.
Edit: I think I missed something indeed! (Rubber duck strikes again.) The count()
can't reliably be used to re-bless because
it doesn't follow items which are "closed". Perhaps dedicated internal traversing sub will be required.
Edit 2 : So, then, the sub below should be added/appended to PDF/API2/Outline.pm
; and line patched in PDF/API2.pm
should be a call to this sub i.e. $obj->_fix_on_open()
;
sub _fix_on_open {
my $self = shift();
if ($self->has_children()) {
$self->_load_children() unless exists $self->{' children'};
foreach my $child (@{$self->{' children'}}) {
$child->_fix_on_open();
}
}
}
Both the public interface and its documentation are sufficient to read/write outline trees, not sure why you think otherwise. The patch should fix bugs as reported here in the ticket, it's easily verifiable and not insightful neither. But looking at "055_outlines.sample_55-okcount.pdf"
, how about this for an insight: the PDF::API2::open_page()
call is surprisingly intrusive, and even destructive.
I think one can reasonably expect that open_page()
as argument for PDF::API2::Outline::outline()
would only be used to generate an indirect object reference like 12 0 R
, and it is read-only operation. It is not.
Another example: suppose I want to add a page to an existing PDF document. I'd like to know the media size of pages already present there. Can't think of anything else but e.g. $pdf->open_page(1)->size()
(or whatever moniker the latter method goes by now) as officially recommended way to do that. This call should be read-only, right? Wrong. The saved document with new appended page reveals that the 1st page was also stuck after original %%EOF
. Not only that, but it was modified.
For grotesque exaggeration, try:
use strict;
use warnings;
use PDF::API2 2.047;
my $pdf = PDF::API2-> new( compress => 0 );
$pdf-> page-> graphics-> save-> restore;
for ( 1 .. 10 ) {
$pdf = PDF::API2-> from_string( $pdf-> to_string, compress => 0 );
$pdf-> open_page( 1 ) # no-op? nope
}
$pdf-> save( 'visual_test.pdf' );
and inspect the final file in your text editor which has "show non-printing characters" on for full (hilarious) effect. A call to open_page()
will:
Contents
was a stream, convert it to array of streams;graphics()
and text()
combined;If "array of streams" is the syntax choice of how PDF::API2
generates new content -- fine, it's perfectly OK. What I don't understand is why to touch streams already present and inject changes I didn't ask for. E.g. such as i.e. amongst them the nounrotate
no longer documented and therefore forcibly disabled.
I checked as far back as 0.4 version which was 20 years ago. Looks like the above was there from the very beginning. Perhaps too late to ask "why?". It won't be too difficult to fix to how I think it should behave. However, (a) leaving the benefit of a doubt, of course it is possible I don't get it right, there are valid reasons for everything; and (b) reaction if any arrives in a few months or years judging by past experience, and probably "no, thanks".
Uhm, OK, if that was not an insight, here's another very short and easy one. Though unrelated to PDF::API2::Outline
at all. The TrimBox
(same with Bleed-
and Art-
) is not an inheritable property. Please check the Reference. The PDF::API2::trimbox()
(deprecated/renamed but OK as quick referent here) writes something to a file which is just ignored by any compliant reader. There is no such thing as "global trimbox". Looks like it also was there for 20+ years.
Both the public interface and its documentation are sufficient to read/write outline trees, not sure why you think otherwise.
Uh, I don't. I did go through and update and clarify the PDF::Builder documentation a bit, particularly with regards to creating multi-level outlines. The problem reported in this ticket concerns adding new outlines (bookmarks) to an existing PDF, which doesn't work. Your API2 ticket sounded like it might be related, and I was hoping you'd have some insights into that. It would be great if your fix also works for this ticket. Unfortunately, I have to get 3.027 out the door before I can resume work, and try your fix.
As for the "intrusive/destructive" calls you reported, I will have to look at them carefully. I may be back with more questions. Updating a PDF with new content does seem to create unnecessary updates to existing content, but I want to understand it thoroughly before changing anything.
I'm surprised to hear about the Trim- Bleed-, and Art-Box issues. I'll have to review the standard and decide what (if anything) to do.
Sorry I wasn't clear: "easily verifiable" means of course I did check it, and ticket's code runs as expected with PDF::API2
if the patch is applied.
+I think PDF::API2
was right when it removed any mention of PDF::API2::Outline::new()
from its documentation; one of the positive developments. I.e. the constructor is not public and not supposed to be called by the end user.
+as kind of proactive answer to questions you haven't asked yet: the Reference is clear that consumers are free to modify Contents
structure i.e. to/from array or any number of items in said array on read/write. But I read it of course that such freedom is granted if content is modified or file is simply re-saved and everything regenerated from scratch (cleanoutput()
) which PDF::API2
can't do. And not that it means Contents
is bound to be modified by simple fact of getting a reference to page's object and then stuck after %%EOF
.
That's great news that your fix covers the problem reported at the top of this ticket, PDF::Builder # 207, as well as PDF::API2 # 83. At least, that's what it sounds like you're saying! After I get a few nagging items wrapped up for PDF::Builder, I'll try to slip in your fix and confirm that it fixes this one. That would be worth a few more days' delay. Thanks!
this looks like a reopen of #67, for pdf::builder 3.025 on perl 5.36.0.
symptom: when adding an outline to a document that already has any, the newly added ones vanish (i cannot find any visual evidence of their existence using xpdf and browser-based rendering). it gets weirder: depending on whether outlines->count is called or not it may report the expected grand total or not, and the resulting files differ in size.
to reproduce: i used sample_55.pdf and the 055_outlines example as shipped with pdf::builder, and split the outline creation into two phases: add one outline, save interim file, open that, add another two, save. final result: just one outline is visible.
the attached 055_outlines-modify-fails does not call outlines->count after reopening, and the final count reported is 2 (should be 3).
the attached 055_outlines-modify-fails-but-counts-ok does call outlines->count after reopening and the final count is now 3.
in both cases the final document only shows the very first outline. the file sizes differ, too. the test scripts are attached (renamed .txt to appease github), ditto resulting pdfs and the interim one.
please let me know if there is anything else you'd like me to do to pinpoint and fix this one.
055_outlines-modify-fails.txt 055_outlines-modify-fails-but-counts-ok.txt 055_outlines.sample_55-dudcount.pdf 055_outlines.sample_55-okcount.pdf 055_outlines.sample_55_oneoutline.pdf