PhilterPaper / Perl-PDF-Builder

Extended version of the popular PDF::API2 Perl-based PDF library for creating, reading, and modifying PDF documents
https://www.catskilltech.com/FreeSW/product/PDF%2DBuilder/title/PDF%3A%3ABuilder/freeSW_full

PageLabels do not support omitting legally-optional items #171

Closed hisdeedsaredust closed 2 years ago

hisdeedsaredust commented 2 years ago

If "-style" is omitted from pageLabel(), decimal numbering is applied by adding "/S /D" to the page label dictionary. However, according to the PDF 1.7 specification the "/S" key is optional, and it must be omitted if you want pages without a label.

I'm scanning old manuals that have unnumbered covers and section dividers, and I would like them to remain unnumbered in the PDF. The only way I can see to do that at the moment is to hack the "/S" (and placeholder "/P") keys out afterwards:

# Mark the unnumbered ranges with a placeholder prefix; it is stripped below
$pdf->pageLabel(  0, { -prefix => 'blank' });
$pdf->pageLabel(  2, { -style => 'roman' });
$pdf->pageLabel( 10, { -prefix => 'blank' });
$pdf->pageLabel( 12, { -style => 'decimal' });
$pdf->pageLabel( 42, { -prefix => 'blank' });
$pdf->pageLabel( 44, { -start => 31, -style => 'decimal' });
$pdf->pageLabel(140, { -prefix => 'blank' });

# Hack away at the completed PageLabels number tree. /Nums alternates
# page indices and label dictionaries; only dictionaries carrying the
# placeholder prefix are touched.
foreach my $e ($pdf->{'catalog'}->{'PageLabels'}->{'Nums'}->elements()) {
    if (defined($e->{'P'}) && $e->{'P'}->val() eq 'blank') {
        undef $e->{'S'};   # drop the default /S /D style
        undef $e->{'P'};   # drop the placeholder prefix
    }
}

PDF::Builder 3.022 (as packaged in Fedora 34)

PhilterPaper commented 2 years ago

Thank you for the report. I will take a look into this soon.

Just to be sure I understand what you're looking for, it appears that you have a 141+ page document. You are looking for the page label (reader's slider thumb value) to show:

- pages 0-1         label blank
- pages 2-9         label i-viii
- pages 10-11       label blank
- pages 12-41       label 1-30
- pages 42-43       label blank
- pages 44-139      label 31-126
- pages 140+        label blank

Is this correct? And what are you actually getting: a decimal page number where you want nothing (a blank)? Or something else? If it's insisting on adding a numeric page, that doesn't sound desirable.

Are you looking to have the thumb label actually empty rather than showing some text? I don't know whether PDF allows an empty label; if it doesn't, I'm guessing that a single space (' ') might be acceptable. I'm thinking of something along the lines of $pdf->pageLabel(0, { -style => 'blank' }); to show a blank label (if possible; I haven't experimented yet). Or possibly a -style of 'ptext' that uses only the -prefix text (which may be empty). Would that work well for you?

hisdeedsaredust commented 2 years ago

You've decoded the example exactly correctly. However, PDF::Builder changes a missing or invalid style to "decimal", producing incorrectly numbered pages. PDF 32000-1:2008 Table 159 lists the entries of the page label dictionary. Both style ("/S") and prefix ("/P") are optional, and the table explains that the page label will be blank if both are absent. That is what my hack achieves (and it works just fine in Evince).

PDF::Builder currently produces the following PageLabels number tree:

/PageLabels <<
/Nums [
0 << /P (blank) /S /D /St 1 >>
2 << /P () /S /r /St 1 >>
10 << /P (blank) /S /D /St 1 >>
12 << /P () /S /D /St 1 >>
42 << /P (blank) /S /D /St 1 >>
44 << /P () /S /D /St 31 >>
140 << /P (blank) /S /D /St 1 >>
] >>

but it ought to be able to produce this:

/PageLabels <<
/Nums [
0 << /St 1 >>
2 << /P () /S /r /St 1 >>
10 << /St 1 >>
12 << /P () /S /D /St 1 >>
42 << /St 1 >>
44 << /P () /S /D /St 31 >>
140 << /St 1 >>
] >>
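To see why this tree yields blank labels, here is a minimal pure-Perl sketch of how a viewer could resolve a label from a /Nums array under the rules of Table 159: pick the entry with the largest page index not exceeding the page, then append the numeric portion (formatted per /S, counting from /St, default 1) to the /P prefix. With no /S there is no numeric portion, so if /P is also absent the label is empty. The helper names (page_label, format_number, roman, alpha) and the array-of-pairs representation are hypothetical and chosen for illustration; this is not PDF::Builder code.

```perl
use strict;
use warnings;

# Hypothetical sketch of page-label resolution per PDF 32000-1:2008 Table 159.
# Each /Nums entry is modelled as [first_page_index, \%dict], where %dict may
# hold S (style), P (prefix) and St (start); all three keys are optional.

sub roman {
    my ($n) = @_;
    my @map = ([1000, 'm'], [900, 'cm'], [500, 'd'], [400, 'cd'],
               [100, 'c'], [90, 'xc'], [50, 'l'], [40, 'xl'],
               [10, 'x'], [9, 'ix'], [5, 'v'], [4, 'iv'], [1, 'i']);
    my $out = '';
    for my $pair (@map) {
        my ($value, $digits) = @$pair;
        while ($n >= $value) { $out .= $digits; $n -= $value; }
    }
    return $out;
}

sub alpha {
    # a..z for the first 26 values, then aa..zz, aaa..zzz, and so on
    my ($n) = @_;
    my $letter = chr(ord('a') + ($n - 1) % 26);
    return $letter x (int(($n - 1) / 26) + 1);
}

sub format_number {
    my ($style, $n) = @_;
    return $n           if $style eq 'D';
    return roman($n)    if $style eq 'r';
    return uc roman($n) if $style eq 'R';
    return alpha($n)    if $style eq 'a';
    return uc alpha($n) if $style eq 'A';
    die "unknown style $style";
}

sub page_label {
    my ($nums, $page) = @_;    # $nums must be sorted by first page index
    my ($first, $dict);
    for my $entry (@$nums) {
        last if $entry->[0] > $page;
        ($first, $dict) = @$entry;
    }
    return undef unless defined $dict;            # no range covers this page
    my $prefix = $dict->{P} // '';
    return $prefix unless defined $dict->{S};     # no /S: prefix only
    my $start = $dict->{St} // 1;
    return $prefix . format_number($dict->{S}, $start + $page - $first);
}

# The number tree from the comment above, as Perl data
my @nums = (
    [   0, { St => 1 } ],
    [   2, { P => '', S => 'r', St => 1 } ],
    [  10, { St => 1 } ],
    [  12, { P => '', S => 'D', St => 1 } ],
    [  42, { St => 1 } ],
    [  44, { P => '', S => 'D', St => 31 } ],
    [ 140, { St => 1 } ],
);

for my $page (0, 2, 9, 10, 12, 41, 44, 139, 140) {
    printf "page index %3d -> '%s'\n", $page, page_label(\@nums, $page);
}
```

Run against the entry PDF::Builder originally wrote for the cover, { P => 'blank', S => 'D', St => 1 }, the same function gives "blank1" for page index 0, which is the unwanted numbering reported here. By the same rule, an entry with /P () but no /S also resolves to an empty label.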

(Even "/St" is optional, but I didn't bother eliding it.) You can test the PDF I've produced if you like. It's available on the page below, and it contains the number tree shown just above.

https://vt100.net/manx/part/dec/ek-0la75-ug/

-style => 'blank' would be a good way to achieve this.

PhilterPaper commented 2 years ago

OK, so it does appear that there is currently unwanted output (a decimal page number), which I will fix in some manner. I will also add a blank -style, in addition to fixing the unwanted decimal numbering. This should be in the next release (date not yet set, but I would expect it by the end of the year at the latest).

PhilterPaper commented 2 years ago

OK, I have pushed a rewritten pageLabel() function (in lib/PDF/Builder.pm) to GitHub. Please give it a try and see if it does what you need. The documentation has been updated, so be sure to read it. A new -style of nocounter has been added to suppress the page counter entirely.

I have made some other changes in Builder.pm while working towards a 3.024 release, so I suggest that you get the new Builder.pm and copy just the pageLabel() routine (and its POD) into your existing 3.023 Builder.pm. Let me know how it turns out.

hisdeedsaredust commented 2 years ago

I tested almost as suggested: copying the whole of the GitHub HEAD Builder.pm over my existing installation works just fine.

This works exactly as intended, thank you. Here is the tree from my earlier comment that "it ought to be able to produce":

 /PageLabels <<
 /Nums [
  0 << /St 1 >>
  2 << /P () /S /r /St 1 >>
 10 << /St 1 >>
 12 << /P () /S /D /St 1 >>
 42 << /St 1 >>
 44 << /P () /S /D /St 31 >>
140 << /St 1 >>
] >>

I can confirm that the number tree produced by 3.024 for my document is this:

/PageLabels <<
  /Nums [ 0 << /P () /St 1 >>
          2 << /P () /S /r /St 1 >>
         10 << /P () /St 1 >>
         12 << /P () /S /D /St 1 >>
         42 << /P () /St 1 >>
         44 << /P () /S /D /St 31 >>
        140 << /P () /St 1 >>
  ]
>>

which is functionally identical when tested in Evince (an empty /P with no /S still gives an empty label).

Thank you.

PhilterPaper commented 2 years ago

Are you explicitly specifying an empty -prefix string and -start 1? When I tested it, I wasn't getting the /P () and all the extra /St 1 entries. Not that they do any harm, but I'm curious.

# pageLabel() takes a 0-based page index; the comments give the 1-based
# page range, the label shown, and the resulting /Nums entry
$pdf->pageLabel(0,   { -style => 'nocounter' });  # 1-2     blank   0 << >>
$pdf->pageLabel(2,   { -style => 'roman' });      # 3-10    i-viii  2 << /S /r >>
$pdf->pageLabel(10,  { -style => 'nocounter' });  # 11-12   blank   10 << >>
$pdf->pageLabel(12,  { });                        # 13-42   1-30    12 << /S /D >>
$pdf->pageLabel(42,  { -style => 'nocounter' });  # 43-44   blank   42 << >>
$pdf->pageLabel(44,  { -start => 31 });           # 45-140  31-126  44 << /S /D /St 31 >>
$pdf->pageLabel(140, { -style => 'nocounter' });  # 141+    blank   140 << >>

hisdeedsaredust commented 2 years ago

Yes, my script always calls pageLabel() with -prefix, -start and -style, with a comment above that says

# Don't think it makes any sense to minimise items in this dictionary to changed only

:-)

(I see that the current state of my script doesn't match the minimised example I provided in the opening comment.)

PhilterPaper commented 2 years ago

OK then, everything seems to be accounted for. This change will be in the next (3.024) release, which I hope to have out by the end of December. Closing this ticket.