PhilterPaper / Perl-PDF-Builder

Extended version of the popular PDF::API2 Perl-based PDF library for creating, reading, and modifying PDF documents
https://www.catskilltech.com/FreeSW/product/PDF%2DBuilder/title/PDF%3A%3ABuilder/freeSW_full
Other
6 stars 7 forks source link

Page boundaries should be adjusted after content is "un-rotated" on open_page() #218

Open PhilterPaper opened 4 months ago

PhilterPaper commented 4 months ago

In ssimms/pdfapi2/issues/84, @vadim-160102 reported a problem:

use strict;
use warnings;
use feature 'say';
use PDF::API2;

my $pdf = PDF::API2-> new;

my $p = $pdf-> page;
$p-> boundaries( 
    media => [ 0, 0, 600, 400 ], 
    crop  => [ 50, 50, 600, 400 ],
);
$p-> rotation( 180 );

my $f = $pdf-> font( 'Times-Roman' );
$p-> text
  -> font( $f, 96 )
  -> position( 100, 300 )
  -> text( 'Hello, PDF!' );

$pdf = PDF::API2-> from_string( $pdf-> to_string );

$pdf-> open_page( 1 );          # no-op? nope   (*)

$pdf-> save( 'visual_test.pdf' );

Initial PDF, serialized to string, simulates a file from external source. Boxes (e.g. CropBox here) is non-symmetrical relative to MediaBox, and page was rotated. The content is just chosen to be clearly seen if deformed/cropped. Comment/uncomment the star (*) marked line and inspect the output. This affects opening files and then adding content to a page, as well as importing a page to another document. Gaps between (Crop|Trim|Bleed|Art)Box and MediaBox should stay the same on top/left/bottom/right edges of un-rotated page, as they were on rotated page. If I get it right. Looks like this bug is primordial i.e. 20+ y.o. I'll try to supply a fix unless someone comes with better idea.

Edit: Actually, MediaBox on its own is enough to observe a bug, e.g. replace in the above with:

my $p = $pdf-> page;
$p-> boundaries( 
    media => [ 50, 50, 600, 400 ], 
);
$p-> rotation( 90 );

The "un-rotating" code in PDF/API2.pm does at least something to boxes boundaries in case of 90 and 270 degrees, and nothing for 180. Here we see that the bug is not because nothing was done for 180, but the whole logic appears to be simplistic and flawed.

==================================================== I have not yet had a chance to run this or examine the problem, to see if it's also in PDF::Builder. I did rewrite the *Box code to a large extent, so it's possible that this is a non-problem in PDF::Builder.

I vaguely recall a recent mention by someone (Vadim?) about problems with *Box calls... I'll have to dig up that ticket and see if the two are related.

PhilterPaper commented 3 months ago

Post by vadim-160102:

I'll try to supply a fix

The fragment from this line: (

[pdfapi2/lib/PDF/API2.pm](https://github.com/ssimms/pdfapi2/blob/32af5b86539b92b8e4333168b6ce05244613e72e/lib/PDF/API2.pm#L1275)

Line 1275 in [32af5b8](https://github.com/ssimms/pdfapi2/commit/32af5b86539b92b8e4333168b6ce05244613e72e)
 `if (($rotate = $page->find_prop('Rotate')) and not $page->{' opened'}) { `

) and till line 1309 is to be replaced with:

        sub _transform_box {
            my ( $tmatrix, @box ) = @_;
            my ( $a, $b, $c, $d, $e, $f ) = @$tmatrix;

            @box = ( $a * $box[ 0 ] + $c * $box[ 1 ] + $e,
                     $b * $box[ 0 ] + $d * $box[ 1 ] + $f,
                     $a * $box[ 2 ] + $c * $box[ 3 ] + $e,
                     $b * $box[ 2 ] + $d * $box[ 3 ] + $f );

            @box[ 0, 2 ] = @box[ 2, 0 ] if $box[ 2 ] < $box[ 0 ];
            @box[ 1, 3 ] = @box[ 3, 1 ] if $box[ 3 ] < $box[ 1 ];

            return @box
        }

        my $trans = '';

        if ( not $page-> {' opened' } and not $self-> default( 'nounrotate' )) {
            my $rotate = $page-> find_prop( 'Rotate' );
            $rotate = $rotate ? $rotate-> val % 360 : 0;
            $page-> { 'Rotate' } = PDFNum( 0 ) if $rotate;

            my $mb = $page-> find_prop( 'MediaBox' );
            my @mb = $mb ? map { $_-> val } $mb-> elements
                         : ( 0, 0, 612, 792 );

            if ( $rotate or $mb[ 0 ] or $mb[ 1 ]) {
                my $tmatrix = 
                    $rotate ==   0 ? [  1,  0,  0,  1, -$mb[ 0 ], -$mb[ 1 ]] :
                    $rotate ==  90 ? [  0, -1,  1,  0, -$mb[ 1 ],  $mb[ 2 ]] :
                    $rotate == 180 ? [ -1,  0,  0, -1,  $mb[ 2 ],  $mb[ 3 ]] :
                                     [  0,  1, -1,  0,  $mb[ 3 ], -$mb[ 0 ]];

                $page-> { 'MediaBox' } = PDFArray( map PDFNum( $_ ), 
                    _transform_box( $tmatrix, @mb ));

                for my $mediatype ( qw( CropBox BleedBox TrimBox ArtBox )) {
                    next unless my $media = $page-> find_prop( $mediatype );
                    my @box = map { $_-> val } $media-> elements;
                    $page-> { $mediatype } = PDFArray( map PDFNum( $_ ), 
                        _transform_box( $tmatrix, @box ))
                }
                $trans = join ' ', map( PDF::API2::Util::float( $_ ), @$tmatrix ), 'cm'
            }
        }

(And the helper _transform_box() subroutine definition should better (of course) be moved out of the enclosing open_page() body). Though the use of $a, $b is discouraged, I think it's OK for this small scope and helps to easier match with description and formulae in "The PDF Reference". Declaring $trans with my assumes line 1255 at the top of the sub only contains $page declaration, others aren't really required. For further hygiene, line 1314 could use postcondition ( "... if $trans;"), though improvements/cleaning in general are gargantuan task, and strict focus is only on the bug at hand. The fix above enforces lower left of MediaBox to become 0, 0, including the case of 0 degrees rotation. So then 'nounrotate' becomes misnomer, but it's probably OK because it's no longer publicly documented. I don't know of producers which massively create MB's with other than 0, 0 origin. This (enforced by PDF::API2 nowadays) "unrotation" breaks, in general, files with annotations anyway (the least which comes to mind, and which you are aware perhaps). I wouldn't like to increase this share of broken PDF files with my making of "unrotation" even more intrusive. + Sorry I didn't follow your formatting style preference.