christian-vigh-phpclasses / PdfToText

Extracts text from PDF files

Differences between versions 1.4.17 and 1.6.7 #20

Open fernandosanmar opened 6 years ago

fernandosanmar commented 6 years ago

Hello,

First of all, congratulations because you've created a great tool!

We have version 1.4.17 running well in our system, but we now want to update to the latest version (1.6.7), and it is not working as expected.

Most of the lines from the PDF are missing after reading it.

Is there any different way to call the function or any different settings between these two versions?

Thank you in advance,
Fernando

PregLizZz commented 6 years ago

Hello, it seems like the problem is the Experimental implementation of CID fonts in the PdfTexterFontTable class:

// Experimental implementation of CID fonts
else if  ( preg_match ( '#/(Base)?Encoding \s* /Identity-H#ix', $font_definition ) ) {
    if  ( preg_match ( '#/BaseFont \s* /(?P<font> [^\s/]+)#ix', $font_definition, $match ) )
        $font_variant   =  $match [ 'font' ] ;

    $font_type  =  PdfTexterFont::FONT_ENCODING_CID_IDENTITY_H ;
}

Commenting that part out fixes our problem :)

Best regards,
Daniel
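
To check whether a given PDF would be routed into that branch, the two patterns from the snippet can be run on their own against a font dictionary. The sketch below is only an illustration: the function name detect_cid_identity_h and both sample font definitions are made up for this example and are not part of PdfToText, and real font dictionaries may be laid out differently.

```php
<?php
// Hypothetical helper (not part of PdfToText): applies the two patterns
// quoted above to a font dictionary given as plain text.
// Returns the BaseFont name when the Identity-H branch would be taken,
// an empty string when it would be taken but no BaseFont is present,
// and null when the branch would not be taken at all.
function detect_cid_identity_h(string $font_definition): ?string
{
    // Same pattern as the quoted "else if" condition.
    if (!preg_match('#/(Base)?Encoding \s* /Identity-H#ix', $font_definition)) {
        return null;
    }

    // Same pattern as the quoted inner "if".
    if (preg_match('#/BaseFont \s* /(?P<font> [^\s/]+)#ix', $font_definition, $match)) {
        return $match['font'];
    }

    return '';
}

// Made-up sample dictionaries, for illustration only.
$cid_font  = '<< /Type /Font /Subtype /Type0 /BaseFont /ABCDEF+Arial /Encoding /Identity-H >>';
$ansi_font = '<< /Type /Font /Subtype /TrueType /BaseFont /Helvetica /Encoding /WinAnsiEncoding >>';

var_dump(detect_cid_identity_h($cid_font));
var_dump(detect_cid_identity_h($ansi_font));
```

If the fonts in an affected PDF give a non-null result here, that would suggest the file goes through the experimental CID path, which would fit the missing lines described above. It does not show why that path loses text.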