gettalong / hexapdf

Versatile PDF creation and manipulation for Ruby
https://hexapdf.gettalong.org

HexaPDF::Content::Processor undefined method `glyph_scaling_factor' #302

Closed: NelsonDocsketch closed this issue 3 months ago.

NelsonDocsketch commented 3 months ago

Hi @gettalong, I'm currently getting the following error while using the content processor with a specific file. It looks like a font issue: when I tried to edit the file with Adobe Acrobat, it complained about the file's encoding, and the file started working after I saved it there. Unfortunately, I can't share the original file.

require 'hexapdf'

path = 'output.pdf'

document = HexaPDF::Document.open(ARGV[0])

document.pages.each do |page|
  processor  = HexaPDF::Content::Processor.new(page)
  page.process_contents(processor)
end
 /hexapdf-0.40.0/lib/hexapdf/content/graphics_state.rb:732:in `update_scaled_font_size': undefined method `glyph_scaling_factor' for {:BaseFont=>:Helvetica, :Encoding=>:WinAnsiEncoding, :Name=>:F1, :Subtype=>:Type1, :Type=>:Font}:Hash (NoMethodError)

        @scaled_font_size = @font_size * (@font&.glyph_scaling_factor || 0.001) *
                                               ^^^^^^^^^^^^^^^^^^^^^^
    from  /hexapdf-0.40.0/lib/hexapdf/content/graphics_state.rb:685:in `font='
    from  /hexapdf-0.40.0/lib/hexapdf/content/operator.rb:767:in `invoke'
    from  /hexapdf-0.40.0/lib/hexapdf/content/processor.rb:361:in `process'
    from  /hexapdf-0.40.0/lib/hexapdf/content/parser.rb:192:in `block in parse'
    from  /hexapdf-0.40.0/lib/hexapdf/content/parser.rb:186:in `loop'
    from  /hexapdf-0.40.0/lib/hexapdf/content/parser.rb:186:in `parse'
    from  /hexapdf-0.40.0/lib/hexapdf/content/parser.rb:166:in `parse'
    from  /hexapdf-0.40.0/lib/hexapdf/type/page.rb:393:in `process_contents'
    from hexapdf_error.rb:9:in `block in <main>'
    from  /hexapdf-0.40.0/lib/hexapdf/type/page_tree_node.rb:243:in `block in each_page'
    from  /hexapdf-0.40.0/lib/hexapdf/pdf_array.rb:183:in `block in each'
    from  /hexapdf-0.40.0/lib/hexapdf/pdf_array.rb:183:in `each_index'
    from  /hexapdf-0.40.0/lib/hexapdf/pdf_array.rb:183:in `each'
    from  /hexapdf-0.40.0/lib/hexapdf/type/page_tree_node.rb:241:in `each_page'
    from  /hexapdf-0.40.0/lib/hexapdf/document/pages.rb:178:in `each'
    from hexapdf_error.rb:7:in `<main>'

Thanks for your help on this.

gettalong commented 3 months ago

From the stack trace, it seems that HexaPDF doesn't map the font dictionary object to the correct class: the font reaches the graphics state as a plain Hash. Alas, it is hard to identify the problem without the file.
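Why a plain Hash blows up here rather than falling back to the default: Ruby's safe-navigation operator (&.) only short-circuits when the receiver is nil, so a non-nil Hash still receives the glyph_scaling_factor call and raises NoMethodError. The following is a minimal pure-Ruby sketch of that expression, not HexaPDF code: FakeFont is a hypothetical stand-in for a properly wrapped font object, and scaled_font_size is a simplified version of the line from graphics_state.rb:732 (the real expression continues past what the trace shows).

```ruby
# Hypothetical stand-in for a wrapped font object; only the method name
# glyph_scaling_factor is taken from the stack trace.
class FakeFont
  def glyph_scaling_factor
    0.001 # 1/1000, the usual glyph-space scaling for Type1 fonts
  end
end

# Simplified version of the expression shown in the stack trace.
def scaled_font_size(font, font_size)
  font_size * (font&.glyph_scaling_factor || 0.001)
end

# Same keys as the Hash in the error message.
raw_font = { BaseFont: :Helvetica, Encoding: :WinAnsiEncoding, Name: :F1,
             Subtype: :Type1, Type: :Font }

puts scaled_font_size(nil, 12)          # &. skips the call, fallback is used
puts scaled_font_size(FakeFont.new, 12) # wrapped object answers the call
begin
  scaled_font_size(raw_font, 12)        # &. does not guard non-nil objects
rescue NoMethodError => e
  puts "NoMethodError: #{e.name}"
end
```

The point of the sketch is that the `|| 0.001` fallback only covers the "no font set" case; an unwrapped font dictionary has to be converted to the right class before this code runs.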

Could you run hexapdf info --check file.pdf and post the output?

NelsonDocsketch commented 3 months ago

Sure thing.

WARNING: Validation error for trailer: ID field should always be set (correctable)
WARNING: Validation error for sub-object of object type XObject (11,0): Field SMask requires document version to be 1.4 (correctable)
File name:          original.pdf
File size:          334802 bytes
Producer:           PyPDF3
Pages:              4
Version:            1.4
gettalong commented 3 months ago

Thanks for the output! Alas, it doesn't report anything out of the ordinary.

Could you please run hexapdf inspect original.pdf search Helvetica rev pages po 1 po 2 po 3 po 4 and post the output?

And, if possible, also the output of hexapdf inspect original.pdf r?

Thanks!

NelsonDocsketch commented 3 months ago

Here is the output of hexapdf inspect original.pdf search Helvetica rev pages po 1 po 2 po 3 po 4:

3 0 obj
<<
  /Contents 8 0 R
  /MediaBox [0 0 595.2756 841.8898 ]
  /Parent 1 0 R
  /Resources <<
    /Font <<
      /F1 9 0 R
      /F1f6dc6d8e-b94b-42b0-a7b0-8e25e29b399d <<
        /BaseFont /Helvetica
        /Encoding /WinAnsiEncoding
        /Name /F1
        /Subtype /Type1
        /Type /Font
      >>
      /F2 10 0 R
    >>
    /XObject <<
      /FormXob.e3151a7076f6a16b02ee8d8d23760ff3 11 0 R
      /FormXob.321bfabbad507095d83a7452f167c883 13 0 R
    >>
    /ProcSet [/PDF /ImageB /Text /ImageC /ImageI ]
  >>
  /Rotate 0
  /Trans <<
  >>
  /Type /Page
  /Annots []
>>
endobj
4 0 obj
<<
  /Contents 15 0 R
  /MediaBox [0 0 595.2756 841.8898 ]
  /Parent 1 0 R
  /Resources <<
    /Font <<
      /F1 16 0 R
      /F1ac088b9d-4d7b-41f8-a863-217a21fa70b8 <<
        /BaseFont /Helvetica
        /Encoding /WinAnsiEncoding
        /Name /F1
        /Subtype /Type1
        /Type /Font
      >>
      /F2 10 0 R
    >>
    /XObject <<
      /FormXob.e3151a7076f6a16b02ee8d8d23760ff3 17 0 R
      /FormXob.321bfabbad507095d83a7452f167c883 13 0 R
    >>
    /ProcSet [/PDF /ImageB /Text /ImageC /ImageI ]
  >>
  /Rotate 0
  /Trans <<
  >>
  /Type /Page
  /Annots []
>>
endobj
5 0 obj
<<
  /Contents 19 0 R
  /MediaBox [0 0 595.2756 841.8898 ]
  /Parent 1 0 R
  /Resources <<
    /Font <<
      /F1 20 0 R
      /F1a2604e27-a3c0-481c-8885-d6b87a515ac6 <<
        /BaseFont /Helvetica
        /Encoding /WinAnsiEncoding
        /Name /F1
        /Subtype /Type1
        /Type /Font
      >>
      /F2 10 0 R
    >>
    /XObject <<
      /FormXob.e3151a7076f6a16b02ee8d8d23760ff3 21 0 R
      /FormXob.321bfabbad507095d83a7452f167c883 13 0 R
    >>
    /ProcSet [/PDF /ImageB /Text /ImageC /ImageI ]
  >>
  /Rotate 0
  /Trans <<
  >>
  /Type /Page
  /Annots []
>>
endobj
6 0 obj
<<
  /Contents 23 0 R
  /MediaBox [0 0 595.2756 841.8898 ]
  /Parent 1 0 R
  /Resources <<
    /Font <<
      /F1 24 0 R
      /F190ff26bf-c0b4-48e4-8cff-84860de731f3 <<
        /BaseFont /Helvetica
        /Encoding /WinAnsiEncoding
        /Name /F1
        /Subtype /Type1
        /Type /Font
      >>
      /F2 10 0 R
    >>
    /XObject <<
      /FormXob.e3151a7076f6a16b02ee8d8d23760ff3 25 0 R
      /FormXob.321bfabbad507095d83a7452f167c883 13 0 R
    >>
    /ProcSet [/PDF /ImageB /Text /ImageC /ImageI ]
  >>
  /Rotate 0
  /Trans <<
  >>
  /Type /Page
  /Annots []
>>
endobj
9 0 obj
<<
  /BaseFont /Helvetica
  /Encoding /WinAnsiEncoding
  /Name /F1
  /Subtype /Type1
  /Type /Font
>>
endobj
10 0 obj
<<
  /BaseFont /Helvetica-Bold
  /Encoding /WinAnsiEncoding
  /Name /F2
  /Subtype /Type1
  /Type /Font
>>
endobj
16 0 obj
<<
  /BaseFont /Helvetica
  /Encoding /WinAnsiEncoding
  /Name /F1
  /Subtype /Type1
  /Type /Font
>>
endobj
20 0 obj
<<
  /BaseFont /Helvetica
  /Encoding /WinAnsiEncoding
  /Name /F1
  /Subtype /Type1
  /Type /Font
>>
endobj
24 0 obj
<<
  /BaseFont /Helvetica
  /Encoding /WinAnsiEncoding
  /Name /F1
  /Subtype /Type1
  /Type /Font
>>
endobj
Error: Invalid revision numer specified
3 0 obj
<<
  /Contents 8 0 R
  /MediaBox [0 0 595.2756 841.8898 ]
  /Parent 1 0 R
  /Resources <<
    /Font <<
      /F1 9 0 R
      /F1f6dc6d8e-b94b-42b0-a7b0-8e25e29b399d <<
        /BaseFont /Helvetica
        /Encoding /WinAnsiEncoding
        /Name /F1
        /Subtype /Type1
        /Type /Font
      >>
      /F2 10 0 R
    >>
    /XObject <<
      /FormXob.e3151a7076f6a16b02ee8d8d23760ff3 11 0 R
      /FormXob.321bfabbad507095d83a7452f167c883 13 0 R
    >>
    /ProcSet [/PDF /ImageB /Text /ImageC /ImageI ]
  >>
  /Rotate 0
  /Trans <<
  >>
  /Type /Page
  /Annots []
>>
endobj
4 0 obj
<<
  /Contents 15 0 R
  /MediaBox [0 0 595.2756 841.8898 ]
  /Parent 1 0 R
  /Resources <<
    /Font <<
      /F1 16 0 R
      /F1ac088b9d-4d7b-41f8-a863-217a21fa70b8 <<
        /BaseFont /Helvetica
        /Encoding /WinAnsiEncoding
        /Name /F1
        /Subtype /Type1
        /Type /Font
      >>
      /F2 10 0 R
    >>
    /XObject <<
      /FormXob.e3151a7076f6a16b02ee8d8d23760ff3 17 0 R
      /FormXob.321bfabbad507095d83a7452f167c883 13 0 R
    >>
    /ProcSet [/PDF /ImageB /Text /ImageC /ImageI ]
  >>
  /Rotate 0
  /Trans <<
  >>
  /Type /Page
  /Annots []
>>
endobj
5 0 obj
<<
  /Contents 19 0 R
  /MediaBox [0 0 595.2756 841.8898 ]
  /Parent 1 0 R
  /Resources <<
    /Font <<
      /F1 20 0 R
      /F1a2604e27-a3c0-481c-8885-d6b87a515ac6 <<
        /BaseFont /Helvetica
        /Encoding /WinAnsiEncoding
        /Name /F1
        /Subtype /Type1
        /Type /Font
      >>
      /F2 10 0 R
    >>
    /XObject <<
      /FormXob.e3151a7076f6a16b02ee8d8d23760ff3 21 0 R
      /FormXob.321bfabbad507095d83a7452f167c883 13 0 R
    >>
    /ProcSet [/PDF /ImageB /Text /ImageC /ImageI ]
  >>
  /Rotate 0
  /Trans <<
  >>
  /Type /Page
  /Annots []
>>
endobj
6 0 obj
<<
  /Contents 23 0 R
  /MediaBox [0 0 595.2756 841.8898 ]
  /Parent 1 0 R
  /Resources <<
    /Font <<
      /F1 24 0 R
      /F190ff26bf-c0b4-48e4-8cff-84860de731f3 <<
        /BaseFont /Helvetica
        /Encoding /WinAnsiEncoding
        /Name /F1
        /Subtype /Type1
        /Type /Font
      >>
      /F2 10 0 R
    >>
    /XObject <<
      /FormXob.e3151a7076f6a16b02ee8d8d23760ff3 25 0 R
      /FormXob.321bfabbad507095d83a7452f167c883 13 0 R
    >>
    /ProcSet [/PDF /ImageB /Text /ImageC /ImageI ]
  >>
  /Rotate 0
  /Trans <<
  >>
  /Type /Page
  /Annots []
>>
endobj

And this is the output of hexapdf inspect original.pdf r:

<<
  /Info {obj 1} <<
    /Producer (PyPDF3)
  >>
  /Root {obj 2} <<
    /Pages {obj 3} <<
      /Count 4
      /Kids [{obj page 1} <<
        /Annots []
        /Contents {obj 5} <<
          /Length 12114
        >>
        /MediaBox [0 0 595.2756 841.8898 ]
        /Parent {ref 3}
        /Resources <<
          /Font <<
            /F1 {obj 6} <<
              /BaseFont /Helvetica
              /Encoding /WinAnsiEncoding
              /Name /F1
              /Subtype /Type1
              /Type /Font
            >>
            /F1f6dc6d8e-b94b-42b0-a7b0-8e25e29b399d <<
              /BaseFont /Helvetica
              /Encoding /WinAnsiEncoding
              /Name /F1
              /Subtype /Type1
              /Type /Font
            >>
            /F2 {obj 7} <<
              /BaseFont /Helvetica-Bold
              /Encoding /WinAnsiEncoding
              /Name /F2
              /Subtype /Type1
              /Type /Font
            >>
          >>
          /ProcSet [/PDF /ImageB /Text /ImageC /ImageI ]
          /XObject <<
            /FormXob.321bfabbad507095d83a7452f167c883 {obj 8} <<
              /BitsPerComponent 8
              /ColorSpace /DeviceRGB
              /Filter [/ASCII85Decode /FlateDecode ]
              /Height 780
              /Length 31944
              /SMask {obj 9} <<
                /BitsPerComponent 8
                /ColorSpace /DeviceGray
                /Decode [0 1 ]
                /Filter [/ASCII85Decode /FlateDecode ]
                /Height 780
                /Length 23810
                /Subtype /Image
                /Type /XObject
                /Width 1682
              >>
              /Subtype /Image
              /Type /XObject
              /Width 1682
            >>
            /FormXob.e3151a7076f6a16b02ee8d8d23760ff3 {obj 10} <<
              /BitsPerComponent 8
              /ColorSpace /DeviceRGB
              /Filter [/ASCII85Decode /FlateDecode ]
              /Height 1056
              /Length 36681
              /SMask {obj 11} <<
                /BitsPerComponent 8
                /ColorSpace /DeviceGray
                /Decode [0 1 ]
                /Filter [/ASCII85Decode /FlateDecode ]
                /Height 1056
                /Length 19721
                /Subtype /Image
                /Type /XObject
                /Width 793
              >>
              /Subtype /Image
              /Type /XObject
              /Width 793
            >>
          >>
        >>
        /Rotate 0
        /Trans <<
        >>
        /Type /Page
      >> {obj page 2} <<
        /Annots []
        /Contents {obj 13} <<
          /Length 10030
        >>
        /MediaBox [0 0 595.2756 841.8898 ]
        /Parent {ref 3}
        /Resources <<
          /Font <<
            /F1 {obj 14} <<
              /BaseFont /Helvetica
              /Encoding /WinAnsiEncoding
              /Name /F1
              /Subtype /Type1
              /Type /Font
            >>
            /F1ac088b9d-4d7b-41f8-a863-217a21fa70b8 <<
              /BaseFont /Helvetica
              /Encoding /WinAnsiEncoding
              /Name /F1
              /Subtype /Type1
              /Type /Font
            >>
            /F2 {ref 7}
          >>
          /ProcSet [/PDF /ImageB /Text /ImageC /ImageI ]
          /XObject <<
            /FormXob.321bfabbad507095d83a7452f167c883 {ref 8}
            /FormXob.e3151a7076f6a16b02ee8d8d23760ff3 {obj 15} <<
              /BitsPerComponent 8
              /ColorSpace /DeviceRGB
              /Filter [/ASCII85Decode /FlateDecode ]
              /Height 1056
              /Length 36681
              /SMask {obj 16} <<
                /BitsPerComponent 8
                /ColorSpace /DeviceGray
                /Decode [0 1 ]
                /Filter [/ASCII85Decode /FlateDecode ]
                /Height 1056
                /Length 19721
                /Subtype /Image
                /Type /XObject
                /Width 793
              >>
              /Subtype /Image
              /Type /XObject
              /Width 793
            >>
          >>
        >>
        /Rotate 0
        /Trans <<
        >>
        /Type /Page
      >> {obj page 3} <<
        /Annots []
        /Contents {obj 18} <<
          /Length 10972
        >>
        /MediaBox [0 0 595.2756 841.8898 ]
        /Parent {ref 3}
        /Resources <<
          /Font <<
            /F1 {obj 19} <<
              /BaseFont /Helvetica
              /Encoding /WinAnsiEncoding
              /Name /F1
              /Subtype /Type1
              /Type /Font
            >>
            /F1a2604e27-a3c0-481c-8885-d6b87a515ac6 <<
              /BaseFont /Helvetica
              /Encoding /WinAnsiEncoding
              /Name /F1
              /Subtype /Type1
              /Type /Font
            >>
            /F2 {ref 7}
          >>
          /ProcSet [/PDF /ImageB /Text /ImageC /ImageI ]
          /XObject <<
            /FormXob.321bfabbad507095d83a7452f167c883 {ref 8}
            /FormXob.e3151a7076f6a16b02ee8d8d23760ff3 {obj 20} <<
              /BitsPerComponent 8
              /ColorSpace /DeviceRGB
              /Filter [/ASCII85Decode /FlateDecode ]
              /Height 1056
              /Length 36681
              /SMask {obj 21} <<
                /BitsPerComponent 8
                /ColorSpace /DeviceGray
                /Decode [0 1 ]
                /Filter [/ASCII85Decode /FlateDecode ]
                /Height 1056
                /Length 19721
                /Subtype /Image
                /Type /XObject
                /Width 793
              >>
              /Subtype /Image
              /Type /XObject
              /Width 793
            >>
          >>
        >>
        /Rotate 0
        /Trans <<
        >>
        /Type /Page
      >> {obj page 4} <<
        /Annots []
        /Contents {obj 23} <<
          /Length 14775
        >>
        /MediaBox [0 0 595.2756 841.8898 ]
        /Parent {ref 3}
        /Resources <<
          /Font <<
            /F1 {obj 24} <<
              /BaseFont /Helvetica
              /Encoding /WinAnsiEncoding
              /Name /F1
              /Subtype /Type1
              /Type /Font
            >>
            /F190ff26bf-c0b4-48e4-8cff-84860de731f3 <<
              /BaseFont /Helvetica
              /Encoding /WinAnsiEncoding
              /Name /F1
              /Subtype /Type1
              /Type /Font
            >>
            /F2 {ref 7}
          >>
          /ProcSet [/PDF /ImageB /Text /ImageC /ImageI ]
          /XObject <<
            /FormXob.321bfabbad507095d83a7452f167c883 {ref 8}
            /FormXob.e3151a7076f6a16b02ee8d8d23760ff3 {obj 25} <<
              /BitsPerComponent 8
              /ColorSpace /DeviceRGB
              /Filter [/ASCII85Decode /FlateDecode ]
              /Height 1056
              /Length 36681
              /SMask {obj 26} <<
                /BitsPerComponent 8
                /ColorSpace /DeviceGray
                /Decode [0 1 ]
                /Filter [/ASCII85Decode /FlateDecode ]
                /Height 1056
                /Length 19721
                /Subtype /Image
                /Type /XObject
                /Width 793
              >>
              /Subtype /Image
              /Type /XObject
              /Width 793
            >>
          >>
        >>
        /Rotate 0
        /Trans <<
        >>
        /Type /Page
      >> ]
      /Type /Pages
    >>
    /Type /Catalog
  >>
  /Size 27
>>
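One detail visible in the output is that each page's /Font resource mixes indirect references (such as /F1 9 0 R) with a second Helvetica font dictionary written inline as a direct object (the entries with UUID-like names). The thread does not say whether this is what the fix addressed. The pure-Ruby sketch below only models the /Font entries of the first page (object 3 0 in the search output) to flag inline entries; Ref is a hypothetical stand-in for an indirect reference, not a HexaPDF class.

```ruby
# Hypothetical stand-in for an indirect reference such as "9 0 R".
Ref = Struct.new(:oid, :gen)

# Classify each /Font entry as an indirect reference or an inline
# (direct) font dictionary, modelled here as a plain Hash.
def classify_font_entries(font_resources)
  font_resources.each_with_object({}) do |(name, value), result|
    result[name] = value.is_a?(Ref) ? :indirect : :direct
  end
end

# /Font resources of object 3 0, transcribed from the output above.
page1_fonts = {
  F1: Ref.new(9, 0),
  "F1f6dc6d8e-b94b-42b0-a7b0-8e25e29b399d": {
    BaseFont: :Helvetica, Encoding: :WinAnsiEncoding, Name: :F1,
    Subtype: :Type1, Type: :Font
  },
  F2: Ref.new(10, 0)
}

classify_font_entries(page1_fonts).each do |name, kind|
  puts "#{name}: #{kind}"
end
```

Only the UUID-named entry is reported as direct; the other two point to separate objects.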
gettalong commented 3 months ago

@NelsonDocsketch Thanks! I think I found the problem. Could you please try the devel branch with your PDF file?
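For anyone who wants to reproduce this kind of test, one common approach is to point Bundler at the Git repository and branch. This is a Gemfile sketch (a config fragment, not a standalone script), assuming the repository URL follows from the gettalong / hexapdf project name and that the branch is named devel as mentioned above.

```ruby
# Gemfile: try an unreleased branch of hexapdf via Bundler.
source 'https://rubygems.org'

# Assumed repository URL; the branch name comes from the comment above.
gem 'hexapdf', git: 'https://github.com/gettalong/hexapdf.git', branch: 'devel'
```

After bundle install, the original script can be run against the branch with bundle exec ruby hexapdf_error.rb original.pdf.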

NelsonDocsketch commented 3 months ago

@gettalong Sorry for taking so long. I just tried it, and it worked!

gettalong commented 3 months ago

No problem and thanks for testing! I will release a new version with the fix later this week.