gettalong / hexapdf

Versatile PDF creation and manipulation for Ruby
https://hexapdf.gettalong.org

`undefined method 'glyph_scaling_factor'` #247

Closed (casperbrike closed this issue 1 year ago)

casperbrike commented 1 year ago

Greetings!

I'm trying to read text from a PDF with HexaPDF (v0.32.0) using the following code:

require 'hexapdf'

# Collects the decoded text of every text-showing operator.
class TextProcessor < HexaPDF::Content::Processor
  attr_reader :text

  def initialize(resources = nil)
    super
    @text = ''
  end

  # Called for each text-showing operator; appends the decoded string.
  def show_text(str)
    @text += decode_text(str)
  end
  alias show_text_with_positioning show_text
end

processor = TextProcessor.new
doc = HexaPDF::Document.open('doc.pdf')
doc.pages.each { |page| page.process_contents(processor) }

puts processor.text

It works well in most cases, but sometimes it raises an `undefined method 'glyph_scaling_factor'` error (a PDF is attached as an example).

Do you have any suggestions on how we can resolve the issue?

gettalong commented 1 year ago

Thanks for the report and the PDF! I can reproduce the problem.

gettalong commented 1 year ago

The problem is that the PDF does not follow the PDF standard: a font data structure is missing a required key. As a result, HexaPDF cannot automatically map that data structure to the corresponding HexaPDF class, and calling a font-specific method on the unmapped object then raises the error.
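
The failure mode described above can be modeled with a small toy example. This is only a sketch of the general idea, not HexaPDF's actual implementation: the names `GenericDict`, `FontDict`, `TYPE_MAP` and `wrap`, the `:Type` key, and the `0.001` return value are all assumptions made up for illustration.

```ruby
# Toy model of key-based type mapping. Every name here is hypothetical
# and chosen for illustration; none of it is HexaPDF's real API.

# Generic fallback wrapper: it knows nothing about fonts.
class GenericDict
  attr_reader :data

  def initialize(data)
    @data = data
  end
end

# Specialized wrapper that adds a font-specific method.
class FontDict < GenericDict
  def glyph_scaling_factor
    0.001 # placeholder value for this sketch
  end
end

# Maps the value of the (assumed) required key to a wrapper class.
TYPE_MAP = { font: FontDict }.freeze

# Picks the wrapper class from the :Type key and falls back to
# GenericDict when the key is missing or unknown.
def wrap(data)
  TYPE_MAP.fetch(data[:Type], GenericDict).new(data)
end

valid  = wrap({ Type: :font, BaseFont: :Helvetica })
broken = wrap({ BaseFont: :Helvetica }) # the required key is missing

puts valid.glyph_scaling_factor

begin
  broken.glyph_scaling_factor
rescue NoMethodError => e
  # The fallback class has no font methods, so the call fails here.
  puts "#{e.class} raised for #{broken.class}"
end
```

In this model the data itself is still usable; only the choice of wrapper class goes wrong, which is why a more tolerant mapping step is a plausible place for a fix.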

I will see how this problem can be worked around.

casperbrike commented 1 year ago

Great, thank you very much!

gettalong commented 1 year ago

@casperbrike Sorry for the long wait! The next version of HexaPDF will have a fix for this so that the automatic type mapping works correctly in such problematic cases.

casperbrike commented 1 year ago

> @casperbrike Sorry for the long wait! The next version of HexaPDF will have a fix for this so that the automatic type mapping works correctly in such problematic cases.

Thank you very much!