`optimize: true` creates invalid pdfs for macOS Ventura with hexapdf > 0.23.0

enthusiasmus commented 1 year ago

With the `optimize: true` option, any HexaPDF version greater than 0.23.0 generates a PDF for us that opens on Linux and Windows in various PDF readers, but not on macOS.

This PDF is not openable on macOS: Gutschein_HexaPDF_0.23.0_with_optimize.pdf

This PDF is openable on macOS: Gutschein_HexaPDF_0.23.0_without_optimize.pdf

The only difference is that we changed `document.write(pdf, optimize: true)` to `document.write(pdf)`; with that change the PDF opens.

OS: macOS Ventura 13.0.1
HexaPDF: >= 0.23.0
Ruby: 2.7.2
Error message on opening:

Do you have any suggestions or ideas for solving this problem? It would be awesome if we could keep using the optimize option, because it results in a 20% smaller file size.

Best regards, Lukas

gettalong commented 1 year ago

At first I thought this had something to do with the root object, the catalog dictionary, being stored in an object stream. However, looking more closely at the file, it has a very strange structure: it has two revisions, and the first revision refers to objects outside the range of objects available in that revision, i.e. it contains only 7 objects but refers to an object with number 10.
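A rough way to spot such out-of-range references is to compare the object numbers a file defines with the ones it references. The following is only a minimal sketch in plain Ruby, not HexaPDF code: the helper name `dangling_references` is hypothetical, and the byte-level scan only works for objects stored uncompressed, so it cannot see inside object streams.

```ruby
# Sketch: list indirect references ("N G R") that have no matching
# object definition ("N G obj") in the given bytes.
# Assumption: objects are stored uncompressed (no object streams).
def dangling_references(pdf_bytes)
  data = pdf_bytes.dup.force_encoding(Encoding::BINARY)
  defined = data.scan(/(\d+)\s+\d+\s+obj\b/).flatten.map(&:to_i)
  referenced = data.scan(/(\d+)\s+\d+\s+R\b/).flatten.map(&:to_i)
  (referenced - defined).uniq.sort
end

# Tiny hand-written fragment: object 10 is referenced but never defined.
sample = <<~PDF
  1 0 obj
  << /Type /Catalog /Pages 2 0 R >>
  endobj
  2 0 obj
  << /Type /Pages /Kids [10 0 R] /Count 1 >>
  endobj
PDF

p dangling_references(sample)
```

For a real file, the bytes could be read with `File.binread(path)`. A non-empty result is only a hint, since the missing objects may simply live in a later revision or in an object stream.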

Could you share the original document from which the two linked documents were created and the exact code you used?

gettalong commented 1 year ago

Could you also try the code in the devel branch to see if it produces an invalid PDF? I have changed the implementation to never store the catalog object in an object stream. This might or might not fix it.

enthusiasmus commented 1 year ago

Interesting, and thanks for your help!

The following pdf is our template: voucher_template.pdf

This one was created with the gem from the "devel" branch: Gutschein_HexaPDF_devel.pdf As with the other one, we can't open it on macOS either.

Here is the code we use to create our pdfs with Pdf::Voucher.new(city: City.first, code: Code.first).generate:

class Pdf
  class Voucher
    include AbstractController::Translation
    include TranslationHelper

    PDF_WIDTH = 595
    MARGIN = 50
    MAX_MISSING_GLYPH_RETRIES = 10
    FONT_EMOJIS = TTFunk::File.open("app/lib/pdf/NotoSans-WithEmojis.ttf")

    delegate :path, to: :pdf
    attr_reader :pdf, :document, :qr_code, :canvas, :font_text,
                :font_bold, :font_emojis, :font_emojis_bold, :code

    def initialize(city:, code:, template_path: nil)
      @pdf = Tempfile.new
      @qr_code = Tempfile.new

      @code = code

      template_path ||= template_path_city(city)
      @document = HexaPDF::Document.open(template_path, config: config)
      @canvas = document.pages[0].canvas(type: :overlay)

      load_fonts
    end

    private def config
      HexaPDF::Configuration.with_defaults(
        "font.map" => {
          "NotoSansWithEmojis" => {
            none: Rails.root.join("app/lib/pdf/NotoSans-WithEmojis.ttf"),
            bold: Rails.root.join("app/lib/pdf/NotoSans-SemiBold-WithEmojis.ttf")
          }
        }
      )
    end

    def load_fonts
      @font_text = document.fonts.add("Helvetica")
      @font_bold = document.fonts.add("Helvetica", variant: :bold)
      @font_emojis = document.fonts.add("NotoSansWithEmojis")
      @font_emojis_bold = document.fonts.add("NotoSansWithEmojis", variant: :bold)
    end

    def generate(text_from_customer: nil, url: nil)
      @url = url
      add_qr_code
      add_amount
      add_code
      add_sponsor_amount
      add_text_from_customer(text_from_customer)
      add_redeem_notes
      add_terms

      write
    end

    private def text_layouter
      @text_layouter ||= base_layouter.tap { |l| l.style.align(:center).valign(:center) }
    end

    private def base_layouter
      HexaPDF::Layout::TextLayouter.new(font: font_text)
    end

    private def add_amount
      add(
        text: Spree::Money.new(code.available_credit_object.amount).to_s,
        y_coordinate: 535, height: 32, font: { family: font_bold }
      )
    end

    private def add_code
      add(text: Code.code_in_blocks(code.code), y_coordinate: 459, height: 32)
    end

    private def add_text_from_customer(text)
      return unless text

      add(
        text: remove_variation_selectors(text), y_coordinate: 402, height: 148,
        font: { size: 14, color: [64, 64, 64], family: font_emojis }
      )
    end

    private def remove_variation_selectors(text)
      # see https://unicode.org/Public/emoji//11.0/emoji-data.txt
      # FE0F          ; Emoji_Component      #  3.2  [1] VARIATION SELECTOR-16
      # 1F3FB..1F3FF  ; Emoji_Component      #  8.0  [5] light skin tone..dark skin tone
      # 1F9B0..1F9B3  ; Emoji_Component      # 11.0  [4] red-haired..white-haired
      text.tr("\uFE0F🏻🏼🏽🏾🏿🦰🦱🦲🦳", "")
    end

    private def add_sponsor_amount
      amount = code.available_credit_object.amount
      full_amount = code.available_credit_object.full_amount
      return if full_amount == amount

      redeem_to = code.available_credit_object.redeem_to
      i18n_key_postfix = "_with_redeem_to" if redeem_to
      add(text: t("lib.pdf.voucher.sponsor_amount#{i18n_key_postfix}",
                  voucher_amount: Spree::Money.new(full_amount),
                  date: (I18n.l(redeem_to, format: :date_standard) if redeem_to)),
          y_coordinate: 420, height: 32, font: { size: 10, color: [128, 128, 128] })
    end

    private def add_terms
      add(
        text: terms,
        y_coordinate: 150, height: 100, font: { size: 10, color: [128, 128, 128] },
        layouter: base_layouter.tap { |l| l.style.align(:center).valign(:bottom) }
      )
    end

    private def add(text:, y_coordinate:, height:, font: {}, layouter: text_layouter)
      layouter.fit(
        [text_fragment(text: strip_unknown_characters(text), font: font)],
        PDF_WIDTH - (MARGIN * 2), height
      ).draw(canvas, MARGIN, y_coordinate)
    end

    private def strip_unknown_characters(text)
      text.chars.each do |character|
        next if FONT_EMOJIS.cmap.unicode.first[character.unpack1("U*")].nonzero?

        @glyph_error = text.delete!(character)
      end
      text
    end

    private def text_fragment(text:, font:)
      HexaPDF::Layout::TextFragment.create(
        text.to_s,
        font: font.fetch(:family, font_text), font_size: font.fetch(:size, 32),
        fill_color: font.fetch(:color, [0, 0, 0])
      )
    end

    private def add_qr_code
      RQRCode::QRCode.new(code.code).as_png(file: qr_code)
      qr_code.rewind

      canvas.image(qr_code, at: [427, 667], width: 150, height: 150)
    end

    private def add_redeem_notes
      add(
        text: redeem_text, y_coordinate: 250, height: 32, font: { size: 16, color: [64, 64, 64] }
      )
      add(
        text: remove_variation_selectors(redeem_subtext), y_coordinate: 230, height: 32,
        font: { size: 16, color: [64, 64, 64], family: font_emojis_bold }
      )
    end

    private def redeem_text
      tc("lib.pdf.voucher.redeem")
    end

    private def redeem_subtext
      @url || current_city.pdf_voucher_url ||
      prefix_canonical_url("", city: current_city)
    end

    private def write
      document.write(pdf, optimize: true)

      pdf.rewind
      pdf.read
    end

    private def terms
      tc(
        "lib.pdf.voucher.terms",
        starts_at: I18n.l(code.starts_at, format: "%d.%m.%Y"),
        expires_at: I18n.l(code.expires_at, format: "%d.%m.%Y")
      )
    end

    private def template_path_city(city)
      Rails.root.join("app/lib/pdf/voucher_template.pdf")
    end
  end
end

Looking forward to further thoughts of you! Thank you!

gettalong commented 1 year ago

@enthusiasmus Thanks - I will have a more in-depth look later!

One suggestion: You might wanna use the hexapdf-extras gem for the QRCode, see https://hexapdf-extras.gettalong.org/api/HexaPDF/Extras/GraphicObject/QRCode.html. This would avoid creating a temporary file and also embed the QRCode using vector graphics instead of PNG bitmap graphics.

Another thing: You are using TTFunk for detecting glyphs in #strip_unknown_characters. When HexaPDF encounters a character that it cannot map to a glyph, the default implementation adds a HexaPDF::Font::InvalidGlyph instead of a normal glyph. Those need to be removed before outputting text. Alternatively, you can use the configuration option font.on_missing_glyph to specify how missing glyphs should be handled.
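Whichever library performs the glyph lookup, the filtering step itself can be written without mutating the caller's string. The following is a minimal sketch in plain Ruby under stated assumptions: `strip_unsupported` and the `glyph_available` lambda are hypothetical names, and the lambda merely stands in for a real font lookup such as the cmap check in `#strip_unknown_characters`.

```ruby
# Sketch: return a copy of the text without characters for which no
# glyph is available. The input string is left untouched.
def strip_unsupported(text, glyph_available)
  text.each_char.select { |char| glyph_available.call(char.ord) }.join
end

# Hypothetical stand-in for a font lookup: accept code points below U+0100.
latin_only = ->(codepoint) { codepoint < 0x100 }

puts strip_unsupported("Snow\u2603man", latin_only)
```

Because the method builds a new string, it also works with frozen input and avoids the special meaning that `String#delete!` gives to some characters in its argument.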

gettalong commented 1 year ago

@enthusiasmus I can reproduce the problem, found the reason and have a quick fix.

The reason for the curious structure mentioned in https://github.com/gettalong/hexapdf/issues/216#issuecomment-1336119524 is that the original file is linearized. This leads to HexaPDF finding two revisions for the document. And when optimizing, the second revision uses cross-reference streams and the first doesn't - this is the problem as e.g. Acrobat Reader can't deal with this (although technically it is no problem). This problem is actually a regression due to some other changes regarding revisions in the past few months.

So what I will do is fix that regression, then the output will work. Furthermore, I will make HexaPDF recognize linearized PDF files and make it create only one combined revision which should alleviate other potential problems.
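To check whether a given file shows this pattern, a byte-level heuristic can give a first impression. The sketch below is plain Ruby and not part of HexaPDF; `xref_summary` is a hypothetical helper, and the patterns are simplifications that can miscount (for example when these keywords occur inside stream data).

```ruby
# Sketch: rough structural summary of a PDF taken from its raw bytes.
def xref_summary(pdf_bytes)
  data = pdf_bytes.dup.force_encoding(Encoding::BINARY)
  {
    # A linearization dictionary sits near the start of the file.
    linearized: data[0, 1024].include?("/Linearized"),
    # Every cross-reference section is announced by a startxref keyword.
    startxref_count: data.scan(/(?<![A-Za-z])startxref\b/).size,
    # Classic cross-reference tables begin with the keyword "xref".
    xref_tables: data.scan(/(?<![A-Za-z])xref\b/).size,
    # Cross-reference streams are stream objects of type /XRef.
    xref_streams: data.scan(%r{/Type\s*/XRef\b}).size
  }
end

# Hand-written fragment imitating the problematic layout: one table-based
# section followed by one stream-based section.
sample = <<~PDF
  %PDF-1.7
  1 0 obj
  << /Linearized 1 /L 1234 >>
  endobj
  xref
  0 1
  trailer
  << /Size 7 >>
  startxref
  0
  %%EOF
  8 0 obj
  << /Type /XRef /Size 12 >>
  stream
  endstream
  endobj
  startxref
  200
  %%EOF
PDF

summary = xref_summary(sample)
p summary
puts "mixes xref tables and xref streams" if summary[:xref_tables] > 0 && summary[:xref_streams] > 0
```

A file that reports both tables and streams is a candidate for the problem described above, although a definitive answer requires parsing the revisions properly.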

enthusiasmus commented 1 year ago

Hi @gettalong! Thank you for your help, the hints and the good news! Just awesome! Looking forward to your quick fix!

gettalong commented 1 year ago

@enthusiasmus Both changes are now live in the devel branch, please let me know if that works for your use cases! In my tests the created files are accepted by Adobe Reader where they were not before. The next release will probably be out by the end of the week.

enthusiasmus commented 1 year ago

Dear @gettalong, the current devel branch works for us! Thanks a lot! Looking forward to the coming release. All the best to you!
