boazsegev / combine_pdf

A Pure ruby library to merge PDF files, number pages and maybe more...
MIT License

Links drop when combining PDFs #200

Closed raphael-allard closed 2 years ago

raphael-allard commented 2 years ago

Hello Bo,

I'm working on an app where clients can upload PDF quotes. I'm trying to add a watermark at the top of each page of the PDF file. This watermark contains:

Here is my service, generate_pdf_header_with_online_signature_service.rb:

class GeneratePdfHeaderWithOnlineSignatureService
  def initialize(quote)
    @quote = quote
  end

  def call
    project = @quote.project

    html = ApplicationController.new.render_to_string(
      template: "advanced_admin/quotes/form/_partner_quote_signature.html.erb",
      layout: "pdf.html",
      locals: { quote: @quote, project: project, pdf: true }
    )

    pdf_options = {
      page_height: '1in',
      page_width: '10in',
      margin: { top: 5, bottom: 0, left: 10, right: 10 }
    }

    pdf = WickedPdf.new.pdf_from_string(html, pdf_options)
  end
end

And my view, _partner_quote_signature.html.erb:

<div class="header__partner-quote">
  <% if !project.partner.nil? && !quote.is_estimation? %>
    <div class="signature-block__partner-quote">
      <% if (quote&.document&.attached? || pdf == true) && project.partner && !project.phone.blank? %>
          <%= link_to "Je valide mon devis en ligne", Rails.application.routes.url_helpers.start_client_project_signature_url(project.token(raw: true), quote_id: quote.id), method: :post, class: "signature-link__partner-quote" %>
          <p>ou au <a href="tel:+XXXXXXX" class="phone-link__partner-quote">XXXXXXX</a></p>
      <% else %>
        <p>Je valide mon devis au</p>
        <a href="tel:+XXXXXXX" class="phone-link__partner-quote">XXXXXXX</a>
      <% end %>
    </div>

    <%= image_tag wicked_pdf_asset_base64("mp-partner-quote-watermark.png") %>
  <% end %>
</div>

My service is called from another service, pdf_watermark_service.rb:

class PdfWatermarkService
  include Rails.application.routes.url_helpers
  # NOTE: 612 x 792 pt is US Letter; true A4 is about 595 x 842 pt.
  PDF_PAGE_A4_DIMENSIONS = [0, 0, 612.0, 792.0]

  attr_reader :attachment

  def initialize(attachment)
    @attachment = attachment
  end

  def call(&block)
    pdf = CombinePDF.new
    source = CombinePDF.parse(attachment.download, { allow_optional_content: true })
    @header_as_combine_pdf = CombinePDF.parse(GeneratePdfHeaderWithOnlineSignatureService.new(attachment.record).call)

    source.pages.each do |page|
      resize_page_to_fit_in_content(page)

      target_page = CombinePDF.create_page
      target_page << page
      target_page << header_image

      pdf << target_page
    end

    pdf.save(target_file_path)

    File.open(target_file_path, "r") do |file|
      block.call(file)
    end

    # Clean up the created temporary file
    File.delete(target_file_path)

    pdf
  end

  private

  # Load the image and resize it to fit a A4 page width

  def header_image
    @header_image ||= begin
      image = @header_as_combine_pdf.pages[0]

      target_image_width = PDF_PAGE_A4_DIMENSIONS[2]
      target_page_height = PDF_PAGE_A4_DIMENSIONS[3]

      source_image_width = image.page_size[2]
      source_image_height = image.page_size[3]

      ratio = target_image_width / source_image_width
      target_image_height = source_image_height * ratio

      # Resize the image and make it bound to the top of the page, as the Y axis' 0 is
      # at the bottom of the document in PDF files.
      image.resize([0, target_page_height - target_image_height, target_image_width, target_page_height])

      image
    end
  end

  def page_content_available_size
    @page_content_available_size ||= [
      PDF_PAGE_A4_DIMENSIONS[2],
      PDF_PAGE_A4_DIMENSIONS[3] - (header_image.page_size[3] - header_image.page_size[1])
    ]
  end

  def resize_page_to_fit_in_content(page)
    page_content_width, page_content_height = page_content_available_size
    source_page_width,  source_page_height  = page.page_size[2..3]

    # Those ratios are used to understand what is the aspect ratio of each container
    # The greater the number is, the wider the container is.

    page_content_ratio = page_content_width / page_content_height
    source_page_ratio  = source_page_width / source_page_height

    # If the source_page_ratio is greater than the page_content_ratio, we resize the
    # source page width to match the page content one, and adapt the height, that will
    # necessarily fit inside the target container.

    # If the source_page_ratio is smaller, we do the inverse.

    # So we define the ratio that we'll be using to reduce the sizes, to keep the original
    # aspect ratio and target the right size.

    target_ratio = if source_page_ratio > page_content_ratio
      (page_content_width / source_page_width)
    else
      (page_content_height / source_page_height)
    end

    # We build the target width and heights by applying the target ratio to make them fit
    # in the target bounds
    target_width  = (source_page_width  * target_ratio)
    target_height = (source_page_height * target_ratio)

    # We then define X and Y positions where we'll be inserting the source pdf to, so that
    # it ends up centered in the remaining page content.
    start_x = (page_content_width  - target_width)  / 2
    start_y = (page_content_height - target_height) / 2

    # Then we define the X and Y positions of the other side of the bounding box, as the
    # third and fourth arguments are coordinates and not sizes.
    end_x = start_x + target_width
    end_y = start_y + target_height

    # Finally apply the resize !
    page.resize([start_x, start_y, end_x, end_y])
  end

  def target_file_path
    @target_file_path ||= "/tmp/watermarked_pdf_#{SecureRandom.hex(10)}.pdf"
  end

  def attachement_url
    case Rails.application.config.active_storage.service
    when :local
      rails_blob_url(@attachment)
    else
      @attachment.service_url
    end
  end
end

resize_page_to_fit_in_content and header_image are the methods that resize the pages and the header to the right ratio. Note that header_image returns the following: @header_as_combine_pdf.pages[0]

Finally, when a client uploads a PDF quote, the record is committed and the following method is called:
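The fit-and-center arithmetic in resize_page_to_fit_in_content can be checked on its own, without any PDF. Below is a minimal sketch, assuming plain numbers in PDF points; fit_box is a hypothetical helper name and is not part of CombinePDF. Taking the smaller of the two scale factors is equivalent to the aspect-ratio comparison in the service.

```ruby
# Standalone sketch of the fit-and-center arithmetic (no CombinePDF calls).
# fit_box is a hypothetical helper, not a CombinePDF method.
def fit_box(source_width, source_height, content_width, content_height)
  # Scale by the more restrictive dimension so the aspect ratio is preserved.
  scale = [content_width.to_f / source_width,
           content_height.to_f / source_height].min

  target_width  = source_width  * scale
  target_height = source_height * scale

  # Center the scaled page in the available area (PDF origin is bottom-left).
  start_x = (content_width  - target_width)  / 2
  start_y = (content_height - target_height) / 2

  # Return corner coordinates, the same shape the service passes to page.resize.
  [start_x, start_y, start_x + target_width, start_y + target_height]
end

# A square 1000 x 1000 pt page placed in a 612 x 720 pt content area:
# the width is the limiting dimension, so the page is centered vertically.
box = fit_box(1000, 1000, 612, 720)
puts box.inspect
```

The returned array holds two corners rather than a size, which is why the service adds the target width and height to the start coordinates.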

  def process_source_quote_document
    return if !source_document.attached? || source_document.blob.metadata["watermarked"]

    PdfWatermarkService.new(source_document).call do |file|

      # Update blob to mark it as "watermarked"
      source_document.blob.tap do |blob|
        blob.metadata["watermarked"] = true
        blob.metadata_will_change!
        blob.save!
      end

      filename = [
        "Devis",
        partner.name.parameterize,
        Time.now.strftime("%d%m%Y%H%M")
      ].join("-")

      self.document.attach(
        filename: "#{filename}.pdf",
        content_type: "application/pdf",
        io: file
      )

      save!
    end
  end

The PDF is then generated and the watermark is correctly rendered on each page of the quote. My issue is that the links contained in my view are disabled and not clickable. Any idea how to fix that?

Thanks a lot for your help 🙌

boazsegev commented 2 years ago

Hi, thank you for opening this issue.

Normally I would copy the example code to an application and try to run a solution before making a suggestion... however, I'm short of both time and resources, so I'll suggest an untested approach (a possible solution), and if it doesn't work I'll take the long route to solving this.

In general, links are a PDF resource (not a Page resource), so when you copy a page with a link to a new PDF, the link becomes a "dangling link" and is no longer valid (think of links like C pointers, once you copy the pointer address to a new process / file, it no longer points to a valid memory address / location).
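The pointer analogy can be illustrated with a toy model in plain Ruby. This is only a sketch of the idea: the hash layout, the object IDs, and the resolve_links helper are all invented for illustration and do not reflect how CombinePDF stores pages or links internally.

```ruby
# Toy model of the "dangling link" analogy.
# The data layout is invented for illustration; it is NOT CombinePDF's
# internal representation.

# A "document" owns a table of objects, keyed by object ID.
source_doc = {
  objects: {
    1 => { type: :link, uri: "https://example.com/sign" },
    2 => { type: :page, content: "header text", link_refs: [1] }
  }
}

# Resolve a page's link references against the document that holds the page.
def resolve_links(doc, page)
  page[:link_refs].map { |id| doc[:objects][id] }
end

page = source_doc[:objects][2]
puts resolve_links(source_doc, page).inspect # the link object is found

# Copy only the page into a new document; the link object stays behind.
new_doc = { objects: { 10 => page.dup } }
copied_page = new_doc[:objects][10]
puts resolve_links(new_doc, copied_page).inspect # reference 1 resolves to nil
```

In this model the copied page still carries the reference, but the new document has nothing at that ID, which is the situation described as a dangling link.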

However, if you copy the watermark into the PDF with the links, rather than creating a new PDF, the link might work out.

In PdfWatermarkService.call, instead of:

  def call(&block)
    pdf = CombinePDF.new
    source = CombinePDF.parse(attachment.download, { allow_optional_content: true })
    @header_as_combine_pdf = CombinePDF.parse(GeneratePdfHeaderWithOnlineSignatureService.new(attachment.record).call)

    source.pages.each do |page|
      resize_page_to_fit_in_content(page)

      target_page = CombinePDF.create_page
      target_page << page
      target_page << header_image

      pdf << target_page
    end

    pdf.save(target_file_path)

    File.open(target_file_path, "r") do |file|
      block.call(file)
    end

    # Clean up the created temporary file
    File.delete(target_file_path)

    pdf
  end

Maybe (untested, might need some debugging) try:

  def call(&block)
    pdf = CombinePDF.parse(attachment.download, { allow_optional_content: true })
    @header_as_combine_pdf = CombinePDF.parse(GeneratePdfHeaderWithOnlineSignatureService.new(attachment.record).call)
    watermark = header_image

    pdf.pages.each do |page|
      resize_page_to_fit_in_content(page)
      page << watermark
    end

    pdf.save(target_file_path)

    File.open(target_file_path, "r") do |file|
      block.call(file)
    end

    # Clean up the created temporary file
    File.delete(target_file_path)
    pdf
  end

Let me know if this helps.

Cheers, Bo.

raphael-allard commented 2 years ago

Hi Bo,

Thanks for your explanations and your prompt answer, it's really appreciated!

Unfortunately, your solution doesn't seem to work. A PDF file is generated without raising any errors, but the whole watermark at the top of each quote page is missing :/

I updated my comment above with my full service methods; maybe something in there is causing the issue...

Again, thank you for your time 🙏

boazsegev commented 2 years ago

Hi @raphael-allard ,

Am I right in understanding that now the issue is that the links work but the overlay doesn't?

As a side note:

I noticed that the header assumes A4 while the PDF page is of variable XY location (resize_page_to_fit_in_content) and might not start with 0 as the bottom left corner. Why not use page.resize([0, 0, 595, 842]) (added to my example)?
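As a quick check of the numbers involved: PDF coordinates are expressed in points (1 pt = 1/72 inch). The sketch below, with a hypothetical mm_to_points helper, shows that [0, 0, 595, 842] corresponds to A4, while the 612 x 792 values used in PDF_PAGE_A4_DIMENSIONS correspond to US Letter.

```ruby
# PDF user space is measured in points: 1 pt = 1/72 inch, 1 inch = 25.4 mm.
# mm_to_points is a hypothetical helper for this illustration.
def mm_to_points(mm)
  mm / 25.4 * 72.0
end

# ISO A4 is 210 mm x 297 mm; US Letter is 8.5 in x 11 in.
a4     = [mm_to_points(210), mm_to_points(297)].map(&:round)
letter = [8.5 * 72.0, 11 * 72.0].map(&:round)

puts "A4:     #{a4.inspect}"     # => A4:     [595, 842]
puts "Letter: #{letter.inspect}" # => Letter: [612, 792]
```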

raphael-allard commented 2 years ago

Hi Bo,

Actually, the link is not working, sorry for the misunderstanding: the PDF quote is generated with the watermark, but the link remains unclickable :/ Can you think of anything else?

About your second comment, do you mean that instead of writing that whole long method, I could just use page.resize([0, 0, 595, 842]) on the client quote PDF?

Thanks for your patience!

boazsegev commented 2 years ago

Hi @raphael-allard ,

Watermark:

I just realized that your links might be unclickable and not broken. Access to the links might be blocked by the watermark. Much like CSS, a transparent object / layer might prevent you from clicking a link under that layer. Try using underlay instead of an overlay... i.e.:

page >> watermark # instead of `page << watermark`

This really depends on the watermark (is it text? is it a transparent image covering the whole page?) and the PDF page (is the background transparent? will it allow an underlay to be seen?)...

I do not know what watermark you are using, but you could test the issue by temporarily using a text watermark that allows the mouse to access objects that the text does not cover. i.e.:

# untested pseudo code:
underlay = CombinePDF.create_page(page.page_size)
underlay.textbox "underlay", y:0, text_valign: :bottom, font_color: [0.75, 0.75, 0.75]
page >> underlay

overlay = CombinePDF.create_page(page.page_size)
overlay.textbox "overlay",  y:0, text_valign: :top, font_color: [0, 0, 0], opacity: 0.25
page << overlay

Resize:

I am not sure if page.resize will center the object horizontally or not (I think it should), so try it out and see... but yes, I think the page.resize(A4) (where A4 = [0, 0, 595, 842]) should work.

raphael-allard commented 2 years ago

Hi @boazsegev !

I'm sorry, I've just realised that I wasn't using the right word to describe what I'm trying to do... When I say "watermark", I don't mean the overlay that is usually put across a whole image to prevent copying. A picture is worth a thousand words :)

[Image: Screenshot 2021-09-20 at 21 16 32]

Above is the PDF generated after the client uploaded it to our app. The whole yellow part is the client quote, and the part surrounded by a red square is what I call the "watermark". I should have said "header" instead, sorry for my bad English... As you know, we combine this header with each page of the client quote so that the signature link appears at the top.

The "Je valide mon devis en ligne" text should be clickable, but nothing happens when I click on it and there is no "hand cursor" on hover. Everything in the red square is generated from HTML (see the html.erb file in my first message).

The weird thing is that when I generate this header as a new PDF page and add it below the client quote, the link is clickable. But as soon as I combine it with each page of the quote, the link seems to disappear. It's very confusing!

I tried your solution but unfortunately the issue persists... Thanks again for your time, it's really appreciated!

raphael-allard commented 2 years ago

A few months later, I came back to this feature and finally found a way to make it work!

When you combine two PDFs and one of them contains a link (e.g. "click here to go to Google", pointing to google.com), the link is lost.

Instead of making the sentence clickable, I simply print the URL as visible text => "click on the following link to go to Google: google.com"

It was actually the easiest way to do it!