boazsegev / combine_pdf

A Pure ruby library to merge PDF files, number pages and maybe more...
MIT License

lack of content causing combine pdf not to merge #79

Closed fredmcgroarty closed 8 years ago

fredmcgroarty commented 8 years ago

I am building a PDF from two sources. The first is an asset hosted on S3, which I download and merge. The other is two Prawn templates, which I first render to a string, write to disk, and push into the combined PDF instance.

It appears the first file, which I am rendering from a Prawn template, will only display if there is a certain amount of content present. As it is a covering letter, its content is sparse. I now compensate for the lack of content by doing something like this in the Prawn template.

  balls = []
  6.times do
    # next level s**t going on here!
    balls << ['', '                            .', ' . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .' '                            ', '.                  ', ""]
  end
  pdf.table(balls,  :column_widths => [50, 175, 75, 50, 70, 100]) do
    cells.borders = []
  end

I debugged this error by dropping in the content of a Prawn template that was combining correctly (the order of compilation doesn't seem to be an issue) and slowly removing content bit by bit until the template stopped rendering in the combined PDF.

Here's a snippet from the PORO that is handling the combinator.

  def clear_directory!
    return false unless current_path
    FileUtils.rm_rf(current_path)
  end

  def initialize(job, asset)
    @asset = asset
    @combined = CombinePDF.new
    @job = job
  end

  def process
    set_current_path
    set_variables
    build_pdf_from_prawn(COVER_DIR)
    retrieve_from_s3
    build_pdf_from_prawn(QUOTE_DIR)
    true
  rescue
    clear_directory!
    false
  end

  def to_pdf
    @combined.to_pdf
  end

  private

  def build_pdf_from_prawn(file)
    dir = "#{current_path}/#{file}.pdf"
    pdf = render_to_string "admin/documents/#{file}"
    f = File.open dir, 'wb'
    f.write pdf
    push_pdf(dir)
  end

  def set_current_path
    @current_path = FileUtils.mkdir_p("#{dir_root}/#{job.id}").pop
  end

  def dir_root
    "#{Rails.root}/tmp/data/pdf/job_arrangements"
  end

  def push_pdf(file)
    @combined << CombinePDF.load(file)
  end

  def retrieve_from_s3
    downloaded_file_dir = "#{current_path}/downloaded_file.pdf"
    open(downloaded_file_dir, 'wb') do |file|
      file << open(asset.attachment.url).read
    end
    push_pdf(downloaded_file_dir)
  end

Very strange bug, save for me doing anything totally stupid!

Thanks

Rico

boazsegev commented 8 years ago

Hi Rico ( @fredmcgroarty ),

Thank you for opening the issue.

I can't run the code you gave me because I'm missing some pieces (such as a PDF file to test against and the Prawn template), but I suspect the issue might actually be related to the file system and what is known as file caching.

Data written to a file is buffered: it reaches the disk only when the write buffer fills up or when the file handle is flushed or closed.

The code you showed me doesn't close the file, so it's possible the data is still in the cache and wasn't written to the disk. This means that when CombinePDF tried to read the file, it found an empty file (the data was in the file handle cache, which CombinePDF didn't have access to).
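To make the buffering concrete, here is a minimal sketch in plain Ruby (no CombinePDF or Prawn involved). The helper name `sizes_around_close` and the file name are made up for illustration. It writes a small string, checks the size on disk before and after closing the handle, and returns both numbers. For a small write the "before" size is often 0, though the exact value depends on Ruby's internal buffer, so treat it as an observation rather than a guarantee.

```ruby
require 'tmpdir'

# Hypothetical helper for illustration only.
# Returns [size_before_close, size_after_close] for a single write.
def sizes_around_close(path, data)
  f = File.open(path, 'wb')
  f.write(data)
  before = File.size(path) # small writes may still be sitting in the write buffer
  f.close                  # closing the handle flushes the buffer to the file
  after = File.size(path)
  [before, after]
end

Dir.mktmpdir do |dir|
  before, after = sizes_around_close(File.join(dir, 'cover.pdf'), '%PDF-1.4 tiny')
  puts "before close: #{before} bytes, after close: #{after} bytes"
end
```

Anything that reads the file by path between the `write` and the `close` sees only what has been flushed so far, which is the situation `CombinePDF.load` would be in.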

Please try replacing this:

  def build_pdf_from_prawn(file)
    dir = "#{current_path}/#{file}.pdf"
    pdf = render_to_string "admin/documents/#{file}"
    f = File.open dir, 'wb'
    f.write pdf
    push_pdf(dir)
  end

with this:

  def build_pdf_from_prawn(file)
    dir = "#{current_path}/#{file}.pdf"
    pdf = render_to_string "admin/documents/#{file}"
    f = File.open dir, 'wb'
    f.write pdf
    f.close
    push_pdf(dir)
  end

OR, even better, this:

  def build_pdf_from_prawn(file)
    pdf = render_to_string "admin/documents/#{file}"
    # Parse the rendered PDF string directly; no temporary file is needed.
    @combined << CombinePDF.parse(pdf)
  end

This might solve the issue, if the issue is related to file caching.
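If the file on disk is still needed for some other reason, one more option is the block form of `File.open`, which closes the handle automatically when the block exits. The sketch below is only an illustration: `write_pdf_file` is a hypothetical helper name, and the placeholder string stands in for the rendered Prawn output.

```ruby
require 'tmpdir'

# Hypothetical helper: writes the data and returns the path once the file is closed.
def write_pdf_file(path, pdf_data)
  # The block form closes (and therefore flushes) the handle when the block
  # exits, even if an exception is raised inside it.
  File.open(path, 'wb') { |f| f.write(pdf_data) }
  path
end

Dir.mktmpdir do |dir|
  data = '%PDF-1.4 placeholder bytes'
  path = write_pdf_file(File.join(dir, 'cover.pdf'), data)
  # By this point the whole string is on disk, so loading the file by path is safe.
  puts File.size(path) == data.bytesize
end
```

The returned path could then be handed to `push_pdf` as in the original code.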

If this doesn't work, could you send me a small example I can test on my computer to find the issue?

I wonder if the issue happens with any downloaded PDF you use or if it's related to the specific file you're downloading from S3 (maybe something in that file isn't processed properly)...? I would really need more data to go on.

fredmcgroarty commented 8 years ago

@boazsegev Many thanks for your suggestion, it worked. Of course close the file - elementary! Though I opted for your revised solution, which worked straight away. Again, many thanks!

boazsegev commented 8 years ago

Hi @fredmcgroarty ,

I'm happy this solved your issue. I hope you didn't mind my super simplified explanation.

Good luck with your project and happy hacking :-)