boazsegev / combine_pdf

A Pure ruby library to merge PDF files, number pages and maybe more...
MIT License
735 stars 157 forks source link

Named destination don't work if there's many of them #73

Closed sLe1tner closed 8 years ago

sLe1tner commented 8 years ago

I was hoping that the problem was resolved, but after going through the latest releases, I noticed that it still persists. From what I can tell, it happens when a PDF contains many named destination links. I used the following code:

          b_pdf = Prawn::Document.new do
            text("Hi prawn")
            outline.page title: 'First pdf'
            text '<link anchor="hʤä.l,l o">Click me, to go to päge 3</link>', inline_format: true
            19.times do |t|
              text "<link anchor=\"hʤä.l,l #{t}\">Click me, to go to päge 3 6</link>", inline_format: true
            end
            start_new_page
            text("Heeeeelloh!")
            text '<link anchor="hʤä.l,l o">Click mee, to go to päge 3</link>', inline_format: true
            start_new_page
            text("Blub")
            add_dest('hʤä.l,l o', dest_fit(page))
            19.times do |t|
              add_dest("hʤä.l,l #{t}", dest_fit(page))
            end
            outline.page title: '1st pdf page 2'
            outline.define do
              2.times do |t|
                section "Chapter #{t}", closed: false do
                  page :title => 'Page 1', destination: 1
                  page :title => 'Page 2', destination: 2
                  page :title => 'Page 3', destination: 1
                  page :title => 'Page 4', destination: 2
                end
              end
            end
          end

          c_pdf = Prawn::Document.new do
            text("Hi prawn")
            outline.page title: 'First pdf'
            27.times do |t|
              text "<link anchor=\"hʢä.l,l #{t}\">Click me, to go to päge 3 6</link>", inline_format: true
            end
            start_new_page
            text("Heeeeelloh!")
            text '<link anchor="hʢä.l,l o">Click mee, to go to päge 3</link>', inline_format: true
            start_new_page
            text("Blub")
            add_dest('hʢä.l,l o', dest_fit(page))
            27.times do |t|
              add_dest("hʢä.l,l #{t}", dest_fit(page))
            end
            outline.page title: '1st pdf page 2'
            outline.define do
              2.times do |t|
                section "Chapter #{t}", closed: false do
                  page :title => 'Page 1', destination: 1
                  page :title => 'Page 2', destination: 2
                  page :title => 'Page 3', destination: 1
                  page :title => 'Page 4', destination: 2
                end
              end
            end
          end

Now, depending on the order in which I add them, I get different results:

          report << CombinePDF.parse(b_pdf.render)
          report << CombinePDF.parse(c_pdf.render)
          final_report = report.to_pdf
          send_data final_report,
                  filename: "#{@project.name}.pdf",
                  type: 'application/pdf',
                  disposition: 'inline'

results in all the links from the first PDF (b_pdf) working, but some of the second PDF (c_pdf) are not working:

Not working named destination links:
Click me, to go to päge 3 --- 0
Click me, to go to päge 3 --- 1
Click me, to go to päge 3 --- 10
Click me, to go to päge 3 --- 11
Click me, to go to päge 3 --- 12
Click me, to go to päge 3 --- 13
Click me, to go to päge 3 --- 14
Click me, to go to päge 3 --- 15
Click me, to go to päge 3 --- 16
Click me, to go to päge 3 --- 17

However.. If I do it in the opposite order:

          report << CombinePDF.parse(c_pdf.render)
          report << CombinePDF.parse(b_pdf.render)
          final_report = report.to_pdf
          send_data final_report,
                  filename: "#{@project.name}.pdf",
                  type: 'application/pdf',
                  disposition: 'inline'

This way, none of the here first PDF (c_pdf) are working, but all the links from the other PDF (b_pdf) are still working fine.

If I use b_pdf alone:

          report << CombinePDF.parse(b_pdf.render)
          final_report = report.to_pdf
          send_data final_report,
                  filename: "#{@project.name}.pdf",
                  type: 'application/pdf',
                  disposition: 'inline'

all the named destinations are fine. However - however.. In the code generating the b_pdf, I use a loop that runs for 19 iterations to generate a few named destinations:

            19.times do |t|
              text "<link anchor=\"hʤä.l,l #{t}\">Click me, to go to päge 3 6</link>", inline_format: true
            end
            # and 2 start_new_page's later
           19.times do |t|
              add_dest("hʤä.l,l #{t}", dest_fit(page))
            end

if I change the 19 to 20, which results in the code generating b_pdf like this:

          b_pdf = Prawn::Document.new do
            text("Hi prawn")
            outline.page title: 'First pdf'
            text '<link anchor="hʤä.l,l o">Click me, to go to päge 3</link>', inline_format: true
            20.times do |t|
              text "<link anchor=\"hʤä.l,l #{t}\">Click me, to go to päge 3 --- #{t}</link>", inline_format: true
            end
            start_new_page
            text("Heeeeelloh!")
            text '<link anchor="hʤä.l,l o">Click mee, to go to päge 3</link>', inline_format: true
            start_new_page
            text("Blub")
            add_dest('hʤä.l,l o', dest_fit(page))
            20.times do |t|
              add_dest("hʤä.l,l #{t}", dest_fit(page))
            end
            outline.page title: '1st pdf page 2'
            outline.define do
              2.times do |t|
                section "Chapter #{t}", closed: false do
                  page :title => 'Page 1', destination: 1
                  page :title => 'Page 2', destination: 2
                  page :title => 'Page 3', destination: 1
                  page :title => 'Page 4', destination: 2
                end
              end
            end
          end

and again only use this PDF:

          report << CombinePDF.parse(b_pdf.render)
          final_report = report.to_pdf
          send_data final_report,
                  filename: "#{@project.name}.pdf",
                  type: 'application/pdf',
                  disposition: 'inline'

none of the named destinations are working.

Same if I use c_pdf only:

          report << CombinePDF.parse(c_pdf.render)
          final_report = report.to_pdf
          send_data final_report,
                  filename: "#{@project.name}.pdf",
                  type: 'application/pdf',
                  disposition: 'inline'

none of the named destinations are working. And same here, if I change the amount of named destination links generated in c_pdf from 27 to 19, all of them work (they don't work if it's 20):

          c_pdf = Prawn::Document.new do
            text("Hi prawn")
            outline.page title: 'First pdf'
            19.times do |t|
              text "<link anchor=\"hʢä.l,l #{t}\">Click me, to go to päge 3 --- #{t}</link>", inline_format: true
            end
            start_new_page
            text("Heeeeelloh!")
            text '<link anchor="hʢä.l,l o">Click mee, to go to päge 3</link>', inline_format: true
            start_new_page
            text("Blub")
            add_dest('hʢä.l,l o', dest_fit(page))
            19.times do |t|
              add_dest("hʢä.l,l #{t}", dest_fit(page))
            end
            outline.page title: '1st pdf page 2'
            outline.define do
              2.times do |t|
                section "Chapter #{t}", closed: false do
                  page :title => 'Page 1', destination: 1
                  page :title => 'Page 2', destination: 2
                  page :title => 'Page 3', destination: 1
                  page :title => 'Page 4', destination: 2
                end
              end
            end
          end

          report << CombinePDF.parse(c_pdf.render)
          final_report = report.to_pdf
          send_data final_report,
                  filename: "#{@project.name}.pdf",
                  type: 'application/pdf',
                  disposition: 'inline'

PS: I used release v0.2.26. Don't mind the weird unicode characters in the links, I was just making sure that the problem isn't caused by special characters. The behaviour stays the same without them.

boazsegev commented 8 years ago

I think I found the issue... but I left the Rubocop on, so all the code got beautified and it's hard to see the little change I made...

The change was made here.

The code before was:

        while (pos = should_resolve.pop)

and now:

        while should_resolve.any?
          pos = should_resolve.pop

This meant than nil values would cause the data processing to stop in the middle. nil values were introduced when nested name dictionaries existed because of the following code:

          elsif pos.is_a? Hash
            pos = pos[:referenced_object] || pos
            next if resolved.include?(pos.object_id)
            should_resolve << pos[:Kids]
            should_resolve << pos[:Names]
          end

which now looks like this:

          elsif pos.is_a? Hash
            pos = pos[:referenced_object] || pos
            next if resolved.include?(pos.object_id)
            should_resolve << pos[:Kids] if pos[:Kids]
            should_resolve << pos[:Names] if pos[:Names]
          end

However, I tested this only with the some of examples you gave me, but I'm a bit third from opening the PDF source files and skipping my launch break.

Could you see if there's anything else we need to update before we declare this as working and announce the feature in the CHANGELOG? - I already announced it in the CHANGELOG, so we better have it working :-)

sLe1tner commented 8 years ago

Sweet, all the test cases I had work with the changes you made! :+1:

boazsegev commented 8 years ago

🎉

boazsegev commented 8 years ago

Tested, (fixed a minor issue) and released - woohooo 👍