CrossRef / pdfextract

MOVED TO https://gitlab.com/crossref/pdfextract
MIT License

zones is missing a dependency on left_margins (RuntimeError) #32

Open rtalexander opened 8 years ago

rtalexander commented 8 years ago

Hi, I'm getting the following error message when attempting to run pdf-extract: zones is missing a dependency on left_margins (RuntimeError), which is from the stack trace given below. Any thoughts on this?

I am running this on Mac OS X 10.11.2 El Capitan, with the following Ruby configuration:

$ ruby --version
ruby 2.3.0p0 (2015-12-25 revision 53290) [x86_64-darwin15]

$ gem --version
2.5.1

Stack trace:

pdf-extract --trace extract-bib --resolved_references Attacks\ on\ Cryptographic\ Protocols-\ A\ Survey.pdf
/usr/local/lib/ruby/gems/2.3.0/gems/pdf-extract-0.1.1/lib/pdf/extract/pdf.rb:144:in `block (3 levels) in invoke_calls': zones is missing a dependency on left_margins (RuntimeError)
  from /usr/local/lib/ruby/gems/2.3.0/gems/pdf-extract-0.1.1/lib/pdf/extract/pdf.rb:142:in `each'
  from /usr/local/lib/ruby/gems/2.3.0/gems/pdf-extract-0.1.1/lib/pdf/extract/pdf.rb:142:in `block (2 levels) in invoke_calls'
  from /usr/local/lib/ruby/gems/2.3.0/gems/pdf-extract-0.1.1/lib/pdf/extract/pdf.rb:141:in `each_pair'
  from /usr/local/lib/ruby/gems/2.3.0/gems/pdf-extract-0.1.1/lib/pdf/extract/pdf.rb:141:in `block in invoke_calls'
  from /usr/local/lib/ruby/gems/2.3.0/gems/pdf-extract-0.1.1/lib/pdf/extract/pdf.rb:137:in `each_pair'
  from /usr/local/lib/ruby/gems/2.3.0/gems/pdf-extract-0.1.1/lib/pdf/extract/pdf.rb:137:in `invoke_calls'
  from /usr/local/lib/ruby/gems/2.3.0/gems/pdf-extract-0.1.1/lib/pdf/extract.rb:43:in `block in parse'
  from /usr/local/lib/ruby/gems/2.3.0/gems/pdf-extract-0.1.1/lib/pdf/extract.rb:39:in `each'
  from /usr/local/lib/ruby/gems/2.3.0/gems/pdf-extract-0.1.1/lib/pdf/extract.rb:39:in `parse'
  from /usr/local/lib/ruby/gems/2.3.0/gems/pdf-extract-0.1.1/lib/pdf/extract.rb:54:in `view'
  from /usr/local/lib/ruby/gems/2.3.0/gems/pdf-extract-0.1.1/bin/pdf-extract:121:in `block (4 levels) in <top (required)>'
  from /usr/local/lib/ruby/gems/2.3.0/gems/pdf-extract-0.1.1/bin/pdf-extract:118:in `each'
  from /usr/local/lib/ruby/gems/2.3.0/gems/pdf-extract-0.1.1/bin/pdf-extract:118:in `block (3 levels) in <top (required)>'
  from /usr/local/lib/ruby/gems/2.3.0/gems/commander-4.4.0/lib/commander/command.rb:178:in `call'
  from /usr/local/lib/ruby/gems/2.3.0/gems/commander-4.4.0/lib/commander/command.rb:153:in `run'
  from /usr/local/lib/ruby/gems/2.3.0/gems/commander-4.4.0/lib/commander/runner.rb:444:in `run_active_command'
  from /usr/local/lib/ruby/gems/2.3.0/gems/commander-4.4.0/lib/commander/runner.rb:68:in `run!'
  from /usr/local/lib/ruby/gems/2.3.0/gems/commander-4.4.0/lib/commander/delegates.rb:15:in `run!'
  from /usr/local/lib/ruby/gems/2.3.0/gems/commander-4.4.0/lib/commander/import.rb:5:in `block in <top (required)>'

Thanks,

Roger Alexander.

AnikoG commented 8 years ago

Hi Roger, does this happen to other pdfs, too?

alexbrandsen commented 7 years ago

I get the exact same error on some PDFs, although I'm running on Ubuntu:

pdf-extract --trace extract --headers --footers --references --no-lines --regions --set char_slop:0.4 ARC-2008-48_08ank_rapport.pdf > ARC.xml
/var/lib/gems/2.1.0/gems/pdf-extract-0.1.1/lib/pdf/extract/pdf.rb:144:in `block (3 levels) in invoke_calls': zones is missing a dependency on left_margins (RuntimeError)
  from /var/lib/gems/2.1.0/gems/pdf-extract-0.1.1/lib/pdf/extract/pdf.rb:142:in `each'
  from /var/lib/gems/2.1.0/gems/pdf-extract-0.1.1/lib/pdf/extract/pdf.rb:142:in `block (2 levels) in invoke_calls'
  from /var/lib/gems/2.1.0/gems/pdf-extract-0.1.1/lib/pdf/extract/pdf.rb:141:in `each_pair'
  from /var/lib/gems/2.1.0/gems/pdf-extract-0.1.1/lib/pdf/extract/pdf.rb:141:in `block in invoke_calls'
  from /var/lib/gems/2.1.0/gems/pdf-extract-0.1.1/lib/pdf/extract/pdf.rb:137:in `each_pair'
  from /var/lib/gems/2.1.0/gems/pdf-extract-0.1.1/lib/pdf/extract/pdf.rb:137:in `invoke_calls'
  from /var/lib/gems/2.1.0/gems/pdf-extract-0.1.1/lib/pdf/extract.rb:43:in `block in parse'
  from /var/lib/gems/2.1.0/gems/pdf-extract-0.1.1/lib/pdf/extract.rb:39:in `each'
  from /var/lib/gems/2.1.0/gems/pdf-extract-0.1.1/lib/pdf/extract.rb:39:in `parse'
  from /var/lib/gems/2.1.0/gems/pdf-extract-0.1.1/lib/pdf/extract.rb:54:in `view'
  from /var/lib/gems/2.1.0/gems/pdf-extract-0.1.1/bin/pdf-extract:121:in `block (4 levels) in <top (required)>'
  from /var/lib/gems/2.1.0/gems/pdf-extract-0.1.1/bin/pdf-extract:118:in `each'
  from /var/lib/gems/2.1.0/gems/pdf-extract-0.1.1/bin/pdf-extract:118:in `block (3 levels) in <top (required)>'
  from /var/lib/gems/2.1.0/gems/commander-4.4.3/lib/commander/command.rb:178:in `call'
  from /var/lib/gems/2.1.0/gems/commander-4.4.3/lib/commander/command.rb:178:in `call'
  from /var/lib/gems/2.1.0/gems/commander-4.4.3/lib/commander/command.rb:153:in `run'
  from /var/lib/gems/2.1.0/gems/commander-4.4.3/lib/commander/runner.rb:446:in `run_active_command'
  from /var/lib/gems/2.1.0/gems/commander-4.4.3/lib/commander/runner.rb:68:in `run!'
  from /var/lib/gems/2.1.0/gems/commander-4.4.3/lib/commander/delegates.rb:15:in `run!'
  from /var/lib/gems/2.1.0/gems/commander-4.4.3/lib/commander/import.rb:5:in `block in <top (required)>'

Has anybody found a workaround?

Mifrill commented 7 years ago

This gem does not understand pages in horizontal (landscape) orientation. You can simply skip those iterations when the PDF file contains pages with horizontal orientation:

lib/pdf/extract/pdf.rb

lines: 140 - 150

          if object_calls?
            @object_listeners.each_pair do |type, listeners|
              listeners.each do |listener|
                if objs[type].nil?
                  #raise "#{@pdf.operating_type} is missing a dependency on #{type}"
                  next
                end
                objs[type].each { |obj| listener.call obj }
              end
            end
          end
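Note that the patch above skips every listener whose input type is missing, whatever the cause. To make the effect easier to see, here is a minimal standalone sketch of the same loop. It is not pdf-extract's actual code: the `dispatch` method, its `strict:` flag, and the sample data are all hypothetical, and only the loop shape is taken from the snippet above.

```ruby
# Hypothetical stand-in for the listener loop quoted above (not pdf-extract code).
# objs:             hash of object type => array of extracted objects
# object_listeners: hash of object type => array of callables
# Returns the list of types that were skipped because no objects existed for them.
def dispatch(objs, object_listeners, strict: false)
  skipped = []
  object_listeners.each_pair do |type, listeners|
    listeners.each do |listener|
      if objs[type].nil?
        # Original behaviour: abort on the first missing dependency.
        raise "missing a dependency on #{type}" if strict
        # Patched behaviour: remember the type and move on to the next listener.
        skipped << type
        next
      end
      objs[type].each { |obj| listener.call(obj) }
    end
  end
  skipped.uniq
end

seen = []
listeners = {
  left_margins: [->(obj) { seen << [:left_margins, obj] }],
  characters:   [->(obj) { seen << [:characters, obj] }]
}

# Simulated page that produced characters but no left_margins.
objs = { characters: %w[a b] }

skipped = dispatch(objs, listeners)
warn "skipped listeners for: #{skipped.join(', ')}" unless skipped.empty?
```

Collecting the skipped types instead of dropping them silently is a design choice of this sketch; it makes it visible that the output for such a file may be incomplete.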

Also, you need to use pdf-reader v1.2.0.