gettalong / hexapdf

Versatile PDF creation and manipulation for Ruby
https://hexapdf.gettalong.org

Infinite loop in `HexaPDF::Document.open(file)` #306

Closed: tdeo closed this issue 2 months ago

tdeo commented 2 months ago

Hello,

I am running into an issue where HexaPDF hangs in an infinite loop when running `HexaPDF::Document.open(file)`.

Here is the top of the stacktrace when I stop the process:

/usr/local/bundle/ruby/3.3.0/gems/hexapdf-0.43.0/lib/hexapdf/parser.rb:367:in `block in startxref_offset'
/usr/local/bundle/ruby/3.3.0/gems/hexapdf-0.43.0/lib/hexapdf/parser.rb:367:in `rindex'
/usr/local/bundle/ruby/3.3.0/gems/hexapdf-0.43.0/lib/hexapdf/parser.rb:367:in `startxref_offset'
/usr/local/bundle/ruby/3.3.0/gems/hexapdf-0.43.0/lib/hexapdf/revisions.rb:75:in `from_io'
/usr/local/bundle/ruby/3.3.0/gems/hexapdf-0.43.0/lib/hexapdf/document.rb:192:in `initialize'
/usr/local/bundle/ruby/3.3.0/gems/hexapdf-0.43.0/lib/hexapdf/document.rb:150:in `new'
/usr/local/bundle/ruby/3.3.0/gems/hexapdf-0.43.0/lib/hexapdf/document.rb:150:in `open'

Unfortunately, I can't share the PDF that triggers this behavior, but I think I managed to build a reproduction from any valid PDF with the following:

My understanding is that the hang comes from an unlucky combination of the number of null bytes in the file and the step size used in `startxref_offset` (parser.rb). The `+ 40` in the `io.read` call seems intended to avoid this, but I believe it also needs `@io.pos = [@io.pos - 40, 0].max` just before it to work properly.
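The general pattern under discussion is scanning a file backwards in fixed-size chunks, with a small overlap so that a keyword straddling a chunk boundary is not missed. Because `io.read` advances the position, deriving the next window from `io.pos` without compensating for the extra bytes read can stop the window from moving backwards, which appears to be what the `@io.pos = [@io.pos - 40, 0].max` suggestion addresses. The following is a minimal, hypothetical sketch of a loop that always terminates; it is not HexaPDF's actual `startxref_offset` code, and `find_last_keyword`, `step_size: 1024` and `overlap: 40` are illustrative names and values chosen to mirror the discussion.

```ruby
require 'stringio'

# Hypothetical helper (not part of HexaPDF): returns the byte offset of the
# last occurrence of +needle+ in +io+, or nil if it does not occur.
# The IO is scanned from the end in chunks of +step_size+ bytes; each read
# takes +overlap+ extra bytes so a keyword crossing a chunk boundary is found.
def find_last_keyword(io, needle, step_size: 1024, overlap: 40)
  io.seek(0, IO::SEEK_END)
  chunk_end = io.pos

  while chunk_end > 0
    # The window start is computed from chunk_end, not from io.pos after a
    # read: io.read advances io.pos, so reusing it could move the window
    # forward instead of backward and the loop would never finish.
    chunk_start = [chunk_end - step_size, 0].max
    io.pos = chunk_start
    buffer = io.read(chunk_end - chunk_start + overlap) || ""

    if (index = buffer.rindex(needle))
      return chunk_start + index
    end

    # Strictly decreasing, so the loop is guaranteed to terminate.
    chunk_end = chunk_start
  end

  nil
end

# A small PDF-like string followed by many null bytes (more than one step).
data = "%PDF-1.7\n1 0 obj\n<< >>\nendobj\nstartxref\n1234\n%%EOF\n" + ("\0" * 5000)
offset = find_last_keyword(StringIO.new(data), "startxref")
puts data[offset, 9]   # → startxref
```

Tracking `chunk_end` explicitly keeps the overlap purely a read-size concern, so the amount of trailing padding only affects how many iterations are needed, never whether the loop ends.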

I can try adding a reproduction test case and a fix if the issue seems clear enough.

gettalong commented 2 months ago

Thanks for the detailed issue description and the reproduction instructions. I tried them with multiple valid files but it didn't hang for me. However, with your instructions and the line indicated in the backtrace, I was able to find the problem and fix it.

Since this is a critical problem, I released version 0.44.0 with the fix.