When reading some PDFs with lazy: true, the parser raises an exception and stops. The same PDFs are read without any problem with lazy: false, and no errors are logged.
Origami::PDF.read(pdf_content_stream, lazy: true, verbosity: Origami::Parser::VERBOSE_TRACE)
[info ] ...Reading header...
[error] Breaking on: "\xBF\xBD\xEF\xBF\xBD\x04|\r\xEF\xBF..." at offset 0x3445c
[error] Last exception: [Origami::InvalidObjectError] Object shall begin with '%d %d obj' statement
[debug] Skipping this indirect object.
[trace] Read Stream object, 33 0 R
Origami::Parser::ParsingError: Invalid xref stream
from /.rvm/gems/ruby-2.5.1/gems/origami-2.1.0/lib/origami/parsers/pdf/lazy.rb:159:in `parse_revision_from_xrefstm'
I've managed to trace the error to the fact that in the snippet below, parse_object fails on its first attempt, logging the two [error] lines, and then successfully returns an Origami::Stream object. Of course Origami::Stream != Origami::XRefStream, so the exception is raised. Interestingly, though, XRefStream < Stream.
I don't know much about PDF files, so I can't tell whether this is working as intended. In any case, what would be the proper way to read such a file? Is there anything better than the workaround below?
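As a side illustration of why the subclass relationship does not help here, this is a small sketch with stand-in classes in a hypothetical Demo module (these are not Origami's real classes, and it assumes the parser's check is something like is_a?(XRefStream)). An instance of the parent class is never an instance of the child class, even though the reverse holds.

```ruby
# Stand-in classes that only mirror the inheritance described above.
module Demo
  class Stream; end
  class XRefStream < Stream; end
end

plain = Demo::Stream.new      # what parse_object reportedly returns
xref  = Demo::XRefStream.new  # what the lazy parser expects

puts Demo::XRefStream < Demo::Stream  # true: XRefStream is a subclass
puts xref.is_a?(Demo::Stream)         # true: a child instance is also a Stream
puts plain.is_a?(Demo::XRefStream)    # false: a plain Stream is not an XRefStream
```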
begin
  Origami::PDF.read(pdf_content_stream, lazy: true)
rescue Origami::Parser::ParsingError
  # Fall back to the non-lazy parser when the lazy one fails.
  Origami::PDF.read(pdf_content_stream, lazy: false)
end
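The same fallback can be wrapped in a small helper. This is only a sketch: read_with_fallback and its reader and error_class parameters are hypothetical names, not part of Origami. It also assumes that, when pdf_content_stream is an IO, the failed lazy attempt may already have consumed it, so the stream is rewound before the second attempt.

```ruby
require 'stringio'

# Hypothetical helper (not part of Origami): try the lazy parser first and
# fall back to the eager one. `reader` is any callable taking (io, lazy:).
def read_with_fallback(io, reader, error_class: StandardError)
  reader.call(io, lazy: true)
rescue error_class
  # Assumption: the failed attempt may have consumed the IO, so rewind it.
  io.rewind if io.respond_to?(:rewind)
  reader.call(io, lazy: false)
end

# Intended use with Origami (not exercised here):
#   origami = ->(io, lazy:) { Origami::PDF.read(io, lazy: lazy) }
#   pdf = read_with_fallback(pdf_content_stream, origami,
#                            error_class: Origami::Parser::ParsingError)
```

Passing the reader in as a callable keeps the helper independent of the parser, so the fallback logic can be checked with a fake reader before it is pointed at real files.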
Ruby: 2.5.1 Origami: 2.1.0