Closed Reyko closed 8 years ago
Hi @Reyko and thank you for this pull request.
Reading through this code, it's clear there are some things we should consider fixing, but I doubt if this is the best way to go about them.
Allow me to explain:
object[:raw_stream_content].length
was a mistake in the original code, we should be looking at bytesize
rather then length
.PDF files contain a dictionary entry with the number of bytes that should be present in the stream (which means that the stream might contain a a different number of bytes, the tail being ignored, and that would indicate a possible error)...
stream_length
should return the registered length (the dictionary entry) in bytes (Fixnum) and actual_length
should indicate the actual bytesize
of the stream. It's used by the gem to warm about possible errors.
PDF data might be stored as a reference rather then a value, this is where actual_object
and actual_value
come in (one is used when looking for a referenced dictionary and the other when looking for a referenced value).
actual_object
resolves pointers and returns the actual container of the data... which is where my code when wrong. actual_value
is what I wanted, it resolves pointers to the value rather then the value container
I'm not sure if removing the error check is the best fix.
I think a better approach might be changing this:
stream_length = actual_object(object[:Length])
actual_length = object[:raw_stream_content].length
to:
stream_length = actual_value(object[:Length])
actual_length = object[:raw_stream_content].bytesize
if !stream_length # it's a required entry, but it might be missing
warn "PDF error, required stream length data is missing. Attempting to fix."
stream_length ||= actual_length
end
If you agree, could you update the pull request so that I can merge the fix?
Thanks again for pointing out this issue! Bo.
Hey @boazsegev thanks for getting back to me :)
I understand your suggestion and I tried implementing it but actual_value
still returns a hash and not a Fixnum
.
Maybe is it something wrong with the actual_value
?
I have added a test which can demonstrate the problem if you run it with the stream_length
.
Thanks again
This could be specific to the PDF file you're working on - any chance I could see it?
It is inside the repo in the fixtures encrypted.pdf
Using the file, I found the source of the issue.
The Decrypt module is called before the PDF objects are all linked together (before reference mapping). At that point, only the root
object was mapped, meaning the actual_value
and actual_object
methods cannot perform their jobs properly for any other object.
Also, there is the issue that the length of the stream is stored as a reference which the Decrypt module cannot find (since mapping has yet to occur).
I can't move the mapping process before the decryption is complete, since object streams might contain more objects and mapping the references before the decryption might cause errors during the reference object linkage.
This might take me some time, but I'm working on a solution.
One option is to scan specifically for any missing objects, so that instead of linking the objects we are just locating the few objects we need for the decryption.
Nice one @boazsegev, I m looking at it as well to see if I can help.
Cheers
I think the new version fixes the issue - could you check that it works correctly for you?
If it's working correctly, would it be okay to close the pull request?
Hey,
Unfortunately, I m on my annual leave and I have no access to a computer. However, it is ok to close the pull request, I m gonna check if it works once I m back and if it doesn't I could email you if that's ok.
Thanks a lot for fixing this!!!
Have fun on your vacation :-)
I m not sure what is the difference between stream_length and actual_length, but at the moment stream_length returns a hash:
object[:Length] => {:is_reference_only=>true, :indirect_generation_number=>0, :indirect_reference_id=>6}
This causes an exception as it is doing the comparison:
if actual_length < stream_length
The result of this is to get rescued and return