Infinite loop while parsing corrupted pdf

MatthiasValvekens / pyHanko

pyHanko: sign and stamp PDF files

MIT License


Infinite loop while parsing corrupted pdf #236

Closed: peteris-zealid closed this issue 1 year ago.

peteris-zealid commented 1 year ago

The function skip_over_whitespace contains these lines:

while tok in PDF_WHITESPACE:
    tok = stream.read(1)

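If the stream has reached the end, stream.read(1) returns b"". The membership test b"" in PDF_WHITESPACE still succeeds, because the empty byte string is a substring of every bytes object, so the loop never terminates. This bug can be exploited with a corrupted PDF whose pointer to the xref table points past the end of the file.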
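A small reproduction of the hang helps show why the empty read is the problem. This sketch assumes PDF_WHITESPACE is a bytes string (the reported infinite loop implies the membership test succeeds for b""); the exact byte values are an assumption, and an iteration cap is added purely so the demo finishes.

```python
import io

# Assumption: the whitespace set is stored as a bytes string.
PDF_WHITESPACE = b" \n\r\t\f\x00"

# The empty bytes object counts as a substring of any bytes object.
assert b"" in PDF_WHITESPACE

stream = io.BytesIO(b"%PDF-1.7\n")
stream.seek(10_000)  # simulate a startxref offset past the end of the file

tok = stream.read(1)  # returns b"" because we are at EOF
iterations = 0
# Same loop shape as in skip_over_whitespace, plus a cap so this demo terminates.
while tok in PDF_WHITESPACE and iterations < 1_000:
    tok = stream.read(1)
    iterations += 1

print(iterations)  # 1000: only the artificial cap ended the loop
```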

In particular, these lines in read_xrefs trigger it:

stream.seek(startxref)
if misc.skip_over_whitespace(stream):
    ...
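Besides fixing the loop itself, a reader could reject an out-of-range startxref offset before seeking. The helper below, seek_to_startxref, is hypothetical and not part of pyHanko; it is a sketch of a complementary guard, assuming the stream is seekable and that ValueError is an acceptable error type.

```python
import io
import os


def seek_to_startxref(stream, startxref: int) -> None:
    """Hypothetical guard: refuse to seek to an offset outside the file."""
    # seek() returns the new absolute position, i.e. the file size here.
    size = stream.seek(0, os.SEEK_END)
    if not 0 <= startxref < size:
        raise ValueError(
            f"startxref offset {startxref} is outside the file (size {size})"
        )
    stream.seek(startxref)
```

This only catches the specific case from the report; a check in the whitespace loop is still needed, since other corrupted offsets can land at EOF too.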

The proposed solution is to check for an empty read (end of stream) and raise an error.
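A minimal sketch of the proposed fix, assuming skip_over_whitespace returns whether any whitespace was consumed (as the if in read_xrefs suggests) and leaves the stream at the first non-whitespace byte. The exception class UnexpectedEOF is hypothetical and is not pyHanko's actual error type; the merged fix may differ.

```python
import io
import os

# Assumption: whitespace set stored as a bytes string.
PDF_WHITESPACE = b" \n\r\t\f\x00"


class UnexpectedEOF(Exception):
    """Hypothetical exception used only in this sketch."""


def skip_over_whitespace(stream) -> bool:
    """Consume PDF whitespace; return True if at least one byte was skipped.

    Leaves the stream positioned at the first non-whitespace byte.
    Raises UnexpectedEOF instead of looping forever when the stream runs out.
    """
    skipped = 0
    while True:
        tok = stream.read(1)
        if not tok:
            # read(1) returned b"": end of stream. Without this check,
            # `b"" in PDF_WHITESPACE` is True and the loop would never end.
            raise UnexpectedEOF("stream ended while skipping whitespace")
        if tok not in PDF_WHITESPACE:
            # Step back one byte so the caller sees the non-whitespace byte.
            stream.seek(-1, os.SEEK_CUR)
            return skipped > 0
        skipped += 1
```

For example, io.BytesIO(b"  \n xref") yields True and leaves b"xref" to be read next, while a stream positioned past its end raises UnexpectedEOF immediately.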

MatthiasValvekens commented 1 year ago

Thanks, as always! Good catch, and indeed a dangerous bug. I have a fix queued up, hang on :)

MatthiasValvekens commented 1 year ago

Merged (as you probably noticed), and I also just did a bugfix release (0.17.1) to address this. Thanks again!

peteris-zealid commented 1 year ago

You might want to handle skip_over_comment as well. I'm not sure whether it can be exploited, but why take chances?

MatthiasValvekens commented 1 year ago

Ah, good point... I only looked at usages of PDF_WHITESPACE, but skip_over_comment indeed has a similar issue. Will take a look at that one as well.
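A PDF comment runs from % to the end of the line, so a comment skipper typically loops until it reads CR or LF. If the file ends inside a comment, a loop that only waits for an end-of-line byte never sees one. The sketch below assumes that structure for skip_over_comment; it is not pyHanko's actual code. As a design choice, it treats EOF as the end of the comment rather than raising, since files commonly end with %%EOF and no trailing newline.

```python
import io
import os


def skip_over_comment(stream) -> bool:
    """If the stream is at a '%' comment, consume it through the end of line.

    Returns True if a comment was skipped, False otherwise.
    EOF inside a comment ends the comment instead of looping forever.
    """
    tok = stream.read(1)
    if tok != b"%":
        # Not a comment: un-read the byte (if one was actually read).
        if tok:
            stream.seek(-1, os.SEEK_CUR)
        return False
    while True:
        tok = stream.read(1)
        # b"" (EOF) or an end-of-line byte both terminate the comment.
        if not tok or tok in (b"\n", b"\r"):
            return True
```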

MatthiasValvekens commented 1 year ago

Also dealt with. Thanks!