I tried processing a PDF I had lying around and got an IndexError in `tokenize_stream`. The root cause is that a dictionary in the content stream is split across two stream objects, and thus across two invocations of `tokenize_stream`, so the `<<` token gets thrown away before the end of the dictionary is parsed. Fixing this will require keeping the stack around between stream objects. I'll take a crack at a PR to do so.
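To illustrate the fix, here's a minimal sketch of a tokenizer whose parse stack survives between calls, so a `<<` seen in one stream segment can be matched by a `>>` in the next. The class and method names (`StreamTokenizer`, `feed`, `pop_objects`) and the token grammar are hypothetical, not the library's actual API; a real fix would also need to buffer a token split mid-way across a segment boundary.

```python
import re

# Hypothetical token grammar: dict delimiters, names, numbers, operators.
_TOKEN = re.compile(rb"<<|>>|/[^\s<>/\[\]]+|-?\d+(?:\.\d+)?|[A-Za-z']+")

class StreamTokenizer:
    def __init__(self):
        self._stack = []    # open containers, kept between feed() calls
        self._objects = []  # completed top-level objects

    def feed(self, data: bytes):
        """Tokenize one content-stream segment. A dictionary opened in an
        earlier segment can be closed here, because _stack persists on the
        instance instead of being rebuilt on every invocation."""
        for m in _TOKEN.finditer(data):
            tok = m.group()
            if tok == b"<<":
                self._stack.append([])      # open a new dictionary
            elif tok == b">>":
                items = self._stack.pop()   # close the innermost dictionary
                self._emit(dict(zip(items[0::2], items[1::2])))
            else:
                self._emit(tok)

    def _emit(self, obj):
        if self._stack:
            self._stack[-1].append(obj)
        else:
            self._objects.append(obj)

    def pop_objects(self):
        objs, self._objects = self._objects, []
        return objs
```

With a per-call stack, the second `feed` below would hit `>>` with an empty stack and raise IndexError, much like the failure described above; with the persistent stack it parses cleanly:

```python
t = StreamTokenizer()
t.feed(b"<< /Type /Page")   # dictionary opens in one segment...
t.feed(b"/Count 2 >>")      # ...and closes in the next
objs = t.pop_objects()      # [{b"/Type": b"/Page", b"/Count": b"2"}]
```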
Test case: