Possible to use PopplerDocument as through-stream?

msageryd commented 3 years ago

I would like to get some basic PDF information on the fly while streaming my files to S3.

Would it be possible to use PopplerDocument as a through stream, so I can harvest pageCount and the width and height of each page on the fly?

blackbeam commented 3 years ago

I don't believe this is possible, but I might be wrong. PDF structure is quite complex: it uses indirection, and some parts may or may not be compressed. Seeking seems mandatory for successful parsing, so the stream must be buffered. Moreover, you must be able to seek through the whole document, so the buffer can be no smaller than the document itself. This is equivalent to reading the whole document from the input stream and then pushing the whole document to the output stream. You can use an in-memory document for this, but don't forget to set a size threshold above which you fall back to a file-backed document, to avoid running out of memory.
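To make the "buffer with a size threshold" idea concrete, here is a minimal sketch using only Node.js built-ins. The names `SpooledPdf`, `createPdfSpoolStream`, `maxMemoryBytes` and `onComplete` are made up for illustration and are not part of poppler-simple. The synchronous file writes keep the example short; real code would probably want asynchronous writes and cleanup of the temporary file.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { Transform } = require('stream');

// Hypothetical helper: keeps chunks in memory until `maxMemoryBytes` is
// exceeded, then spills everything to a temporary file and appends there.
class SpooledPdf {
  constructor(maxMemoryBytes) {
    this.maxMemoryBytes = maxMemoryBytes;
    this.chunks = [];
    this.size = 0;
    this.filePath = null; // set once the data has been spilled to disk
  }

  write(chunk) {
    this.size += chunk.length;
    if (this.filePath === null && this.size > this.maxMemoryBytes) {
      // Threshold crossed: move what was buffered so far to a temp file.
      const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'pdf-spool-'));
      this.filePath = path.join(dir, 'document.pdf');
      fs.writeFileSync(this.filePath, Buffer.concat(this.chunks));
      this.chunks = [];
    }
    if (this.filePath !== null) {
      fs.appendFileSync(this.filePath, chunk);
    } else {
      this.chunks.push(chunk);
    }
  }

  // Returns { buffer } for small documents or { filePath } for large ones.
  finish() {
    if (this.filePath !== null) {
      return { filePath: this.filePath };
    }
    return { buffer: Buffer.concat(this.chunks) };
  }
}

// Hypothetical through stream: forwards every chunk unchanged and calls
// `onComplete` with the spooled copy once the input has ended.
function createPdfSpoolStream(maxMemoryBytes, onComplete) {
  const spool = new SpooledPdf(maxMemoryBytes);
  return new Transform({
    transform(chunk, _encoding, callback) {
      spool.write(chunk);
      callback(null, chunk);
    },
    flush(callback) {
      onComplete(spool.finish());
      callback();
    },
  });
}
```

You would pipe the upload through `createPdfSpoolStream(...)` and, inside `onComplete`, open the resulting buffer or file path with PopplerDocument (check the README for which inputs the constructor accepts). Note that the page count and page sizes only become available after the last byte has passed through, so this does not give you metadata any earlier than reading the whole file first.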

msageryd commented 3 years ago

Thanks, I suspected something like that.

blackbeam / poppler-simple, issue #40