PdfReader.Open throws when trying to get Length of non-seekable Stream

empira / PDFsharp

PDFsharp and MigraDoc Foundation for .NET 6 and .NET Framework

https://docs.pdfsharp.net/

PdfReader.Open throws when trying to get Length of non-seekable Stream #179

Closed ogix closed 1 month ago

ogix commented 1 month ago

I am trying to pass the Stream that I get from BlobClient.DownloadStreamingAsync into the PdfReader.Open method, and it throws when trying to get the Length property. The returned stream appears to be a RetriableStream under the hood, which is not seekable.

https://github.com/empira/PDFsharp/blob/5fbf6ed14740bc4e16786816882d32e43af3ff5d/src/foundation/src/PDFsharp/src/PdfSharp/Pdf.IO/PdfReader.cs#L285

TH-Soft commented 1 month ago

As a workaround, download the stream into a MemoryStream and use that with PdfReader.

Adapting PDFsharp to RetriableStream will probably require several changes, so if this is ever addressed in PDFsharp, it would probably copy the stream to a MemoryStream internally anyway.
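The workaround above can be sketched roughly as follows. This is a minimal sketch, not PDFsharp code: `SeekableCopy` and `EnsureSeekableAsync` are hypothetical names introduced here for illustration, and the helper uses only `System.IO`. It returns the stream unchanged when it is already seekable and otherwise buffers it into a MemoryStream.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class SeekableCopy
{
    // Hypothetical helper (not part of PDFsharp): returns a stream whose
    // Length and Position can be used, buffering in memory only if needed.
    public static async Task<Stream> EnsureSeekableAsync(Stream source)
    {
        if (source.CanSeek)
            return source; // already seekable, no copy required

        var buffer = new MemoryStream();
        await source.CopyToAsync(buffer); // reads the source to its end
        buffer.Position = 0;              // rewind so the reader starts at byte 0
        return buffer;
    }
}

// Assumed usage with the APIs named in this thread (requires the
// Azure.Storage.Blobs and PDFsharp packages; blobClient is a placeholder):
//
//   var download = await blobClient.DownloadStreamingAsync();
//   using Stream pdfStream = await SeekableCopy.EnsureSeekableAsync(download.Value.Content);
//   var document = PdfReader.Open(pdfStream, PdfDocumentOpenMode.Import);
```

The copy costs roughly the size of the PDF in extra memory, which is the trade-off discussed in the rest of this thread.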

ogix commented 1 month ago

Ok, thanks. Thought that maybe it can take advantage of real Streaming.

ThomasHoevel commented 1 month ago

> Ok, thanks. Thought that maybe it can take advantage of real Streaming.

What is "real Streaming" and what could the advantages be?

ogix commented 1 month ago

I mean avoiding loading the whole document(s) into memory. In my case I have multiple PDF documents that I merge into one, and this operation is common in my web app. So the only option now is to load all documents into memory, which leads to high memory usage.

The Azure Storage SDK's BlobClient.DownloadStreamingAsync returns a Stream that downloads the document in chunks rather than the whole document at once.

So I am wondering whether PDFsharp could read such a stream.

ThomasHoevel commented 1 month ago

PDFsharp reads the complete PDF into memory, that's how it works. Reading data from BlobClient into a MemoryStream increases the memory usage, but that should not be an issue with PDF files downloaded from Azure. There is no need to open more than one source PDF at any time.
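Merging with only one source PDF buffered at a time might look like the sketch below. `PdfMerge` and `MergeSequentially` are hypothetical names, and the sketch assumes that pages added from a document opened in Import mode are copied into the output document, so each source buffer can be released before the next one is read. The output document itself still holds all merged pages in memory.

```csharp
using System.Collections.Generic;
using System.IO;
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

public static class PdfMerge
{
    // Hypothetical helper: 'sources' yields one stream per PDF, possibly
    // non-seekable (for example, streams from BlobClient.DownloadStreamingAsync).
    public static PdfDocument MergeSequentially(IEnumerable<Stream> sources)
    {
        var output = new PdfDocument();
        foreach (Stream source in sources)
        {
            using (source)
            using (var buffer = new MemoryStream())
            {
                // Buffer only the current source so PdfReader gets a seekable stream.
                source.CopyTo(buffer);
                buffer.Position = 0;

                PdfDocument input = PdfReader.Open(buffer, PdfDocumentOpenMode.Import);
                for (int i = 0; i < input.PageCount; i++)
                    output.AddPage(input.Pages[i]);
            } // the buffer for this source is released here
        }
        return output;
    }
}
```

If `sources` is a lazy sequence, each download starts only when the loop reaches it, so at most one source buffer exists alongside the growing output document.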

ogix commented 1 month ago

Thanks for explaining. Just wanted to know if it's possible. Closing.

ogix commented 1 month ago

Or maybe we should at least keep it open, to add support for non-seekable Streams.