daharmon opened 3 weeks ago
We did not get the error you reported, but we observed huge performance issues when reading the file; we aborted the process after two hours. For the next release of PDFsharp, we fixed the performance issue that slowed down loading of objects from object streams. Loading the file now finishes in eight seconds. Did you also observe performance problems with that file? If so, how long did you wait before getting this error message?
I did not see performance issues because it failed to load immediately. It's interesting that you could open it. I was able to load it with iText just fine, but not with PDFsharp. Looping through thousands of PDF documents, this was the only one that failed, and it always failed with the error indicated in the issue title.
Which version of PDFsharp are you using? GDI? WPF? Core? 6.1.0? 6.1.1? 6.2.0 Preview 1?
We then tried to reproduce your issue by loading the file from Azure File Storage. However, we still did not get the error you reported. Instead, we got an Azure exception when PDFsharp tried to get the stream length: System.NotSupportedException: 'Specified method is not supported.' at Azure.Core.Pipeline.RetriableStream.RetriableStreamImpl.Length.get() in RetriableStream.cs
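To see up front which capabilities a given stream offers, a small diagnostic like the following can help. It is a sketch, assuming .NET 6 or later with top-level statements. `DescribeStream` is a hypothetical helper, not a PDFsharp or Azure SDK API. It reports `CanRead` and `CanSeek` and catches the `NotSupportedException` that the `Length` getter threw in the Azure case above.

```csharp
using System;
using System.IO;

// Hypothetical diagnostic helper: summarizes what a stream supports
// before it is handed to a reader such as PDFsharp.
static string DescribeStream(Stream s)
{
    string length;
    try
    {
        length = s.Length.ToString();
    }
    catch (NotSupportedException)
    {
        // Some streams (e.g. the Azure RetriableStream reported in this thread)
        // throw here instead of returning a length.
        length = "not supported";
    }
    return $"CanRead={s.CanRead}, CanSeek={s.CanSeek}, Length={length}";
}
```

A stream that reports `CanSeek=False` or `Length=not supported` is a good candidate for copying into a `MemoryStream` before calling `PdfReader.Open`. A stream that passes this check may still fail for other reasons, so treat the result as a hint only.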
We tried it with the "PDFsharp" NuGet packages, versions 6.1.1 and 6.2.0-preview-1. To work around this issue, we copied the stream to a new MemoryStream and opened that with PDFsharp. With the stated versions, we then ran into the expected performance issue. With our development version, we could load the file successfully. Please check whether you get the reported error when loading the file locally. I would expect the performance issue to occur instead. When loading the file from Azure, please try copying the stream to a MemoryStream and using that. We want to improve the error messages, but we will certainly not support every kind of stream, since streams may come with various restrictions. So using a MemoryStream to get a fully supported stream is, and will remain, the way to go in most cases when loading files via streams from the internet. Please let us know if you still get the reported error with either of these approaches.
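A minimal sketch of the MemoryStream workaround described above, assuming .NET 6 or later with top-level statements. The helper name `ToSeekableMemoryStreamAsync` is hypothetical and is not part of PDFsharp or the Azure SDK. It copies whatever remains in the source stream into memory and rewinds the copy so a reader starts at the first byte.

```csharp
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper: copies any readable stream (for example, one returned
// by an Azure SDK call) into a MemoryStream, which supports seeking and Length.
static async Task<MemoryStream> ToSeekableMemoryStreamAsync(Stream source)
{
    var buffer = new MemoryStream();
    await source.CopyToAsync(buffer); // copies from the source's current position to its end
    buffer.Position = 0;              // rewind so the reader starts at the first byte
    return buffer;
}
```

With this helper in place, the call from the issue would become something like `PdfReader.Open(ms, PdfDocumentOpenMode.Import)`, where `ms` is the returned MemoryStream, disposed once the document has been imported. The trade-off is that the whole file is held in RAM while PDFsharp reads it.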
No worries. It's odd that all of the other files we retrieve from Azure File Storage worked fine when read straight from the stream. In case it makes a difference, the method we use to retrieve the stream from Azure is ShareFileClient.OpenReadAsync(). I assume that copying the stream to a memory stream introduces additional memory overhead, which I'd like to avoid. Either way, I appreciate you looking into this!
Interesting. Using ShareFileClient.OpenReadAsync(), I get the error you described. Using ShareFileClient.DownloadAsync(), I get the NotSupportedException when accessing Length. In both cases, copying the stream into a MemoryStream worked. So both streams obtained from Azure seem to have some kind of limitation. As stated before, we can't guarantee that every stream loaded from the internet is compatible with PDFsharp. So copying it to a MemoryStream, or finding another method that returns a compatible stream, would be the clean way. But figuring out which stream is compatible by trying different methods may not be that easy. As you said, it works for you with all of the other files. So maybe success depends on the file size or on the order in which PDFsharp has to read the objects from the file.
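Regarding the memory overhead concern raised earlier in the thread, one alternative is to buffer the download in a temporary file instead of a MemoryStream. This was not discussed in the thread and has not been tested with this file or with PDFsharp. A FileStream is seekable and reports its Length, but it keeps the data on disk rather than in RAM. The following is a sketch, assuming .NET 6 or later with top-level statements. `ToSeekableTempFileStreamAsync` is a hypothetical helper name.

```csharp
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper: buffers a restricted stream in a temporary file instead
// of in memory. FileOptions.DeleteOnClose removes the file when the returned
// stream is disposed.
static async Task<FileStream> ToSeekableTempFileStreamAsync(Stream source)
{
    var temp = new FileStream(
        Path.GetTempFileName(),   // creates an empty temp file and returns its path
        FileMode.Create,
        FileAccess.ReadWrite,
        FileShare.None,
        81920,                    // buffer size in bytes
        FileOptions.DeleteOnClose | FileOptions.Asynchronous);
    try
    {
        await source.CopyToAsync(temp);
        temp.Position = 0; // rewind so the reader starts at the first byte
        return temp;
    }
    catch
    {
        temp.Dispose(); // also deletes the temp file
        throw;
    }
}
```

The returned stream would be passed to `PdfReader.Open` in place of a MemoryStream and disposed afterwards, which also deletes the temporary file. The trade-off is disk I/O and temporary disk space instead of memory.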
2024-Compliance-Supplement-V1.pdf
Reporting an Issue Here
Expected Behavior
The document should be read without errors. I can open it in Adobe, Chrome, and Edge.
Actual Behavior
I get the error indicated in the title of this post.
Steps to Reproduce the Behavior
The stream comes directly from an Azure File Share. I can confirm this code works for all of our other documents:
PdfDocument inputDocument = PdfReader.Open(streamcontent, PdfDocumentOpenMode.Import);
I'm using the latest preview version (6.2.0-preview-1) and have also tried the latest non-preview version. Both throw the same error.
Any ideas?