dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

XPS documents from print driver are terribly slow #51930

Open wstaelens opened 3 years ago

wstaelens commented 3 years ago

As advised in dotnet/wpf, I should open a ticket here for the dotnet/runtime team... 🙄

From https://github.com/dotnet/wpf/issues/4000:

For example, take a 3,000-page document printed to a V4 driver. Because of the very annoying STA requirement, rendering the pages sequentially takes ages. We can't render the pages in parallel (if that is possible in C#, feel free to explain how). In other words, any other code and logic that works on individual pages can't run in parallel either and is slow, because it all has to run sequentially. Eventually we run out of memory, because some of the operations we perform need all the rendered pages at once and we can't hold them all.
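The author asks how this could be done in C#. WPF does allow several UI threads, as long as every visual is created and used only on the STA thread that owns it. One commonly suggested workaround is therefore to give each chunk of pages its own STA thread. The following is a minimal sketch under that assumption, not a fix proposed in this thread. `StaParallel`, `RunOnStaThreads` and the `renderRange` callback are hypothetical names. It only runs on Windows, because STA threads are not supported elsewhere. Whether several renderers at once actually help (or make the memory problem worse) needs to be measured.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Hypothetical helper: runs a page-range callback on several STA threads.
public static class StaParallel
{
    // Splits [0, pageCount) into `workers` contiguous ranges and runs
    // renderRange(start, endExclusive) for each range on its own STA thread.
    // The callback must create and use its WPF objects only on that thread.
    public static TResult[] RunOnStaThreads<TResult>(
        int pageCount, int workers, Func<int, int, TResult> renderRange)
    {
        if (pageCount < 0) throw new ArgumentOutOfRangeException(nameof(pageCount));
        if (workers < 1) throw new ArgumentOutOfRangeException(nameof(workers));

        var results = new TResult[workers];
        var errors = new Exception[workers];
        var threads = new List<Thread>(workers);
        int chunk = (pageCount + workers - 1) / workers; // ceiling division

        for (int w = 0; w < workers; w++)
        {
            int index = w; // copy the loop variable for the closure
            int start = Math.Min(pageCount, index * chunk);
            int end = Math.Min(pageCount, start + chunk);
            var thread = new Thread(() =>
            {
                try { results[index] = renderRange(start, end); }
                catch (Exception ex) { errors[index] = ex; }
            });
            thread.SetApartmentState(ApartmentState.STA); // must be set before Start()
            thread.IsBackground = true;
            threads.Add(thread);
            thread.Start();
        }

        foreach (var thread in threads) thread.Join();

        // Surface every worker failure instead of silently returning defaults.
        var failures = errors.OfType<Exception>().ToList(); // OfType skips null slots
        if (failures.Count > 0) throw new AggregateException(failures);
        return results;
    }
}
```

In a real pipeline the callback would build the visuals for pages `start` to `end - 1` and write them out, for example to a separate XPS file per range, so that no WPF object ever crosses threads.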

The performance issues can easily be reproduced with Microsoft's own XPS Viewer and Microsoft XPS Document Writer (printer). When we open the original PDF (3 MB) and print it to the Microsoft XPS Document Writer printer as an .xps, printing takes ages. Once it has been printed, the .xps file has grown to 50 MB. Opening the .xps in Microsoft XPS Viewer and searching for a word (which exists, e.g., on page 2668) literally takes ages, as it processes the document sequentially. Sumatra finds the word in about 50 seconds; XPS Viewer takes ±6 minutes. (For comparison, Foxit Reader finds it in the original PDF in 25 seconds.)

I can't share this big file (it's confidential), but you can take any PDF with a lot of pages, such as a PDF e-book, and print it (or print it and capture the XPS print job with a render filter to grab the XPS on the Microsoft generic V4 driver).

Can these XPS printing issues please be tackled or prioritized?

Environment: .NET SDK 5.0.202, .NET runtime 5.0.5, Windows 10 20H2 (19042.928), Windows Server 2019 1809 (17763.1879)

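Linked tickets:

  * https://github.com/dotnet/wpf/issues/4000
  * https://github.com/dotnet/wpf/issues/3546
  * https://github.com/dotnet/runtime/issues/51929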

Update

A file you can test with, for example:

  1. Navigate to https://www.spaenhiers.be/archief and click on "Databank bidprentjes" or "Bidprentjes" (the direct link to the .pdf file is updated from time to time: https://www.spaenhiers.be/Media/Default/docs/archief_Bidprentjes_2021-04-19.pdf or https://spaenhiers.files.wordpress.com/2022/05/bidprentjes_2022-02-17.pdf or https://spaenhiers.files.wordpress.com/2022/06/bidprentjes_-2022_06_18.pdf ).
  2. Print the file to Microsoft XPS Document Writer (sloooooow 🐌 🏁).
  3. You'll notice the file size is HUGE compared to the PDF file.
  4. Open the .xps file in Microsoft XPS Viewer, make sure it is on the first page, and search for the value 67588 or Zwertvaegher.
  5. Go drink a coffee, eat some pizza, drink 5 beers, and come back when it has finally found it.

(And yes, parsing XPS with .NET 5 is also slow and takes a lot of memory, etc. It needs some performance boosts to be competitive with PDF documents.)

dotnet-issue-labeler[bot] commented 3 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

ghost commented 3 years ago

Tagging subscribers to this area: @carlossanlop. See info in area-owners.md if you want to be subscribed.

Author: wstaelens
Assignees: -
Labels: `area-System.IO.Compression`, `tenet-performance`, `untriaged`
Milestone: -

danmoseley commented 3 years ago

Thanks for the report, @wstaelens. You mention XPS Reader, so I'm guessing this problem is the same on .NET Framework?

Do you have an interest in debugging/investigating? Realistically that is the most likely way a fix would get in this release.

wstaelens commented 3 years ago

@danmoseley Yes, I mentioned XPS Viewer just to compare it with, for example, SumatraPDF (which can also view XPS documents).

Try searching for something in XPS Viewer and do the same in SumatraPDF (e.g., in a 3,000+ page document). You'll notice the difference (±50 seconds in Sumatra compared to ±6 minutes). So in general I believe that if the Microsoft team profiles the code and gets the opportunity to update the code base, big improvements in performance and memory allocations could be made for .NET 5 / .NET Core (and .NET Framework 4.8).

We are willing to help, but it is hard to say exactly what is slow, because the code that parses/generates/... the XPS files in an XPS print driver (XPSDrv) is Microsoft-internal. We only capture the generated .xps, so I don't think I'll be of much help here; we believe it is a Microsoft-internal issue. When we process the XPS further, compared to, e.g., first converting it to PDF or just using EMF, the PDF/EMF path is faster, more optimized, takes less disk space (not true for EMF), and doesn't have the annoying STA requirement that XPS has.

Because XPS with .piece files doesn't seem to be supported in .NET 5 / .NET Core, I suspect that the code base differs there, or that not everything has been implemented.

In general, XPS is slow for printing, and producing/consuming XPS files takes much more disk space and memory than other technologies. We even seriously considered going back to EMF for this (!!). The XML format is clean, but XML and XML parsing are, well... let's say we would like to see improvements. We see XPS usage increasing, so please don't turn this down this time.

wstaelens commented 2 years ago

👋

wstaelens commented 2 years ago

Hey, any performance updates?

znakeeye commented 7 months ago

I'm rendering an XpsDocument to a MemoryStream which is then converted to a PDF on disk. After some research, I found two things which significantly impact performance.

1) DynamicResource kills performance. Completely! 😆
2) Package compression takes time. Use CompressionOption.NotCompressed (or CompressionOption.SuperFast if you really need compression).
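The compression tip maps onto the `XpsDocument(Package, CompressionOption)` constructor. Below is a minimal sketch. It assumes a WPF project on Windows and a `FixedDocument` already built on the calling STA thread. `XpsToMemory` and `WriteToBytes` are hypothetical names, and the size/speed tradeoff should be measured on real documents.

```csharp
using System.IO;
using System.IO.Packaging;
using System.Windows.Documents;
using System.Windows.Xps;
using System.Windows.Xps.Packaging;

// Hypothetical helper: serializes a FixedDocument to XPS bytes in memory.
public static class XpsToMemory
{
    // NotCompressed skips deflate work entirely; pass SuperFast to trade some
    // speed for a smaller package.
    public static byte[] WriteToBytes(FixedDocument document,
        CompressionOption compression = CompressionOption.NotCompressed)
    {
        using var stream = new MemoryStream();
        using (var package = Package.Open(stream, FileMode.Create, FileAccess.ReadWrite))
        using (var xps = new XpsDocument(package, compression))
        {
            XpsDocumentWriter writer = XpsDocument.CreateXpsDocumentWriter(xps);
            writer.Write(document); // must run on the thread that owns `document`
        }
        // The package is flushed once closed; ToArray also works on a closed MemoryStream.
        return stream.ToArray();
    }
}
```

Passing `CompressionOption.SuperFast` instead is the middle ground the comment suggests when output size still matters.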

DynamicResource performance problem

Usually DynamicResource has performance characteristics similar to StaticResource; at least, there seems to be a consensus in the community that this is the case. But it certainly does not hold true for XPS!

After I profiled my XPS generator, it became apparent that FindWeakReference() was a very, very hot path. See issue #4468. Also, please consider prioritizing PR 5610 from @batzen, as it aims to fix this very problem.
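In code-behind, `SetResourceReference` is the equivalent of `{DynamicResource ...}` because it keeps a tracked reference. A one-time `FindResource` lookup behaves like `{StaticResource ...}`. Below is a minimal sketch of resolving a brush once instead of keeping a dynamic reference. It assumes the resource never needs to change after the document is built. `ResourceLookup` and the `"AccentBrush"` key are illustrative names. Whether this avoids the `FindWeakReference()` cost in a given document is an assumption to verify with a profiler.

```csharp
using System.Windows;
using System.Windows.Controls;
using System.Windows.Media;

// Hypothetical helpers contrasting the two resource-lookup styles.
public static class ResourceLookup
{
    // DynamicResource-style: WPF keeps a live reference and re-resolves on changes.
    public static void UseDynamic(TextBlock text, object key) =>
        text.SetResourceReference(TextBlock.ForegroundProperty, key);

    // StaticResource-style: resolve once (throws if the key is missing) and assign.
    public static void UseStatic(TextBlock text, object key) =>
        text.Foreground = (Brush)text.FindResource(key);
}
```

For XPS output that is generated once and never re-themed, the static style gives up live updates that the document does not need.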
