dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.07k stars 4.69k forks source link

XPS documents in .NET Core can't be opened #51929

Closed wstaelens closed 2 months ago

wstaelens commented 3 years ago

Description

As adviced from /runtime/wpf I should open a ticket here for the dotnet/runtime team...
Please see https://github.com/dotnet/wpf/issues/3546 for full details, test documents and many more.

.NET Framework 4.8 is able to open XPS files from printer driver containing many .piece files.
.NET 5 ( / .NET Core) is unable to open XPS files from printer driver containing many .piece files.

dotnet-issue-labeler[bot] commented 3 years ago

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

ghost commented 3 years ago

Tagging subscribers to this area: @carlossanlop See info in area-owners.md if you want to be subscribed.

Issue Details
### Description As adviced from `/runtime/wpf` I should open a ticket here for the `dotnet/runtime` team...
Please see https://github.com/dotnet/wpf/issues/3546 for full details, test documents and many more. .NET Framework 4.8 is able to open XPS files from printer driver containing many .piece files.
.NET 5 ( / .NET Core) is **unable** to open XPS files from printer driver containing many .piece files.
Author: wstaelens
Assignees: -
Labels: `area-System.IO.Compression`, `untriaged`
Milestone: -
KevinCathcart commented 3 years ago

When porting System.IO.Packaging over to .NET Core, two features interrelated features were dropped: Streaming mode (which allows creation of interleaved parts), and support for reading/updating existing interleaved parts.

The reason they were dropped was because the porter was under instruction to replatform the the code to use System.IO.Compression, instead of MS.Internal.IO.Zip. Since at the time, a major goal was still to keep the runtime size small by avoiding duplicated code, this made sense.

However, part of the support for interleaved files (which is specific to Open Packaging Conventions) was implemented in the MS.Internal.IO.Zip namespace, and therefore was not ported, even though it could have been moved into to Packaging namespace where I would argue it should have been in the first place, and updated to be based on System.IO.Compression, just like the rest of the OPC specific code.

This issue is basically requesting to restore the missing support for reading and updating interleaved files. The read and update support actually has no public API surface (unlike streaming), and its absence is a correctness issue, since support for reading interleaved OPC files is not optional in the specification. On the other hand, the ability to produce such files is optional, since interleaving is only meant as a possible performance enhancement, so restoring streaming mode is less important.

It appears that all the needed code has been released under MIT, either as part of the initial commit of the code to this repository (where unfortunately, it had already been heavily modified from the original, so hopefully no needed code was removed, but it looks like most was just ifdefed out), or for the MS.Internal.IO.Zip code needed, was released as part of open sourcing WPF, although the code was deleted in later commits due to not being used by anything over there.

wstaelens commented 3 years ago

ok @KevinCathcart thanks for explaining in detail.

What does it mean for us now? Will it be added again to .NET Core /.NET 5?

KevinCathcart commented 3 years ago

It is up to the area owners to decide if restoring such support is code they are willing to have and maintain. If it is then somebody needs to port the code, which while probably not hard, is also not completely trivial.

Given that .NET 6 is already full in terms of planned projects, it would be unlikely to happen for that release unless a contributor creates a pull request. The chances of this being backported to older releases is pretty much zero, although as an OOB package, one ought to be able to reference a newer version from older net core based frameworks.

Perhaps @carlossanlop or one of the other area owners can comment if a pull request fixing this regression vs .NET Framework (which is technically also a correctness issue) by porting the missing code might be accepted. As mentioned before there is no associated public API change. Instead, Zip-based OPC packages with interleaved parts would simply have those parts be visible again if opened with this package. Right now, such parts get silently ignored, which is not even easy for apps to detect is happening, unless they see a required part is missing or look in the zip file themselves.

wstaelens commented 3 years ago

🤞 @carlossanlop

wstaelens commented 2 years ago

@Jozkee @carlossanlop @KevinCathcart @GrabYourPitchforks any updates? 🥺

wstaelens commented 2 years ago

🥺 🤞 😭

wstaelens commented 2 years ago

@carlossanlop @Jozkee what's the status of this?

danmoseley commented 2 years ago

Could this be achieved by using the code in a non Microsoft owned project? Is there a way to get a sense of the level of interest in this feature, such that it would justify the cost of maintenance etc?

wstaelens commented 2 years ago

@danmoseley I hope I understand and answer your questions correctly. (I'm not a native English speaker)

reproduction code can be found here: https://github.com/dotnet/wpf/issues/3546#issuecomment-695942720

sample documents can be found here: https://github.com/dotnet/wpf/issues/3546#issuecomment-824713800

the reason why it can't be opened can be found here: https://github.com/dotnet/wpf/issues/3546#issuecomment-825061189

One of our products is based on/working with the Microsoft V4 XPS print driver. This Microsoft print driver (Microsoft owned code) generate XPS files (XPS documents, the print jobs) using .piece files.

.NET Framework is able to open these XPS documents (print jobs). .NET (Core) is unable to open these XPS documents.

Why? Credits to @ThomasGoulet73 to figuring out that a part of .NET Framework code to support .piece files in System.IO.Packaging.* was not ported.

So we have a product that worked fine in .NET Framework but doesn't work fine anymore in .NET (Core). We got it working, by a big workaround to first extracting the XPS files, combining all the part files into a single big XPS file and then recreate an XPS file again to open it in .NET (Core) again. As you understand this is terrible, slow and expensive method in a printing solution.

I hope this answered the question "Could this be achieved by using the code in a non Microsoft owned project?".

"Is there a way to get a sense of the level of interest in this feature, such that it would justify the cost of maintenance etc?"

To begin with, everybody prints, but not everybody writes printing code or is in the printing industry. It is a niche, as you probably know. Also, I don't think that many companies have already ported all their 'printing solutions' from .NET Framework to .NET Core. (or the solutions they provide are rather services, at a later stage in the document handling process after printing). However when this (and other XPS issues will be fixed), each time a few thousand people will benefit from it, from our customers worldwide. (Working together with the biggest print multinational...) So maybe fixing this issue might be interesting for only a few developers but in the end thousands of customers will be happy. Developers working on the core of printing or on print solutions are in a minority, but that doesn't mean issues should be left open. We have customers that print on two-weekly/monthly base 150-200k files, for them it would be super beneficial, but also for the other thousand of companies/users.

Personally: If customers buy e.g. production machines, you know they will be working with it in reality for the next 10 to 15 years. Meaning we can't keep supporting them with 'old' .NET Framework products, when we provide a .NET (Core) alternative product, it should be the same as .NET Framework or better. Now with hacky workarounds it is worse. In the end we or the customer give Microsoft a bad name "sorry, Microsoft problem, they won't help and don't want to fix it".

P.S.: As you have noticed we tried to wake up the WPF team at Microsoft and I believe the community has spoken, as you have seen in the latest discussions ( https://github.com/dotnet/wpf/discussions/6542 ). One of my personal reasons is that XPS is a good clean format, but as with WPF/XAML/XPS there is a lack of updates/fixes/improvements. I really hope when somebody looks at this XPS issue, additional improves to performance will be done also, for example: https://github.com/dotnet/runtime/issues/51930 Many people print or work with XPS documents on a daily base, this will also come in handy for a lot of users/customers. (+ The biggest culprit with XPS is the requirement for the STA thread, I have still some hopes in a dark corner but yeah...probably false hope)

https://github.com/dotnet/runtime/issues?q=XPS https://github.com/dotnet/wpf/issues?q=XPS https://github.com/dotnet/winforms/issues?q=XPS https://github.com/mono/SkiaSharp/issues?q=XPS

I really hope Microsoft will spend some time on porting this .NET Framework code to .NET Core (and maybe some additional XPS fixes/improvements). Our customers and our team would be so thankful.

ericstj commented 2 years ago

It looks to me like “Piece” support is just a wrapper over top of the directory structure of the ZIP to create an aggregate stream that hides the underlying file parts from the zip: ZipPackagePart.cs (microsoft.com) (location in NETFx that decides to use the abstraction) ZipPackagePart.cs (location in .NETCore that would need to handle this).

Here ZipFileInfo in netfx is analogous to ZipArchiveEntry in netcore. A null entry indicates that it might be a piece-file. Here’s the read path: ZipPackage.cs (microsoft.com) (netfx) ZipPackage.cs (netcore)

I can’t see anything in the implementation of piece support that looks particularly hard but I haven’t dug too deep. I see it uses seekable zip entry streams which means it won’t work with our read-only archives (necessary for large zips) but that’s a limitation – not a blocker. I might be wrong but I bet we could fix that limitation as it looks like the seeking could even be redundant. I tend to agree that it looks like it was accidentally excluded because of the namespace. The devil may be in the details but I don’t expect it would be to hard to get to a prototype by adding in the missing code and mapping old concepts to new. I think we'd be willing to accept a contribution here that takes the InterleavedZipPartStream.cs and related classes from .NETFramework and ports them to System.IO.Packaging.

The first step here would be to move that code over, porting from the WPF-Zip types to the IO.Compression types and see if there are any difficult dependencies. If not then I think we could accept a PR that added that along with tests.

KevinCathcart commented 2 years ago

A mostly untested (I've got it to compile, but that is about it) prototype port of the interleaving feature can be found at: https://github.com/dotnet/runtime/compare/main...KevinCathcart:interleave

The adjustments to use ZipArchiveEntry were mostly straightforward. The biggest differences are the .Name for ZipFileInfo is .FullName in ZipArchiveEntry, and deletion is done via the entry, rather than the zip archive, and lastly the use of the ZipStreamManager. There were also a few small changes to account for nullability annotations, and some stylistic things that this repo has flagged as errors.

I've not tried do any big changes like avoiding seeking for reads (I suspect the .NET Framework Version may simply have failed to handle zip files that big). In any case, multi gigabyte XPS files are probably a low priority to support, and XPS files are the only use of OPC that I am aware of that uses interleaving. I'm pretty sure the office open XML spec forbids interleaving, and obviously nuget's spec also forbids the use of interleaving (since the current nuget client cannot handle it).

Hope this helps.

wstaelens commented 2 years ago

Big up to @ericstj and @KevinCathcart looking forward to using this when it is available.

In any case, multi gigabyte XPS files are probably a low priority to support, and XPS files are the only use of OPC that I am aware of that uses interleaving.

Regular (GDI) print jobs can grow to sevaral gigabytes due to the metafiles in them.

XPS files tend to start smaller but can reach also several 100's of MB's or GB's easily. and that brings you to other issues: see and this.

wstaelens commented 2 years ago

@KevinCathcart any idea when this will be added/released in .NET 6?

danmoseley commented 2 years ago

@ericstj this is marked 7.0, but we are now in stabilization mode for that release. from your comment above, I believe the status is that this isn't currently scheduled, but we would take a contribution as you described if offered?

I need to post an announcement in this repo with the schedule. I'll do that now.

ericstj commented 2 years ago

I believe the status is that this isn't currently scheduled, but we would take a contribution as you described if offered?

Yes, at the moment this isn't something that the @dotnet/area-system-io-compression team is working on but a contribution would be considered. We are approaching a place in .NET 7.0 where we won't be able to take risky changes and are already past the place for new API. It is still possible to get no-API changes in net7.0 but the @dotnet/area-system-io-compression team would need to make the call once seeing how risky it is.

wstaelens commented 2 years ago

any updates? Again being moved to "future" 😢 .

.net core 1x --> .net core 2x --> .net core 3.x --> .net 5 --> .net 6 --> .net 7 --> "future" again...

our frustrations are getting bigger again. 😠

wstaelens commented 1 year ago

Any updates? @jeffhandley @ericstj @danmoseley @KevinCathcart @carlossanlop

ericstj commented 1 year ago

I don't think there is a change from before. The issue is still open and marked as "help-wanted": we'd accept a PR for this, but it's not being actively worked on. Here's what I mentioned before.

The first step here would be to move that code over, porting from the WPF-Zip types to the IO.Compression types and see if there are any difficult dependencies. If not then I think we could accept a PR that added that along with tests.

wstaelens commented 1 year ago

Sorry @ericstj ...

We use software developed by Microsoft. We therefore assume that you will make some efforts to fix bugs. We make efforts to use your products and ensure that our customers continue to use the Microsoft/.net eco-system. You can hardly expect us to also do the work for as a Microsoft employee. We already made an effort to alert you to the problem and write a (slow) workaround ourselves. Hundreds of our users would benefit from your fix. Now we keep saying that the problem lies with Microsoft and you guys won't fix it. (Believe me, at government agencies they don't like to hear this). Which of course is not a good advertisement for you guys either, but that's just the way it is. I don't understand how you guys haven't addressed this problem since 2014/2016. Until now, you guys are not spending time in such issues that have been a problem for years. Just putting out some fires and appeasing people by moving a label and giving false hope. Something that has worked for years in .NET Framework still doesn't work to this day since the first versions of .net core. We were going to pay for the fix in the past, but that itself was never addressed. You guys provide a default print driver that produces files you can't open with your own recommended framework....

I really hope you guys can take up @KevinCathcart his proof-of-concept and fix it. He already made great efforts.

(I am not able to spend days to get WPF/net running locally here, let alone weeks to learn to understand your entire internal code first before writing a fix. That knowledge is already with you guys, as @KevinCathcart proved).

Thanks.

(update: created a PR from Kevin his code so that you can pick it up: https://github.com/dotnet/runtime/pull/78374 )

wstaelens commented 1 year ago

@ericstj what is the status? Can you make a push for .net 8 or do we have to wait another release?

davidfowl commented 1 year ago

.NET 8 is done.

wstaelens commented 1 year ago

😞 27-06-2016 - ... 2024? 2025? 7+ years have passed since .net core release without any fixes. fingers crossed...?

wstaelens commented 8 months ago

what is the status of this for .net 9 ?

edwardneal commented 8 months ago

I'm currently working on this, and hopefully there should be a PR ready for review "soon" - probably the next week or so, realistically. I'm adding test coverage for the new functionality as I go, but it doesn't do much to test the existing code. That particular gap should be triaged separately IMO.

It's possible to control whether or not the XPS Document Writer printer triggers this behaviour, which makes it easier to test with real files. In the printer's advanced printing preferences, the interleaving option can be toggled off or on (the default behaviour is off.) Of course, this doesn't help particularly if you're not in control of the files you receive!

wstaelens commented 8 months ago

@edwardneal you can find test documents here: you can find some test files here from the xps jobs: https://github.com/dotnet/wpf/issues/3546#issuecomment-824713800

In general, XPS still uses arraylists and code base could be improved. I often use this 900+ test document: https://spaenhiers.files.wordpress.com/2022/10/bidprentjes_2022-10-20.pdf

imagine the performance boost if the STA thread requirement could be dropped.

edwardneal commented 8 months ago

Thanks. I briefly tested with file 02 outside of the automatic tests, and was able to enumerate, read and write its package parts.

There are definitely performance and functionality improvements to be made from ZipArchive upwards. Some of the largest improvements need an API review, but I think they're ultimately necessary. I'd personally like to see async support in ZipArchive, some memory usage reductions, and to move thumbnails and digital signatures from XpsDocument to Package - they're part of the base OPC specification, and Office documents (and possibly NuGet packages and VSIX extensions) use them.

wstaelens commented 8 months ago

@edwardneal will these changes be backported to .net 6 and .net 7 (as some of our customers are stuck for months/years on .net 6/7....)

edwardneal commented 7 months ago

I don't know about .NET 6, but I'd be surprised if any new functionality was backported to .NET 7. MS' support lifecycle puts .NET 7 in Maintenance support, in which only security fixes are made.

The process for requesting a backport is found here. It reads to me like you'll need to wait for the current PR to be merged, then raise a second PR requesting its backport.

KevinCathcart commented 7 months ago

I think it is unlikely for backports to be approved for this, but it also shouldn't be needed at least longer term, as if accepted it will become part of the System.IO.Packaging nuget that targets .NET Standard 2.0, so any impacted .NET 6 or .NET 7 or .NET 8 project should be able to just reference the 9.0 version of the package to add this functionality back.

Unfortunately that NuGet probably won't contain builds targeting 6 or 7, so the .NET Standard 2.0 build would get used, I'd expect that would merely impact performance rather than functionality, and the resulting performance we are talking about here should still be at .NET Framework level or better, so quite unlikely to be a showstopper.

wstaelens commented 7 months ago

.net 6 would be great but not necessary. Backport to .net 7 would be needed.

The problem is that some clients (gov, fortune500, ..) cannot install newer (internally not approved) .net releases on some servers, and at least not .net 8 / .net 9 (some even require that a release needs to be on the market for 6months or 1 year already to be approved as "ok" or "stable") That is why I asked if this will be backported so that we could roll this out to some customers. Minor releases from an approved framework version are faster accepted. 🙄

Appreciate already what you are doing! Let's make XPS great again 👍

wstaelens commented 7 months ago

@edwardneal would you be able to tackle this other EPIC issue with xps in .net? (dropping the STA requirement) https://github.com/dotnet/wpf/issues/4000#issuecomment-845712273 🤞 😃

can't wait to test the changes in production 👍

carlossanlop commented 2 months ago

We can consider this fixed by https://github.com/dotnet/runtime/pull/97898 unless someone reports otherwise.