3MFConsortium / spec_core

3MF's Core specification
BSD 2-Clause "Simplified" License

Improve specification of the ZIP format and compression specification #39

Closed: bubnikv closed this issue 1 year ago

bubnikv commented 3 years ago

The 3MF Core specification references the ZIP format only vaguely. OPF does not specify ZIP compression either:

http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm

Wikipedia is a tiny bit more specific:

https://en.wikipedia.org/wiki/Open_Packaging_Conventions

The ISO/IEC 29500-2:2008 specification and the second edition of ECMA-376 makes a normative reference to PKWARE, Inc.'s .ZIP File Format Specification version 6.2.0 (2004), and supplements it with a normative set of clarifications. Note: The older first edition of ECMA-376 makes an informative (i.e., non-normative) reference to the newer PKWARE Inc's ".ZIP File Format Specification" version 6.2.1 (2005).[1] The ZIP format is not specified by any international standard but has widespread community and developer acceptance.

First, we should standardize on DEFLATE as the compression algorithm. Wikipedia says:

https://en.wikipedia.org/wiki/ZIP_(file_format)

The ZIP file format permits a number of compression algorithms, though DEFLATE is the most common.

WinZip, starting with version 12.1, uses the extension .zipx for ZIP files that use compression methods newer than DEFLATE; specifically, methods BZip, LZMA, PPMd, Jpeg and Wavpack. The last 2 are applied to appropriate file types when "Best method" compression is selected.[27][28]

The same Wikipedia page contains an interesting "Standardization" section that discusses mandating the DEFLATE algorithm, disallowing multi-part ZIPs, etc.

Second, the definition of the ZIP64 extension is vague, and it is not clear whether ZIP64 3MF files are allowed. We have found that the ZIP64 3MF files produced by the miniz library, which PrusaSlicer uses, are not consumed by Microsoft tools. There was a discussion

https://github.com/3MFConsortium/archived_documents/tree/master/MeetingMaterials_44/Zip64_Issues_Materialise.zip

quite some time ago, but I suppose no resolution came out of it.
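When comparing what different writers (such as miniz) emit, it helps to see exactly which ZIP64 structures a 3MF archive contains. A minimal sketch, assuming Python's standard `zipfile` module; the `zip64_report` helper and the entry names are hypothetical, while the signatures and the ZIP64 extra field ID (0x0001) are the ones defined in PKWARE's APPNOTE.

```python
import io
import struct
import zipfile

# Signatures and header ID as defined in PKWARE's APPNOTE.TXT.
LOCAL_HEADER_SIG = b"PK\x03\x04"
EOCD_SIG = b"PK\x05\x06"
EOCD64_LOCATOR_SIG = b"PK\x06\x07"
ZIP64_EXTRA_ID = 0x0001


def _extra_ids(extra: bytes):
    """Yield the header ID of each record in a ZIP extra field."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        yield header_id
        pos += 4 + size


def zip64_report(data: bytes) -> dict:
    """Report which ZIP64 structures an in-memory archive (e.g. a 3MF) uses."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()
    central, local = [], []
    for info in infos:
        # ZIP64 extra field in the central directory entry.
        if ZIP64_EXTRA_ID in _extra_ids(info.extra):
            central.append(info.filename)
        # ZIP64 extra field in the local file header: 30 fixed bytes, with the
        # name and extra-field lengths stored at offsets 26 and 28.
        off = info.header_offset
        if data[off:off + 4] != LOCAL_HEADER_SIG:
            raise ValueError(f"no local header for {info.filename}")
        name_len, extra_len = struct.unpack_from("<HH", data, off + 26)
        start = off + 30 + name_len
        if ZIP64_EXTRA_ID in _extra_ids(data[start:start + extra_len]):
            local.append(info.filename)
    # The 20-byte ZIP64 EOCD locator sits right before the classic EOCD record
    # (this simple search assumes no archive comment contains the signature).
    eocd = data.rfind(EOCD_SIG)
    eocd64 = eocd >= 20 and data[eocd - 20:eocd - 16] == EOCD64_LOCATOR_SIG
    return {"eocd64": eocd64, "central_zip64": central, "local_zip64": local}


# Example: one ordinary entry and one entry streamed with force_zip64=True.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("3D/3dmodel.model", "<model/>")
    with zf.open("Metadata/big.bin", "w", force_zip64=True) as f:
        f.write(b"data")
print(zip64_report(buf.getvalue()))
```

In this example only the local header of the entry opened with `force_zip64=True` should carry a ZIP64 extra field; Python's `zipfile` adds central directory and end-of-archive ZIP64 records only when sizes or offsets actually require them.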

jordig100 commented 3 years ago

This is what is specified in the Core spec:

The ZIP archive MUST follow the .ZIP File Format Specification by PKWARE Inc. Files within the ZIP archive that represents a 3MF document MUST use the compression method Deflate ("8 - The file is Deflated") or be stored uncompressed ("0 - The file is stored (no compression)") in accordance with the OPC specification ("Annex C, (normative) ZIP Appnote.txt Clarifications").

So the only compression method allowed is DEFLATE; entries may otherwise only be stored uncompressed.
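A consumer-side check for this rule is short. A minimal sketch, assuming Python's `zipfile` module: it lists every entry whose compression method is neither 0 (stored) nor 8 (Deflate). The `disallowed_entries` helper and the entry names are illustrative, and writing the BZip2 example entry needs Python's optional `bz2` module.

```python
import io
import zipfile

# Methods the 3MF Core spec permits: 0 = stored, 8 = Deflate.
ALLOWED_METHODS = {zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED}


def disallowed_entries(data: bytes) -> list[tuple[str, int]]:
    """Return (name, method) for every entry that is neither stored nor deflated."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [(info.filename, info.compress_type)
                for info in zf.infolist()
                if info.compress_type not in ALLOWED_METHODS]


# Example: a BZip2 entry (method 12) violates the rule.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("3D/3dmodel.model", "<model/>", compress_type=zipfile.ZIP_DEFLATED)
    zf.writestr("[Content_Types].xml", "<Types/>", compress_type=zipfile.ZIP_STORED)
    zf.writestr("Metadata/extra.bin", b"x" * 100, compress_type=zipfile.ZIP_BZIP2)
print(disallowed_entries(buf.getvalue()))
```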

bubnikv commented 3 years ago

Lukas Hejl of Prusa Research carried out a fairly extensive study of ZIP64 extension support and UTF-8 file name support for Prusa Research and the 3MF Consortium.

In a nutshell:

1) ZIP64, including the streaming extension, is not difficult to implement. It is supported by most ZIP consumers, with the exception of some Microsoft applications and the Craftware FDM slicer. Our writeup describes how we implemented the streaming extension so that ZIP64 is only used when needed, which keeps the resulting ZIPs readable by Microsoft tools. We hope Microsoft will implement ZIP64 support for OPC. Also, some interactive tools (for example Windows Explorer) have trouble modifying ZIP64 archives (adding or removing files).

2) Files with UTF-8 encoded names are extracted correctly by most ZIP consumers; however, many ZIP producers (for example Windows Explorer) fail to encode file names as UTF-8.
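The "only use ZIP64 when needed" rule comes down to whether a value still fits the classic 32-bit size/offset fields or the 16-bit entry count. A minimal sketch, assuming a writer that tracks sizes and offsets as it goes; the `EntryStats` class and both functions are hypothetical and are not the implementation described in the writeup.

```python
from dataclasses import dataclass

# Classic ZIP headers store sizes/offsets in 32-bit fields and the entry
# count in 16-bit fields; the all-ones value doubles as the marker meaning
# "see the ZIP64 record", so this sketch switches to ZIP64 at that value.
MAX_U32 = 0xFFFFFFFF
MAX_U16 = 0xFFFF


@dataclass
class EntryStats:
    """Hypothetical per-entry bookkeeping kept by a ZIP writer."""
    uncompressed_size: int
    compressed_size: int
    local_header_offset: int


def entry_needs_zip64(e: EntryStats) -> bool:
    """True if this entry needs a ZIP64 extra field."""
    return max(e.uncompressed_size, e.compressed_size,
               e.local_header_offset) >= MAX_U32


def needs_zip64_eocd(entry_count: int, cd_offset: int, cd_size: int) -> bool:
    """True if the archive needs the ZIP64 end-of-central-directory records."""
    return entry_count >= MAX_U16 or cd_offset >= MAX_U32 or cd_size >= MAX_U32


print(entry_needs_zip64(EntryStats(1024, 512, 0)))
print(entry_needs_zip64(EntryStats(5 * 2**30, 3 * 2**30, 0)))  # 5 GiB entry
```

A streaming writer does not know the sizes when it emits the local header, so it has to decide before writing the entry; Python's `zipfile` exposes that choice as the `force_zip64` argument of `ZipFile.open`.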

ZIP Report.pdf

jordig100 commented 3 years ago

Hi Vojtěch,

We have this issue to track part names and Unicode:

https://github.com/3MFConsortium/spec_core/issues/42

What I found in the OPC specification is the following:

OPC Section 9.1.1.1, Part Name Syntax, reads: A Part name shall be an IRI and shall be encoded as either a Part IRI or a Part URI. A Part IRI is a physical representation that permits direct use of Unicode characters. A Part URI is a physical representation that uses a percent-encoding for non-ASCII Unicode characters. [Note: Not all versions of the ZIP specification support a Part name represented as a Part IRI. To preserve interoperability, implementers are encouraged to use the currently more prevalent Part URI representation. end note]

It seems to recommend the Part URI (ASCII) representation for better interoperability, but it doesn't forbid a Part IRI containing Unicode (UTF-8) characters.
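A Part IRI ends up as a non-ASCII ZIP item name, and ZIP only gained a flag marking names as UTF-8 (general purpose bit 11) in APPNOTE 6.3.0, later than the 6.2.0 version that OPC references. A minimal sketch, assuming Python's `zipfile`, that lists entries with a non-ASCII name but no UTF-8 flag, i.e. names a consumer would fall back to decoding as CP437. The helper name is hypothetical.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: name is encoded as UTF-8


def names_without_utf8_flag(data: bytes) -> list[str]:
    """List entries with a non-ASCII name whose UTF-8 flag is not set.

    Without bit 11, zipfile decodes the raw name as CP437 (by default), so a
    UTF-8 name from a producer that forgot the flag shows up here garbled.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [info.filename for info in zf.infolist()
                if not info.flag_bits & UTF8_FLAG and not info.filename.isascii()]


# zipfile itself sets bit 11 whenever a name is not pure ASCII.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("3D/Ԫ3dmodel.model", "<model/>")
print(names_without_utf8_flag(buf.getvalue()))
```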

Attached is a 3MF containing that image under a Unicode name, which 3D Builder embedded as a Part URI with escape sequences: “%D4%AA_image.jpg".

On the other hand, the attached test case P_XXX_104_03.3mf stores the model part name in the UTF-8 IRI representation: “Ԫ3dmodel.model". Does everyone support this case?
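The two representations differ only in how non-ASCII characters are written: the Part URI form percent-encodes their UTF-8 octets, so “Ԫ" (U+052A, UTF-8 bytes D4 AA) becomes “%D4%AA". A minimal sketch using Python's `urllib.parse`; it covers only this encoding step, not OPC's complete part-name rules, and the helper names are hypothetical.

```python
from urllib.parse import quote, unquote


def part_iri_to_uri(part_name: str) -> str:
    """Percent-encode the UTF-8 octets of non-ASCII characters in a part name."""
    # Keep '/' literal as the segment separator; quote() never escapes
    # letters, digits, '_', '.', '-' or '~', but it does escape other ASCII
    # characters such as spaces, and an existing '%' as well, so only pass
    # names that are not already percent-encoded.
    return quote(part_name, safe="/")


def part_uri_to_iri(part_name: str) -> str:
    """Decode percent-encoded UTF-8 octets back into Unicode characters."""
    return unquote(part_name, encoding="utf-8")


print(part_iri_to_uri("Ԫ_image.jpg"))          # %D4%AA_image.jpg
print(part_uri_to_iri("%D4%AA3dmodel.model"))  # Ԫ3dmodel.model
```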

Regards Jordi

From: Vojtěch Bubník. Sent: Thursday, May 20, 2021 4:18 PM. To: 3MFConsortium/spec_core. Cc: Gonzalez, Jordi. Subject: Re: [3MFConsortium/spec_core] Improve specification of the ZIP format and compression specification (#39)


ZIP Report.pdf: https://github.com/3MFConsortium/spec_core/files/6516560/ZIP.Report.pdf


bubnikv commented 3 years ago

We have found that Microsoft 3D Builder accepts ZIP64-encoded 3MF files if the following conditions are met:

Although MS 3D Builder does not reject these ZIP64 files, it hangs when loading one whose uncompressed contents exceed 4 GiB.

martinweismann commented 2 years ago

Several parts of this issue are already covered by: https://github.com/3MFConsortium/spec_core/commit/cfc50ffb7805ab1ec187893961f20bb55c1ba775

martinweismann commented 1 year ago

Main issue resolved. See
https://github.com/3MFConsortium/spec_core/issues/68 for what's left.