dbuenzli / zipc

ZIP archive and deflate codec for OCaml
https://erratique.ch/software/zipc
ISC License
23 stars 1 forks source link

possible issue with the default version_needed_to_extract #3

Closed v-gb closed 9 months ago

v-gb commented 10 months ago

Short version: the default version_needed_to_extract makes MS Word complain about zipc-generated files, and I think MS Word is right (gasp).

Longer version: I start with a .zip file that MS word is happy with (a .docx created by LibreOffice or MS Word or whatever else). If I roundtrip this zip through Zipc.of_binary_string/Zipc.to_binary_string, MS Word is happy. But if I do the same roundtrip and try to make even a noop change to a file using Zipc.File.{stored,deflate}_of_binary_string, MS Word complains about "the file contains unreadable content. If you trust this file, do you want to proceed?". I narrowed this to the value of version_needed_to_extract in the file header: 788 (the zipc default) causes the warning, 10 (the value in the input docx) makes MS Word happy. Reading https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT section 4.4.3.[12], I think version_needed_to_extract=788 is incorrect. I think version_needed_to_extract should be between 10 and 20 (MS Word is happy with both 10 and 20 at least). I assume almost all software ignores this field entirely, hence the fact that this has gone unnoticed.

(if you want to test things yourself, and you don't have a Windows machine to run MS Word on, MS Word runs on android)

dbuenzli commented 10 months ago

Well I don't know exactly what went into my mind here.

It seems that in the default I conflated version_made_by and version_needed_to_extract.

Sorry about that.

dbuenzli commented 10 months ago

I think version_needed_to_extract should be between 10 and 20 (MS Word is happy with both 10 and 20 at least).

That's consistent with what is written in table B.3 of annex B of ecma-376 part 2.

v-gb commented 10 months ago

On the upside, that'll force all the archives created by zipc to change, so you don't need to sad about the List.rev in the pull request :).

dbuenzli commented 9 months ago

I hesitated to automatically adjust the default of version_made_by depending on the compression argument according to 4.4.3.2 of the spec. but then not all compression techniques are mentioned there (e.g. lx and zstd are not). Simply using 2.0 and let decoders be smarter about other fields seems a better way to go to me.

Could you please confirm that this work for you with an:

opam pin add zipc --dev 
v-gb commented 9 months ago

I've been using 20 unconditionally in my workaround, so no need for more testing, your patch will work. Thanks!

v-gb commented 9 months ago

I was looking at the diff on my phone and I didn't realize that you changed more things than the file header. I'll rerun a test to make sure.

I had a problem trying to pin from your git servers:

$ opam pin add zipc --dev 
[ERROR] Could not synchronize /home/-/.opam/5.1.1/.opam-switch/sources/zipc from "git+https://erratique.ch/repos/zipc.git":
        Git repository seems just initialized, try again after your first commit
[zipc.0.1.0] fetching sources failed: git+https://erratique.ch/repos/zipc.git
[ERROR] Error getting source from git+https://erratique.ch/repos/zipc.git:
          - git+https://erratique.ch/repos/zipc.git
$ opam pin add zipc git+https://erratique.ch/repos/zipc
same error
$ git clone https://erratique.ch/repos/zipc.git
Cloning into 'zipc'...
Fetching objects: 75, done.
warning: remote HEAD refers to nonexistent ref, unable to checkout.
$ git clone https://erratique.ch/repos/zipc 
same error

so I'll be using opam pin add zipc git+https://github.com/dbuenzli/zipc.

v-gb commented 9 months ago

I think Word must have changed, because now, regardless of this change, roundtripping through Zipc.of_binary_string and Zipc.to_binary_string makes Word unhappy, and even if you tell it to proceed, it can't read the files.

v-gb commented 9 months ago

Ok, I tested with the same files I used originally, and ran into no problems. What I was running into in the previous message was an orthogonal issue that I reported in #4 .

dbuenzli commented 9 months ago

I had a problem trying to pin from your git servers:

git had the wonderful helpful idea to change the default branch name from master to main whenever you create new repos. This broke a lot of the scripts I have on newly created repos but it also seems to break opam pin here. The following seems to work:

opam pin add zipc git+https://erratique.ch/repos/zipc.git#main

Not sure where the problem is here (my install of cgit, the bare repo on the server or opam).

dbuenzli commented 9 months ago

(opam's message is misleading see https://github.com/ocaml/opam/issues/5324)

dbuenzli commented 9 months ago

Not sure where the problem is here (my install of cgit, the bare repo on the server or opam).

Apparently the bare repo's HEAD file was pointing on a non-existent master branch. Should be fixed now and opam pin add zipc --dev should work.