golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.27k stars 17.7k forks source link

archive/zip: add support for extended timestamps #18359

Closed bobjalex closed 7 years ago

bobjalex commented 7 years ago

What version of Go are you using (go version)?

go version go1.8beta2 windows/amd64

What operating system and processor architecture are you using (go env)?

set GOARCH=amd64 set GOBIN= set GOEXE=.exe set GOHOSTARCH=amd64 set GOHOSTOS=windows set GOOS=windows set GOPATH=C:\GoLib_beta set GORACE= set GOROOT=C:\Go_beta set GOTOOLDIR=C:\Go_beta\pkg\tool\windows_amd64 set GCCGO=gccgo set CC=gcc set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\Bob\AppData\Local\Temp\go-build778111039=/tmp/go-build -gno-record-gcc-switches set CXX=g++ set CGO_ENABLED=1 set PKG_CONFIG=pkg-config set CGO_CFLAGS=-g -O2 set CGO_CPPFLAGS= set CGO_CXXFLAGS=-g -O2 set CGO_FFLAGS=-g -O2 set CGO_LDFLAGS=-g -O2

What did you do?

I use zip archives in contexts that require correct file times, and are required to be compatible with non-Go zip libraries. Go's default archive/zip writing is incompatible with many other zip libraries and programs (e.g. Java's and Python's provided libraries, the info-zip zip program distributed with Linux, ...).

The problem:

Some solutions: (1) Change archive/zip to store local non-extended times.

I favor (2), since it is flexible and would not violate the Go 1 principle. Default operation would be exactly as pre-Go 8 versions.

What did you expect to see?

Extended time stored as UTC and old (main) time fields stored as local time.

What did you see instead?

All file times stored as UTC.

dsnet commented 7 years ago

Does it require to call header.Modified.In(time.Local)

When writing a zip file, it will be required. The old zip behavior was to use UTC in the legacy timestamp fields by default. The Go1 compatibility agreement binds us to preserve that behavior since it's not a bug on Go's writer.

mattn commented 7 years ago

Okay, I understand. Thank you. BTW, is it possible to add Writer#Locale to set whole timezone automatically in the headers? It should be UTC.

dsnet commented 7 years ago

Writer.Locale is a reasonable API change, but let's not include this in this release. While it is clunkier to do header.Modified.In(time.Local) for every file, I'm don't think it's that bad.

If we had Writer.Locale, it would still take a single line to set it. On the other hand, you typically add a file to a zip archive in a for loop, so it would still be a single line to call header.Modified.In(time.Local).

bobjalex commented 7 years ago

At first look, it seems to satisfy my requirements. And is done in a quite nice way! Over the next day or two I'll try a "test recode" of my stuff and let you know how that works out.

On Wed, Nov 1, 2017 at 11:51 AM, Joe Tsai notifications@github.com wrote:

@bobjalex https://github.com/bobjalex, take a look at CL/74970 https://golang.org/cl/74970 to see if it suites your needs. It maintains backwards compatibility where UTC is typically the timezone used in the legacy timestamp fields.

However, since the FileHeader.Modified field is newly added, you can always set it directly to control the timezone used for the date.

fh, _ := zip.FileInfoHeader(fi) // This uses UTC for Modified to be backwards compatible fh.Modified = dh.Modified.In(time.Local)

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/18359#issuecomment-341203230, or mute the thread https://github.com/notifications/unsubscribe-auth/AFYNARBJbobGKv9awZRTHsZ2B45-0SO2ks5syL3JgaJpZM4LPvG1 .

dsnet commented 7 years ago

I'll be submitting the CL on Monday, since it's already past the freeze. Be sure to provide feedback before then.

bobjalex commented 7 years ago

I have some thoughts about the Reader, Trying to be quick, I haven't had time to think this through very well, so apologies in advance if it is nonsense :)

The Reader attempts to determine the timezone if both the legacy

and extended timestamps are present since it can compute the delta between the two values.

Thought 1: The mini-spec does not say what the time zone of Modified is if there are no extended timestamps? (I'm assuming UTC in the following comments.)

Thought 2: I'm trying to reason how an application that chooses to interpret a legacy time field as local time would work:

For each file in the Reader: if Modified is UTC: // Assume that there are no extended fields and // Modified is the legacy time value as UTC. Offset the time to make it local. else: Use as is (the guessed time zone)

Would the above snippet always work? If the legacy time stamp matches an extended timestamp, the guessed time zone would be UTC (?) and would be erroneously offset. Is there another way to determine whether the zone is guessed or actual?

On Sat, Nov 4, 2017 at 8:30 AM, Joe Tsai notifications@github.com wrote:

I'll be submitting the CL on Monday, since it's already past the freeze. Be sure to provide feedback before then.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/18359#issuecomment-341905793, or mute the thread https://github.com/notifications/unsubscribe-auth/AFYNAYIZQHsrtqph7U9AoZDvxc-KmDywks5szIMmgaJpZM4LPvG1 .

dsnet commented 7 years ago

Thought 1: The mini-spec does not say what the time zone of Modified is if there are no extended timestamps? (I'm assuming UTC in the following comments.)

(By mini-spec, I assume you mean the Go documentation?). In the absence of extended timestamps, Modified will always be UTC.

Thought 2: I'm trying to reason how an application that chooses to interpret a legacy time field as local time would work:

The encoding of FileHeader.Modified.Zone is not so you can detect whether extended timestamps are present, but more so that round-trip encoding of the FileHeader produces a similar ZIP file, where the legacy time field should be identical to the original for most reasonable cases (assuming the values differ exactly by the timezone offset).

Would the above snippet always work?

No. If the ZIP file was created in the UTC timezone itself, the current API does not provide enough information regarding whether extended timestamps were present or not.

If I understand you correctly, your main use case is that (in the lack of an extended timestamp), you wanted the reader to interpret the legacy field as a local time.

If I remove the offset != 0 case in struct.go:173, then you should be able to determine when extended timestamps are present except for the case where the legacy and extended timestamp fields differ in a unreasonable way (e.g., difference exceeds -12 or +14 hours).

Alternatively, you could check FileHeader.Extra, but I'm not a fond of how much low-level detail the ZIP API exposes to the user.

Other notes:

bobjalex commented 7 years ago

Simply stated, my use case is to be able to properly unzip all proper zipfiles that occur in the wild, without requiring advance knowledge as to whether they contain extended timestamps or not.

"Minimal" zip files do not include any extended timestamps. Even some modern programming languages' zip libs, for simplicity, do not output extended timestamps. It is not an uncommon caee and will not go away any time soon.

Thinking out loud:

It seems like everything is well covered in the new API except the case of reading a zip entry with no extended time fields. It doesn't know what to do with the timezone of the legacy timestamp. (Of course, this is a well-known shortcoming of the zip format.)

The time-zone guessing for cases where an extended timestamp is included seems solid (and a great idea). But for the no-extended-timestamp case it is up to the client to figure out what the time zone should be. To do this, the client must be able to determine whether the given UTC timestamp is a no-extended-timestamp case or not.

Possibilities:

  1. Your offset != 0 case in struct.go:173 https://go-review.googlesource.com/c/go/+/74970/5/src/archive/zip/struct.go#173 suggestion sounds reasonable. I'm taking your word that this would produce the wanted result :-) Might be nice to add a TimezoneUnknown() method to limit the caller's exposure to lib internals

  2. Implement an option for the reader that tells it whether the no-extended-timestamps cases should be passed as UTC or local. Those are probably the only likely requirements. This might be a bit simpler for the caller than the previous, but I don't know the impact on the implementation.

On Sun, Nov 5, 2017 at 1:26 AM, Joe Tsai notifications@github.com wrote:

Thought 1: The mini-spec does not say what the time zone of Modified is if there are no extended timestamps? (I'm assuming UTC in the following comments.)

(By mini-spec, I assume you mean the Go documentation?). In the absence of extended timestamps, Modified will always be UTC.

Thought 2: I'm trying to reason how an application that chooses to interpret a legacy time field as local time would work:

The encoding of FileHeader.Modified.Zone is not so you can detect whether extended timestamps are present, but more so that round-trip encoding of the FileHeader produces a similar ZIP file, where the legacy time field should be identical to the original for most reasonable cases (assuming the values differ exactly by the timezone offset).

Would the above snippet always work?

No. If the ZIP file was created in the UTC timezone itself, the current API does not provide enough information regarding whether extended timestamps were present or not.

If I understand you correctly, your main use case is that (in the lack of an extended timestamp), you wanted the reader to interpret the legacy field as a local time.

If I remove the offset != 0 case in struct.go:173 https://go-review.googlesource.com/c/go/+/74970/5/src/archive/zip/struct.go#173, then you should be able to determine when extended timestamps are present except for the case where the legacy and extended timestamp fields differ in a unreasonable way (e.g., difference exceeds -12 or +14 hours).

Other notes:

  • If there are multiple extended timestamps, the last one is used, even if every single timestamp listed says something different from each other.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/18359#issuecomment-341956704, or mute the thread https://github.com/notifications/unsubscribe-auth/AFYNARuVk-_-V4-pZumBM8iqKUV7v0Iqks5szXFLgaJpZM4LPvG1 .

dsnet commented 7 years ago

I'm going to modify the conditional on struct.go:173 since it's a simple change. Thus, to determine whether extended timestamps are present, you can simply just check whether FileHeader.Modified.Location() == time.UTC or not.

Not to disparage your use-case, but I personally believe the number of users that really care that much about accurate timestamps, are not common, so I'm reluctant to add more API surface, especially for a feature of ZIP that was so poorly designed and inconsistent across the world.

Your feedback is helpful, though.

bobjalex commented 7 years ago

Agreed that users that care about accurate timestamps in their zipfiles are a distinct minority. Heck, the legacy timestamps are only accurate to 2 sec anyway, not to mention the lack of time zone (hence the proliferation of "extra" time fields)..

I think your proposal is fine, Thanks for working through this.

On Sun, Nov 5, 2017 at 9:37 PM, Joe Tsai notifications@github.com wrote:

I'm going to modify the conditional on struct.go:173 since it's a simple change. Thus, to determine whether extended timestamps are present, you can simply just check whether FileHeader.Modified.Location() == time.UTC or not.

Not to disparage your use-case, but I personally believe the number of users that really care that much about accurate timestamps, are actually not super common, so I'm reluctant to add more API surface, especially for a feature of ZIP that was so poorly designed and inconsistent across the world.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/golang/go/issues/18359#issuecomment-342051195, or mute the thread https://github.com/notifications/unsubscribe-auth/AFYNAVfhU4IolA0dflHs_JlLSv6k0d5Bks5szps0gaJpZM4LPvG1 .