101arrowz / fflate

High performance (de)compression in an 8kB package
https://101arrowz.github.io/fflate
MIT License

Mtime is set incorrectly on timezon #219

Open 1oglop1 opened 1 month ago

1oglop1 commented 1 month ago

    const jsfiles = await glob(globPattern, { cwd, absolute: false });
    const promises = jsfiles.sort().map(async (file) => {
      // console.log(file);
      const fpath = path.join(cwd, file);
      const buffer = new Uint8Array(await fsAsync.readFile(fpath));
      return {
        path: file,
        buffer,
      };
    });

    const resolvedPathBuffer = await Promise.all(promises);
    const filenamesAndContents = resolvedPathBuffer.reduce((acc, { path, buffer }) => {
      return { ...acc, [path]: buffer };
    }, {});

    const date = 504932400000; // Wednesday, 1 January 1986 03:00:00 in UTC

    const fflateZipContent = fflate.zipSync(filenamesAndContents, {
      os: 3,
      mtime: date,
    });

    const zipName = `${name}_${hash}.zip`
    const zipPath = path.join(fnDir, `../01${zipName}`);
    fs.writeFileSync(zipPath, Buffer.from(fflateZipContent));

Examining the zip file with unzip -l shows the modified time as 01-01-1986 04:00. I can confirm this does not happen with UZIP (which defaults to 1980-01-01 00:00). I have yet to try JSZip.

101arrowz commented 1 week ago

fflate expects a JavaScript Date object or a Unix epoch timestamp for the mtime field, not a DOS-style date number. It does all the conversions necessary to embed a DOS-style date in the ZIP file for you. This is the reason behind the different mtime behavior.

If you use a Unix epoch timestamp instead, things work fine.

const date = new Date('1986-01-01T03:00:00.000Z'); // or const date = 504932400000

1oglop1 commented 1 week ago

const date = new Date('1986-01-01T03:00:00.000Z');

I tried several things and the result is still the same. I'm not sure whether this is macOS or something missing in the implementation, but the result is still +1 hr. I did not have the same problem with JSZip, which I decided to use in the end.

https://youtube.com/shorts/wlVRh70Sz18

101arrowz commented 1 week ago

I just re-tested this myself and the Unix mtime is properly set in UTC. There is no bug here, your timezone is probably one hour ahead of UTC. When the unzip command prints out a timestamp, it's in your local timezone, which is why it shows as one hour later. You can check that the UTC timestamp is correct by unzipping and using TZ=UTC stat yourfile.txt on the extracted file.
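
To illustrate why the listing shows 04:00, here is a small sketch that formats the same instant in UTC and in a zone one hour ahead of it. `Europe/Berlin` is only an assumed example zone (the reporter's actual timezone is not stated in this thread), and `hourIn` is a hypothetical helper.

```javascript
// The epoch value from the original snippet: 1986-01-01T03:00:00Z.
const instant = new Date(504932400000);

// Hypothetical helper: the hour of day (0-23) of `date` as seen in `timeZone`.
function hourIn(date, timeZone) {
  const parts = new Intl.DateTimeFormat('en-GB', {
    hour: '2-digit',
    hourCycle: 'h23',
    timeZone,
  }).formatToParts(date);
  return Number(parts.find((p) => p.type === 'hour').value);
}

// Same instant, two wall-clock readings.
console.log('UTC hour:', hourIn(instant, 'UTC'));
console.log('Europe/Berlin hour (assumed zone):', hourIn(instant, 'Europe/Berlin'));
```

The instant itself is identical in both cases; only the zone used for display changes.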

1oglop1 commented 1 week ago

Unzipped mtime is not important for my use case. I need a consistent hash of the zip file content across systems, and I wasn't able to achieve that with fflate. I am also not very familiar with the zip standards (if there are any), so I tested this on multiple machines in different timezones: in CI, on macOS, and on Ubuntu (my other machine).

And here is the result

running locally:

4cf29b7734eb689c95095aa582c5fbd1  res/result_1980_fflate.zip
59abddcfdcb30d13b2270a33bccd7d31  res/result_1980_jszip.zip
fd396cb11d576cae9ebca060cc30b118  res/result_1980_uzip.zip
4a8209e85c409064607cc9ea071f5aca  res/result_1986_fflate.zip
a12b49ffd4099ed1199f6096ff3bd18a  res/result_1986_jszip.zip
fd396cb11d576cae9ebca060cc30b118  res/result_1986_uzip.zip

running in Actions https://github.com/1oglop1/jszip-games/actions/runs/10716229343/job/29713270977:

4cf29b7734eb689c95095aa582c5fbd1  res/result_1980_fflate.zip
59abddcfdcb30d13b2270a33bccd7d31  res/result_1980_jszip.zip
fd396cb11d576cae9ebca060cc30b118  res/result_1980_uzip.zip
6ce3ddbdc496aa6efb4e207ee78030e6  res/result_1986_fflate.zip
a12b49ffd4099ed1199f6096ff3bd18a  res/result_1986_jszip.zip
fd396cb11d576cae9ebca060cc30b118  res/result_1986_uzip.zip

As you can see, the fflate result for 1986 differs between my local machine and GitHub Actions:

4a8209e85c409064607cc9ea071f5aca  res/result_1986_fflate.zip - local
6ce3ddbdc496aa6efb4e207ee78030e6  res/result_1986_fflate.zip - GitHub CI

I suspect it has something to do with TZ. Would you be able to point out what's wrong, or what your code does differently from other packages?

101arrowz commented 1 week ago

This is a good point, and you are correct that fflate handles things differently from JSZip here. The timestamp stored in ZIP files is unfortunately a relative timestamp that depends on a timezone not specified in the file. The timestamps are thus only useful if you know in advance the timezone in which the ZIP file was created.

fflate currently stores the timestamp in the ZIP file according to the local timezone, while JSZip seems to store them relative to UTC. (Note that UZIP is completely ignoring your timestamp, the MD5 sums are the same for the two files.)

fflate is following the convention of years of Unix utilities here, while JSZip's behavior is actually a long-time bug. The downside of fflate's approach is exactly this issue: given a fixed UTC timestamp, it will not be encoded the same on machines in different timezones.
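
As a rough illustration, here is a sketch of the standard MS-DOS date/time packing applied to a Date's local-time fields. This is not fflate's actual code; `toDosDateTime` is a hypothetical helper written only to show why a fixed epoch value can encode differently per machine, while fixed local fields do not.

```javascript
// Hypothetical helper: pack a Date's LOCAL fields into DOS date/time words.
// DOS date: bits 15-9 = year - 1980, bits 8-5 = month (1-12), bits 4-0 = day.
// DOS time: bits 15-11 = hour, bits 10-5 = minute, bits 4-0 = seconds / 2.
function toDosDateTime(d) {
  const dosDate =
    ((d.getFullYear() - 1980) << 9) | ((d.getMonth() + 1) << 5) | d.getDate();
  const dosTime =
    (d.getHours() << 11) | (d.getMinutes() << 5) | (d.getSeconds() >> 1);
  return { dosDate, dosTime };
}

// A fixed instant: its local fields, and so its encoding, depend on the machine's TZ.
const fromEpoch = toDosDateTime(new Date(504932400000));

// Fixed local fields: the encoding is the same whatever the machine's TZ is.
const fromLocalFields = toDosDateTime(new Date(1986, 0, 1, 3));

console.log(fromEpoch, fromLocalFields);
```

On a machine set to UTC the two results match; on a machine one hour ahead of UTC, `fromEpoch` encodes 04:00 while `fromLocalFields` still encodes 03:00.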

If you want deterministic ZIP archives, you should specify Date objects with year, month, days, etc. rather than with an epoch timestamp. This will cause your time to be encoded locally, and thus be deterministic across machines.

const date = new Date(1980, 0, 1); // January 1, 1980 at 00:00:00 in local time
const date = new Date(1986, 0, 1, 3); // January 1, 1986 at 03:00:00 in local time

Additionally, you can almost always "convert" a Unix epoch timestamp to what it would be if reinterpreted in local time to get deterministic results, but be warned that this is not guaranteed to work for all UTC times:

const utcDate = new Date('1986-01-01T03:00:00.000Z');
const date = new Date(utcDate.getTime() + utcDate.getTimezoneOffset() * 60000);
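
An alternative sketch of the same idea builds the local Date directly from the UTC fields instead of doing offset arithmetic. This is a swapped-in variant, not the snippet above and not fflate code. `utcFieldsAsLocal` is a hypothetical helper, and it still cannot represent wall-clock times that do not exist locally (for example inside a DST gap).

```javascript
// Hypothetical helper: return a Date whose LOCAL fields equal the UTC fields
// of `utc`. The multi-argument Date constructor interprets its arguments as local time.
function utcFieldsAsLocal(utc) {
  return new Date(
    utc.getUTCFullYear(),
    utc.getUTCMonth(),
    utc.getUTCDate(),
    utc.getUTCHours(),
    utc.getUTCMinutes(),
    utc.getUTCSeconds()
  );
}

const sourceUtc = new Date('1986-01-01T03:00:00.000Z');
const localDate = utcFieldsAsLocal(sourceUtc);

// The local wall clock now reads 1986-01-01 03:00:00 on this machine.
console.log(localDate.getFullYear(), localDate.getMonth() + 1, localDate.getDate(), localDate.getHours());
```

The resulting value could then be passed as `mtime`, in the same way as the `new Date(1986, 0, 1, 3)` form.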

1oglop1 commented 1 week ago

JavaScript is wow... always something. I was aware of UZIP's behaviour and used it as a "benchmark" to see whether other libs work the way I need.

But I am absolutely blown away (again) by the behaviour of Date, it always gets me! I'm quite certain that I attempted to create the Date object using numeric parameters, but I may have lost track during my experiments. My previous experience with the same process comes from Python where working with the date is relatively straightforward and I had no issues producing deterministic results.

It is actually quite funny that a JSZip bug helped me 🤦 .

I can confirm the correct behaviour using

const date = new Date(1980, 0, 1); // January 1, 1980 at 00:00:00 in local time
const date = new Date(1986, 0, 1, 3); // January 1, 1986 at 03:00:00 in local time

Thanks a lot for taking your time to explain this!

In this case, it is not a bug. Can we improve the docs for unfortunate souls like me?