bmeg / bmeg-etl

ETL configuration for BMEG

json emitter gzip header #271

Closed: bwalsh closed this 5 years ago

bwalsh commented 5 years ago

By default, gzip writes the filename and mtime into the header. This has the effect of creating .gz files with different md5 hashes, even though the input is identical.

TODO: in the emitter, set mtime to 0

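import gzip
# filename='' and mtime=0 keep the gzip header identical across runs
with open('mytest.gz', mode='wb') as fh:
    with gzip.GzipFile(filename='', compresslevel=9, fileobj=fh, mtime=0) as gz:
        gz.write(b'dddddddddddddddddd')  # write() requires bytes in Python 3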
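
A quick way to check the effect is to compress the same payload several times in memory and compare md5 hashes. This is only a sketch: `gzip_bytes` and `md5_hex` are hypothetical helper names, not part of the bmeg-etl emitter, and the explicit `mtime=1` / `mtime=2` values just stand in for "written at different times".

```python
import gzip
import hashlib
import io


def gzip_bytes(payload: bytes, mtime=0) -> bytes:
    """Compress payload in memory with an empty filename and a fixed mtime.

    Hypothetical helper for illustration; not the emitter's actual code.
    """
    buf = io.BytesIO()
    with gzip.GzipFile(filename="", mode="wb", compresslevel=9,
                       fileobj=buf, mtime=mtime) as gz:
        gz.write(payload)
    return buf.getvalue()


def md5_hex(data: bytes) -> str:
    """Return the md5 hex digest of data."""
    return hashlib.md5(data).hexdigest()


payload = b"d" * 18

# Same payload, same fixed mtime: the archives should be byte-identical.
fixed_a = gzip_bytes(payload, mtime=0)
fixed_b = gzip_bytes(payload, mtime=0)

# Same payload, different header timestamps: the archives should differ.
stamped_a = gzip_bytes(payload, mtime=1)
stamped_b = gzip_bytes(payload, mtime=2)

print("fixed mtime hashes match:  ", md5_hex(fixed_a) == md5_hex(fixed_b))
print("varying mtime hashes match:", md5_hex(stamped_a) == md5_hex(stamped_b))
```

Only the header changes between the stamped archives; all four decompress to the same payload.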
adamstruck commented 5 years ago

Note: if you unzip the file produced in the example above and then gzip it again, you end up with a different md5. It would be nice if we had a way to reproduce the same gzipped files from the command line after decompressing.

gunzip mytest.gz
gzip -n -9 mytest
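
To see where two such files diverge, it may help to decode the fixed 10-byte gzip header (as laid out in RFC 1952) of each file and compare the fields. The sketch below is an assumption about how one might inspect this, not a diagnosis: `read_gzip_header` is a hypothetical helper, and differences could also come from the compressed stream itself if the two tools use different deflate implementations.

```python
import gzip
import io
import struct


def read_gzip_header(data: bytes) -> dict:
    """Decode the fixed 10-byte gzip header (RFC 1952 layout).

    Hypothetical helper for illustration only.
    """
    # magic (2 bytes), method, flags, mtime (4 bytes), extra flags, OS code
    magic, method, flags, mtime, xfl, os_code = struct.unpack("<HBBIBB", data[:10])
    if magic != 0x8B1F:
        raise ValueError("not a gzip stream")
    return {
        "method": method,                    # 8 means deflate
        "has_filename": bool(flags & 0x08),  # FNAME flag
        "mtime": mtime,
        "xfl": xfl,                          # compression-level hint
        "os": os_code,                       # writer's OS identifier
    }


# Build an archive the same way as the Python example in this thread.
buf = io.BytesIO()
with gzip.GzipFile(filename="", mode="wb", compresslevel=9,
                   fileobj=buf, mtime=0) as gz:
    gz.write(b"d" * 18)

header = read_gzip_header(buf.getvalue())
print(header)
```

Running the same function on the bytes of the file produced by `gzip -n -9` and comparing the two dictionaries would show whether the mismatch is in the header (for example the `os` field) or only in the compressed data that follows.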