/cc @dsnet
Change https://golang.org/cl/202217 mentions this issue: archive/zip: support adding raw files
Added change that uses option 1 (CreateHeaderRaw) because it feels weird to use a field in FileHeader solely for archiving, as that structure is also used when reading an archive.
This is going to be new API, so turning into a proposal.
@dsnet Please add any thoughts you may have about this functionality. Thanks.
The project I'm working on would appreciate this functionality for performance reasons. We need to replace a single file within a zip. The DEFLATE tax is high and wasteful since we decode/encode without needing or using the intermediate data.
I like @rsc's Copy implementation in https://github.com/rsc/zipmerge. It might not be quite as flexible as the other proposals, but I think it fits well within the existing API and supports efficiently creating a new zip from an existing zip while allowing the user to replace/delete/update specific files.
// Copy copies the file f (obtained from a Reader) into w.
// It copies the compressed form directly.
func (w *Writer) Copy(f *File) error {
	dataOffset, err := f.DataOffset()
	if err != nil {
		return err
	}
	if err := w.closeLastWriter(); err != nil {
		return err
	}
	fh := f.FileHeader
	h := &header{
		FileHeader: &fh,
		offset:     uint64(w.cw.count),
	}
	fh.Flags |= 0x8 // we will write a data descriptor
	w.dir = append(w.dir, h)
	if err := writeHeader(w.cw, &fh); err != nil {
		return err
	}
	r := io.NewSectionReader(f.zipr, dataOffset, int64(f.CompressedSize64))
	if _, err := io.Copy(w.cw, r); err != nil {
		return err
	}
	return writeDesc(w.cw, &fh)
}
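To make the replace-one-file use case concrete, here is a minimal sketch (not from the thread) of how such a Copy method could be used to swap a single file without re-compressing the rest of the archive. The function replaceFile and its arguments are illustrative names only, and Copy is the hypothetical method shown above:

func replaceFile(src *zip.Reader, dst *zip.Writer, name string, newContents []byte) error {
	for _, f := range src.File {
		if f.Name == name {
			continue // skip the file being replaced
		}
		if err := dst.Copy(f); err != nil { // hypothetical Copy from above
			return err
		}
	}
	// Write the replacement through the normal (compressing) path.
	w, err := dst.Create(name)
	if err != nil {
		return err
	}
	_, err = w.Write(newContents)
	return err
}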
Here are the latest compress/flate benchmarks from master at level 5, which archive/zip uses by default (Intel Xeon W-2135 CPU @ 3.70GHz × 6):
goos: linux
goarch: amd64
pkg: compress/flate
BenchmarkDecode/Digits/Level5/1e4-12 6530 164347 ns/op 60.85 MB/s 40598 B/op 7 allocs/op
BenchmarkDecode/Digits/Level5/1e5-12 960 1131532 ns/op 88.38 MB/s 40842 B/op 13 allocs/op
BenchmarkDecode/Digits/Level5/1e6-12 117 9673619 ns/op 103.37 MB/s 44800 B/op 79 allocs/op
BenchmarkDecode/Newton/Level5/1e4-12 8076 164681 ns/op 60.72 MB/s 41114 B/op 17 allocs/op
BenchmarkDecode/Newton/Level5/1e5-12 1653 765625 ns/op 130.61 MB/s 44283 B/op 31 allocs/op
BenchmarkDecode/Newton/Level5/1e6-12 180 6408771 ns/op 156.04 MB/s 61144 B/op 156 allocs/op
BenchmarkEncode/Digits/Level5/1e4-12 5710 215617 ns/op 46.38 MB/s
BenchmarkEncode/Digits/Level5/1e5-12 370 3232511 ns/op 30.94 MB/s
BenchmarkEncode/Digits/Level5/1e6-12 32 34852520 ns/op 28.69 MB/s
BenchmarkEncode/Newton/Level5/1e4-12 4774 238334 ns/op 41.96 MB/s
BenchmarkEncode/Newton/Level5/1e5-12 357 2974790 ns/op 33.62 MB/s
BenchmarkEncode/Newton/Level5/1e6-12 39 30595764 ns/op 32.68 MB/s
PASS
@escholtz I agree that having a (*Writer) Copy(f *File) would be nice, but as you said, it isn't quite as flexible.
For comparison, copying a file with the patch I've submitted would look something like this:
func Copy(zw *zip.Writer, zf io.ReaderAt, f *zip.File) error {
	offset, err := f.DataOffset()
	if err != nil {
		return err
	}
	w, err := zw.CreateHeaderRaw(&f.FileHeader)
	if err != nil {
		return err
	}
	_, err = io.Copy(w, io.NewSectionReader(zf, offset, int64(f.CompressedSize64)))
	return err
}
Having Copy() would be great, but if for some reason only one solution could be added, I'd prefer it to be something along the lines of CreateHeaderRaw.

For the other use cases originally mentioned, I've written https://github.com/saracen/fastzip that takes advantage of CreateHeaderRaw also (but I've had to vendor archive/zip and include my modification for it to work).
Implemented in https://github.com/klauspost/compress/pull/214
I vaguely remember discussing this with @rogpeppe years ago, but I can't find the discussion. Any clearer memory, Roger?
My memory is coming back. I think what I remember @rogpeppe and I talking about was the ability to modify an existing zip file by appending new files to it.
@rsc https://groups.google.com/g/golang-dev/c/ZSubqlF2G4k?pli=1 this might be the discussion, and yeah, was related to appending.
Adding a whole new Create method just to set this extra bit seems unfortunate - what if there is a second bit we want control over later? Add four new methods? And so on.
Probably better to add a new bool in the CreateHeader. Perhaps WriteRaw bool?
> Probably better to add a new bool in the CreateHeader. Perhaps WriteRaw bool?

@rsc I originally preferred this too. The reason I didn't for my implementation is that FileHeader is also used when reading an archive. It felt odd having a WriteRaw bool as part of that structure in the context of reads. Is this not an issue?
@saracen it seems like less of an issue than an exponential blowup of create methods. The NonUTF8 field is probably most analogous, although it does get set in the reader.
I suppose the most interesting analogy is whether there's a need to get the raw bytes from the reader as well. There at least we could add an OpenRaw method to zip.File. It's much harder to do that in what CreateHeader returns, because it's an io.Writer.
I suppose another way to store compressed files would be to register a no-op compressor.
Another way would be to have a method on zip.Writer: WriteRaw(bool) to set whether the compressors should be avoided. You'd z.WriteRaw(true) before calling CreateHeader.
Do any of these sound like a winner to anyone?
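For what it's worth, a pass-through compressor can already be registered with the existing RegisterCompressor API; a rough sketch follows (nopCloser and newRawWriter are illustrative names, and this alone doesn't solve the CRC32/size bookkeeping discussed below):

// nopCloser adapts an io.Writer into the io.WriteCloser a zip.Compressor must return.
type nopCloser struct{ io.Writer }

func (nopCloser) Close() error { return nil }

func newRawWriter(out io.Writer) *zip.Writer {
	zw := zip.NewWriter(out)
	// Bytes written to each file are passed through to the archive unmodified,
	// while still being labelled with the Deflate method in the file header.
	zw.RegisterCompressor(zip.Deflate, func(w io.Writer) (io.WriteCloser, error) {
		return nopCloser{w}, nil
	})
	return zw
}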
> I suppose the most interesting analogy is whether there's a need to get the raw bytes from the reader as well.

@rsc At the moment, I believe there are two ways to access the raw bytes:
- Register a Decompressor that simply returns the compressed bytes as-is.
- Use the archive's ReaderAt, and then use file.DataOffset() and file.CompressedSize64 to access and read a specific section of the archive.

> I suppose another way to store compressed files would be to register a no-op compressor.

This sounds quite good, especially since it's similar to how you can handle raw reads by registering a decompressor.
Using a zip.NoOpCompressor would need to mean that the CompressedSize64 and CRC32 fields of the header are left unmodified when the header is closed:
zw := zip.NewWriter(archive)
zw.RegisterCompressor(zip.Deflate, zip.NoOpCompressor)

hdr := &zip.FileHeader{
	Name: "file.txt",

	// the following fields would be left untouched because
	// the writer detects that zip.NoOpCompressor is being used
	CRC32:            0xffffffff,
	CompressedSize64: 1024,
}

w, _ := zw.CreateHeader(hdr)

// bytes are copied to the zip archive unmodified
io.Copy(w, buf)
The problem with the no-op compressor is that the uncompressed size would be wrong. That would have to be left alone. It's probably a little too subtle.
Given that you can already read the compressed data directly, as @saracen points out, maybe it's fine to focus on writing.
What if pre-filling CompressedSize64 (> 0) in the FileHeader means that the writes are expected to be pre-compressed? In that case CRC32 has to be filled in too, although zero is a valid CRC32 so we can't double-check that. But I believe 0 is not a valid CompressedSize64, and in any event it's not an interesting one.
In essence, CompressedSize64 = 0 means "I don't know, you compress it for me".
Edit: I see that this was @saracen's suggestion 3 in the original comment at the top.
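For illustration only, under that hypothetical semantics (not the current CreateHeader behavior) a caller might look roughly like the sketch below; the helper name and arguments are made up:

// addPrecompressed relies on the PROPOSED meaning of a non-zero
// CompressedSize64: "the writes are already compressed, copy them through".
// With today's archive/zip this would instead re-compress the data.
func addPrecompressed(zw *zip.Writer, name string, deflated []byte, crc uint32, rawSize uint64) error {
	w, err := zw.CreateHeader(&zip.FileHeader{
		Name:               name,
		Method:             zip.Deflate,
		CRC32:              crc,                   // caller-supplied
		UncompressedSize64: rawSize,               // caller-supplied
		CompressedSize64:   uint64(len(deflated)), // non-zero would signal "pre-compressed"
	})
	if err != nil {
		return err
	}
	_, err = w.Write(deflated) // a raw DEFLATE stream under the proposed semantics
	return err
}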
@saracen, @escholtz, @klauspost, any thoughts about the most recent suggestion (previous comment)?
According to RFC 1951, a DEFLATE stream requires at least one block (with the BFINAL bit set). Therefore, it is impossible to have a compressed DEFLATE stream that is zero bytes in length.
However, the ZIP file format allows for a number of different compression methods to be used (DEFLATE happens to be the most common by far). I think we should think about 1) whether we want to support other compression methods, and 2) whether we ever want to support another compression format that may theoretically have a zero-length output for an empty input.
I've been hesitant to comment because I didn't want to hijack @saracen's original proposal.
For my use case, the func (w *Writer) Copy(f *File) error implementation would be sufficient. I think it fits cleanly with the existing package exports and still maintains simplicity.

I agree that having both CreateHeader and CreateHeaderRaw does not feel like a clean standard library package.

Is there precedent in the standard library for using the CompressedSize64 approach?

For my use case, func (w *Writer) Copy(f *File) error doesn't help, but I can see its usefulness and how it relates to this problem.
@rsc In my original proposal, I'd written:
> - Use the existing CreateHeader(fh *FileHeader) (io.Writer, error) function, but if CompressedSize64 has already been set, assume that data written is already compressed.
>
> I'm going to assume that option 3 would be a no-go, because existing code might suddenly break if a user has already set the CompressedSize for whatever reason, but hopefully the other options are viable.
Breaking existing code was my only reluctance to this approach. If we don't see that as an issue, it sounds like an easy way to support this.
@saracen I took another look at the three examples you originally provided and I think they could all be achieved with a Copy method?
> - Repackaging a zip file, removing or adding files, without incurring the associated compression overheads for files that already exist (somewhat achieving #15626)

Using existing zip file(s), don't Copy files you want to remove. Copy existing files you want to keep. Use existing methods to add new files.

> - Compress first and include later based on whether the compressed size was smaller than the original, without having to perform the compression twice.

Write a new zip file, read the compressed size, only Copy if it's smaller.

> - Support concurrent compression: compress many files concurrently and then copy the already compressed files to the archive (similar to Apache Commons Compress' ParallelScatterZipCreator)

Write multiple zip files concurrently and then Copy files from each into a single zip file (a rough sketch of this follows below).
Perhaps this approach isn't as elegant and has slightly more garbage/performance overhead but it avoids the tax of repeatedly compressing the same file which I would expect is the bottleneck for most cases.
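For the concurrent case, a rough stdlib-only sketch of that flow might look like this, assuming a Copy method on zip.Writer as discussed (the function and variable names are illustrative):

func parallelZip(out io.Writer, names []string, contents [][]byte) error {
	bufs := make([]bytes.Buffer, len(names))
	errs := make([]error, len(names))
	var wg sync.WaitGroup
	for i := range names {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Compress each input into its own in-memory zip.
			zw := zip.NewWriter(&bufs[i])
			w, err := zw.Create(names[i])
			if err == nil {
				_, err = w.Write(contents[i])
			}
			if cerr := zw.Close(); err == nil {
				err = cerr
			}
			errs[i] = err
		}(i)
	}
	wg.Wait()

	// Merge the per-file zips into the final archive without recompressing.
	final := zip.NewWriter(out)
	for i := range bufs {
		if errs[i] != nil {
			return errs[i]
		}
		zr, err := zip.NewReader(bytes.NewReader(bufs[i].Bytes()), int64(bufs[i].Len()))
		if err != nil {
			return err
		}
		for _, f := range zr.File {
			if err := final.Copy(f); err != nil { // hypothetical Copy being proposed
				return err
			}
		}
	}
	return final.Close()
}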
> Breaking existing code was my only reluctance to this approach. If we don't see that as an issue, it sounds like an easy way to support this.
For all Go code inside Google, I only found one instance of someone implicitly setting CompressedSize. The usage pattern looked like the following:
rc, ... := rf.Open()
fileHeader := rf.FileHeader
wc, ... := w.CreateHeader(&fileHeader)
... io.Copy(wc, rc)
Essentially the code is copying one Zip file to another.
If we go with the semantics that a positive CompressedSize means to interpret the input as already compressed, then the pattern above will break, since io.Copy will read uncompressed data from rc and write it to wc when the latter expects compressed data.

We could "fix" this by adding a ReadFrom or WriteTo method that specially handles copying of compressed data from one zip.File to another. However, that approach seems a bit too magical and I can see other edge cases where it fails.
@dsnet, I don't think we have to fix this, but we could at least detect the problem: if the alg is deflate and we expected compressed data, we can at least check that the first few bytes are a proper deflate header. That would make the io.Copy return an error. But it is a little concerning.
> proper deflate header.
Unfortunately DEFLATE doesn't really have any distinguishable magic numbers unlike GZIP, which typically has a 10-byte sequence that is readily identifiable. It seems that the best we could do is decompress some number of bytes and assume the input is valid if it passes that short test.
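A sketch of that weak check, assuming we only want to reject obviously malformed input (the function name is made up):

// looksLikeDeflate reports whether a small prefix of the data can be parsed
// as the start of a DEFLATE stream. It can only reject clearly invalid input;
// plenty of non-DEFLATE byte sequences will still pass.
func looksLikeDeflate(prefix []byte) bool {
	fr := flate.NewReader(bytes.NewReader(prefix))
	defer fr.Close()
	_, err := io.Copy(io.Discard, fr)
	// A truncated (but otherwise valid) stream ends in io.ErrUnexpectedEOF;
	// only a parse error such as flate.CorruptInputError means the prefix is
	// definitely not DEFLATE data.
	return err == nil || err == io.ErrUnexpectedEOF
}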
It sounds like I should look in the Go corpus I have for other code using this pattern. I'll try to do that for next week. Maybe overloading the CompressedSize here is too subtle.
Will try for next time.
It feels like we're losing interest in this proposal and should draw a conclusion.
I don't think overloading CompressedSize is a good approach.
I still like the Copy approach that @rsc chose to use in the zipmerge package. I think all 3 use cases in the original proposal can be solved with a Copy method.
@escholtz, I haven't yet done the scan, but it doesn't seem like it's necessary - it's a plausible example that we'd break.
Re https://github.com/golang/go/issues/34974#issuecomment-659518287, the one thing that would still be good to support would be if you have the header bits in hand along with the compressed form, but you don't have them in an already-existing zip file. (Maybe you want more control over the compression, or you have DEFLATE or an alternate compressed form already from a different archive file, or something like that.)
After rereading this thread now with a bit more distance, I think the fundamental operation is really just access to raw data in both reading and writing. I suggest we add those directly, along with the Copy helper:
// like CreateHeader but the data written to the io.Writer should be precompressed
func (*Writer) CreateRaw(file *FileHeader) (io.Writer, error)
// like Open but the data read back is the compressed form
func (*File) OpenRaw() (io.Reader, error)
// helper - does f.OpenRaw, w.CreateRaw, io.Copy
func (w *Writer) Copy(f *File) error
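For reference, a minimal sketch of how the Copy helper could be built from the other two, assuming the signatures suggested above (shown here as a free function since the methods don't exist yet):

// Copy copies f's compressed bytes into w without decompressing them,
// reusing f's FileHeader so the name, method, sizes and CRC32 carry over.
func Copy(w *zip.Writer, f *zip.File) error {
	r, err := f.OpenRaw()
	if err != nil {
		return err
	}
	fh := f.FileHeader // copy the header so the original File is left untouched
	cw, err := w.CreateRaw(&fh)
	if err != nil {
		return err
	}
	_, err = io.Copy(cw, r)
	return err
}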
That sounds reasonable. Looking at the package, I assumed this type of access was intentionally not exported for simplicity and reliability. If we're ok with making the package a little less simple, this would enable the advanced use cases we're trying to achieve.
Hi everybody! :smile: I've been thinking about this for a couple of months. After long thought I created my own solution, and I was dreaming of putting my ideas into practice when I found this issue :laughing: I had the same two problems in mind.

I went through different iterations in my mind. First there was something like @saracen mentioned, with a similar-looking ReadCompress method. It was dirty and monstrous. After more long thinking I ended up with an ABSOLUTE copy of the Copy method from here :laughing: :laughing: :laughing: I even did the same optimization with the writeDesc func. But then I had my genius moment :smile: when I discovered that both tasks could be done with only the Copy method.

I am happy that I can still add my penny here and probably resolve this discussion about blowing up the number of methods.

We don't need the CreateRaw(file *FileHeader) method, and probably not OpenRaw() (io.Reader, error) either. It's easily resolved with Copy alone.

Here is the idea:
func main() {
	// final writer
	zw := zip.NewWriter(bytes.NewBuffer([]byte{}))

	// get precompressed *zip.File (data is some []byte defined elsewhere)
	prFile, _ := Precompress(data)

	// copy precompressed *zip.File
	zw.Copy(prFile)
}

func Precompress(b []byte) (*zip.File, error) {
	// create tmp zip writer
	buf := bytes.NewBuffer([]byte{})
	zw := zip.NewWriter(buf)

	// write/compress data in
	zwf, _ := zw.Create("test")
	zwf.Write(b)
	zw.Close() // flush the file and the central directory

	// extract file from tmp zip
	zr, err := zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
	if err != nil {
		return nil, err
	}
	return zr.File[0], nil
}
In this way all the compression and descriptor validation logic is still encapsulated(!), and there is (I think) no extra overhead: we compress the data into a buffer and then io.Copy it to the destination writer, which is essentially the same as compressing it directly.

If a user wants to copy a couple of files from a big zip and keep them in memory, that again can be done with Copy: copy the needed files to a tmp zip and then copy them from there to the destination zip.

I hadn't thought about the OpenRaw() method before. There aren't many use cases for it so far. If a user needs to keep only one file in a compressed state, they can use tmp zips. One use case could be when a user wants to extract a raw file and send it directly over HTTP to the client :thinking: . In my first iterations I achieved this through the Compressor interface: I copied data to the buffer and ignored the error from the CRC32 check. It worked pretty well for a long time. I will post the code here when I find it.
Conclusion

The Copy method is critical.

> I suggest we add those directly, along with the Copy helper:

@rsc
- OpenRaw: as mentioned in https://github.com/golang/go/issues/34974#issuecomment-648982928, there are already a few ways to access the raw data, although OpenRaw is probably clearer.
- Personally, I'd prefer CreateHeaderRaw over CreateRaw, I think it's clearer that it's a variant of CreateHeader.
- Copy should also be added 👍 @klauspost's compress library added both CreateHeaderRaw and Copy. Obviously, this is an external library, but it would be nice if the API continued to match.
@im7mortal Using just Copy does work, and @escholtz suggested this too. It's a good solution for many problems, but it relies on either your precompressed data being in a zip file or you first copying it to one. Copying precompressed data from a different archive format would require a temporary copy to a new zip file, only to be read back again.
> Personally, I'd prefer CreateHeaderRaw over CreateRaw, I think it's clearer that it's a variant of CreateHeader.

It seems unnecessarily long. There's no way to have CreateRaw that doesn't also take a header.
If we plan on adding a CreateHeaderRaw, I suspect that we'll need to pay close attention to #22520. We'll want to make sure we get the semantics right from the start, otherwise they will be very hard to change later on.
Thanks for the pointer. I don't see why they are necessarily related. CreateRaw should do the same as CreateHeader, just preparing for raw writes. It shouldn't mix in other different semantics.
For that issue, I commented there but I think it would be fine to add a new field that CreateHeader (and then by extension CreateRaw) would use. I think that can be an independent decision though.
@saracen

> ... but it relies on either your precompressed data being in a zip file or you first copying it to one. Copying precompressed data from a different archive format would require a temporary copy to a new zip file, only to be read back again.

It's not avoidable in either case. It's just the same, but CreateHeaderRaw discards all the encapsulated logic; instead, the user must implement it themselves.
Precompression with CreateHeaderRaw

Use zip.CreateHeaderRaw. Ensure the crc32 and uncompressed size are correctly set :exclamation:

func precompress(b []byte) (check uint32, c []byte, err error) {
	wsumm := crc32.NewIEEE() // a user must ensure a correct check sum
	_, err = wsumm.Write(b)
	if err != nil {
		return
	}
	buff := bytes.NewBuffer([]byte{})
	wcmp, err := flate.NewWriter(buff, flate.DefaultCompression) // MUST MATCH the actual compression method
	if err != nil {
		return
	}
	_, err = wcmp.Write(b)
	if err != nil {
		return
	}
	err = wcmp.Close() // flush the DEFLATE stream
	return wsumm.Sum32(), buff.Bytes(), err
}
...
w := zip.NewWriter(writer)
crc32_, compressed, err := precompress(data)
if err != nil {
	return
}
wf, err := w.CreateHeaderRaw(&zip.FileHeader{
	Name:  "tmp",
	CRC32: crc32_, // a user must ensure a correct check sum
	// a user must ensure a correct uncompressed size;
	// this requires additional logic in case of a pipe writer. CRC32 also.
	UncompressedSize64: uint64(len(data)),
	Method:             zip.Deflate, // MUST MATCH the actual compression method
})
if err != nil {
	return
}
_, err = wf.Write(compressed)
if err != nil {
	return
}
...
:x: Needs a wrapper
:x: Additional Write-Read-Write
:x: Unencapsulated: the user must implement the crc32 sum and propagate it, together with the length, correctly
:x: Unencapsulated: the registered compressor is unused, so the user can write data compressed with a different method than the one set in the header
:x: Complex wrapper in case of a pipe writer
:x: Allows a corrupted file in the archive :warning: Can cause a crash in receiving zip libraries :exclamation:
Precompress through a tmp zip

Precompress into a temporary zip, then use zipWriter.Copy, which does an io.Copy.

In both cases there is a need to write wrappers.
func PrecompressInZip(b []byte, name string, method uint16) (zf *zip.File, err error) {
	// create tmp zip writer
	buf := bytes.NewBuffer([]byte{})
	zw := zip.NewWriter(buf)

	// write/compress data in
	zwf, err := zw.CreateHeader(&zip.FileHeader{
		Name:   name,
		Method: method,
	})
	if err != nil {
		return
	}
	_, err = zwf.Write(b)
	if err != nil {
		return
	}
	if err = zw.Close(); err != nil { // flush the file and the central directory
		return
	}

	// extract file from tmp zip
	zr, err := zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
	if err != nil {
		return
	}
	return zr.File[0], nil
}
...
// final writer
zw := zip.NewWriter(bytes.NewBuffer([]byte{}))

// get precompressed *zip.File
prFile, err := PrecompressInZip(data, "test", zip.Deflate)
if err != nil {
	return
}

// copy precompressed *zip.File
zw.Copy(prFile)
...
:x: Needs a wrapper
:x: Additional Write-Read-Write
:warning: Adds directoryHeaderLen=46 bytes of metadata (it varies, but it's very small)
:green_book: The CRC32 sum and length are encapsulated and ensured by the existing, well-tested logic
:green_book: The registered compressor is used properly: the writer uses only the corresponding compressor. No way to mix them up!
:green_book: Pipe wrappers do not require additional logic and can be handled through the standard library
:green_book: Doesn't allow a corrupted file in the archive.
Conclusion

@rsc The CreateHeaderRaw approach is very error prone: all the encapsulated logic from the zip library must be reimplemented by the user. zip.Copy from a temporary zip requires the same Write-Read-Write sequence.

I agree with @im7mortal on this.
I think most gophers using the archive/zip package are looking for simplicity. They don't care about the specifics of the zip format, they just want to be able to read and write files.
A much smaller number of gophers are looking for a zip package that exports all of the guts that archive/zip currently hides. Building blocks to make it easier to build a custom / optimized zip package.
I think the standard library should lean more towards the simple case and not the building blocks case. I worry if we try to do both, we end up with a muddled package that is more complicated and confusing for the typical consumer.
I still don't know if Copy should be added (even though I personally want it so I don't have to maintain a fork). On the one hand, it maintains safety and doesn't add too much complexity. On the other hand, it's only necessary to support less common use cases. And it still doesn't support advanced use cases (copying from other archive formats, although I would suspect this is even more rare than copying between zips).
@im7mortal

> It's not avoidable in either case. It's just the same, but CreateHeaderRaw discards all the encapsulated logic.

There are cases where this is avoidable. You can have compressed data, know the uncompressed size, and either already have the crc32 checksum of the uncompressed data, or calculating it is still much faster than decompressing and recompressing.

The compressed data doesn't even need to be DEFLATE. This package already supports vendor compression methods, and there are certainly use cases where I might want to move pre-existing compressed data (say zstd) into the zip structure from another archive format.
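To illustrate (purely a sketch with made-up values): with a CreateRaw-style API, already-compressed data using a vendor method could be stored without archive/zip ever touching it. The method ID, checksum and sizes here are placeholders the caller must supply correctly, and methodZstd is not something archive/zip defines:

const methodZstd = 93 // example vendor compression method ID, not defined by archive/zip

// addRawZstd stores an already-zstd-compressed blob as-is, assuming a
// CreateRaw-style method exists on zip.Writer.
func addRawZstd(zw *zip.Writer, name string, zstdBytes []byte, crc uint32, rawSize uint64) error {
	w, err := zw.CreateRaw(&zip.FileHeader{
		Name:               name,
		Method:             methodZstd,
		CRC32:              crc,
		CompressedSize64:   uint64(len(zstdBytes)),
		UncompressedSize64: rawSize,
	})
	if err != nil {
		return err
	}
	_, err = w.Write(zstdBytes) // written through untouched
	return err
}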
The whole point of CreateRaw is to give more control over this and to go around the safety offered by the existing functionality. The zip package already allows for this to a certain degree, and it's already possible to create corrupt/invalid archives accidentally.
@escholtz Regarding simplicity, this is difficult not to agree with. Supporting vendor compression methods and arbitrary extra data are also features that are not going to be used by everybody, but they've been added. I personally don't think that CreateRaw is going to confuse users or take away from the simplicity of the more common functions provided, but I understand your point.
@saracen

> There are cases where this is avoidable. You can have compressed data, know the uncompressed size, and either already have the crc32 checksum of the uncompressed data, or calculating it is still much faster than decompressing and recompressing.
>
> ... into the zip structure from another archive format.

I agree that there are such cases. I think we could consult #15626 (the Delete, Append, and Replace methods).
The most common use cases (parallel compress, precompress, copy between zips) can use Copy with negligible overhead.
@im7mortal, you are right that much can be done with Copy. That's why I included it in the suggestion in https://github.com/golang/go/issues/34974#issuecomment-701705168 to add CreateRaw, OpenRaw, and Copy (all three), even though technically Copy can be implemented in terms of the other two.
In contrast, dropping the other two makes the use cases that need that low-level access fairly awkward and non-obvious.
It doesn't seem harmful to have these two important building blocks alongside the higher-level helper.
Does anyone object to adding CreateRaw, OpenRaw, and Copy?
Based on the discussion above, this seems like a likely accept. This is subtle, though, so it would make sense to put off until Go 1.17 at this point.
No change in consensus, so accepted.
Maybe I missed this, but why not just open the file directly instead of using zip.Open if you want to write/read bytes directly?
@hherman1 This proposal is about working with compressed files that are contained within zip archives. It's not about working with the zip archive itself.
Change https://golang.org/cl/312310 mentions this issue: archive/zip: add File.OpenRaw, Writer.CreateRaw, Writer.Copy
When implementing this in https://golang.org/cl/312310 one challenge I discovered was figuring out how we should handle the trailing data descriptor (https://en.wikipedia.org/wiki/ZIP_(file_format)#Data_descriptor) for the OpenRaw and CreateRaw methods. @ianlancetaylor also raised concern in the review for good reason.
I chose to return the data descriptor in the OpenRaw reader and give the CreateRaw writer responsibility for calculating and appending the data descriptor when necessary. This seemed optimal in terms of giving the caller full control, which is the intention of this change, while also conforming to the interface we agreed to. And it also ensures the data descriptor isn't lost by the Copy method.
Perhaps I'm overlooking a different/better approach or perhaps we want to change the interface to allow the same data descriptor reading/writing functionality but in a safer way.
What are your thoughts?
For me, OpenRaw/CreateRaw was mostly about providing raw access to the file data and not modifying the fields that I provide in the header on my behalf. I still expected it to serialize the header data in the same way.

I don't know whether it's needed, but I understand the reason for adding it.
The data descriptor is part of the spec. We need to support it for the raw methods. The CRC32 and uncompressed size fields are not things we can compute within the raw methods. So we need to allow the caller to specify them.
The 3 data descriptor fields are also in the FileHeader but I don't think it's a good idea to overload them by copying the data descriptor fields into the FileHeader. Since it's a separate segment it should be treated as such.
The only other option I can think of is adding a DataDescriptor struct within the archive/zip package that handles marshalling and unmarshalling of the byte segment. But since the Raw methods are intended for advanced/rare use cases, I'm fine with allowing the caller to handle the marshalling and unmarshalling.
@rsc Are you ok with this approach (returning the data descriptor bytes from OpenRaw and requiring the caller to write the data descriptor bytes to the CreateRaw Writer)?
I think it's going to be error-prone to attach the data descriptor bytes to the reader. I suggest:
// DataDescriptor holds the data descriptor that optionally follows the file contents in the zip file.
type DataDescriptor struct {
	CRC32            uint32
	CompressedSize   uint64
	UncompressedSize uint64
}

// OpenRaw is like Open but reads the raw, compressed, data from the file.
// If dataDescriptor is not nil, it contains the contents of the optional data descriptor following the file contents.
func (*File) OpenRaw() (r io.Reader, dataDescriptor *DataDescriptor, err error)

// CreateRaw is like CreateHeader but the data written to the io.Writer should already be compressed.
// If dataDescriptor is not nil, it will be written as the optional data descriptor following the file contents.
func (*Writer) CreateRaw(file *FileHeader, dataDescriptor *DataDescriptor) (io.Writer, error)
@dsnet Any thoughts?
> The 3 data descriptor fields are also in the FileHeader but I don't think it's a good idea to overload them by copying the data descriptor fields into the FileHeader. Since it's a separate segment it should be treated as such.

@escholtz We serialize the known crc32, uncompressed and compressed size fields from FileHeader when we write the local file header. What benefit is there to not doing the same in the data descriptor? Would they ever be different?
For the CreateRaw case, if the FileHeader Data Descriptor 0x8 Flag is not set, but a non-nil dataDescriptor argument is passed, would we treat that as an error?
And if the CRC32, CompressedSize, and UncompressedSize fields are non-zero in the FileHeader and a non-nil dataDescriptor argument is passed, would we treat that as an error?
If you have a compressed file, know its uncompressed size and crc32 checksum, it'd be nice to be able to add it to an archive as is.
I have three current use-cases for this:

- Repackaging a zip file, removing or adding files, without incurring the associated compression overheads for files that already exist (somewhat achieving #15626)
- Compress first and include later based on whether the compressed size was smaller than the original, without having to perform the compression twice.
- Support concurrent compression: compress many files concurrently and then copy the already compressed files to the archive (similar to Apache Commons Compress' ParallelScatterZipCreator)

I see three different ways we could achieve this:

1. A new zip.CreateHeaderRaw(fh *FileHeader) (io.Writer, error) function that uses the FileHeader's CRC32 and UncompressedSize64 fields set by the user.
2. Use the existing CreateHeader(fh *FileHeader) (io.Writer, error) function, but have a new FileHeader field that indicates we're going to write already compressed data (and then use the CRC32 and UncompressedSize64 fields set by the user).
3. Use the existing CreateHeader(fh *FileHeader) (io.Writer, error) function, but if CompressedSize64 has already been set, assume that data written is already compressed.

I'm going to assume that option 3 would be a no-go, because existing code might suddenly break if a user has already set the CompressedSize for whatever reason, but hopefully the other options are viable.