Frommi / miniz_oxide

Rust replacement for miniz
MIT License

[Q&A] InflateState reset has measurable costs when decompressing millions of objects #89

Closed Byron closed 3 years ago

Byron commented 3 years ago

When decompressing the objects in the linux kernel pack, with its 7.5 million objects/decompressions, I am using a boxed InflateState which is reset for each object to get it ready for the next decompression. Please note that this is the common case: it happens when receiving a pack during clone or fetch, as an index has to be built from the pack. gitoxide is already faster than git when resolving these objects (i.e. computing the SHA1), but is quite a bit slower during indexing, which I am trying to fix :D.

Since the InflateState comes with its own output buffer, that buffer is zeroed on each reset as well.

This shows up when profiling decompressing/indexing the entire kernel pack:

Screenshot 2020-08-04 at 11 25 03

Commenting out the line that resets the buffer has the desired effect, removing this cost entirely.

Screenshot 2020-08-04 at 11 30 10

The respective field has a TODO on it, and I wonder whether you think addressing that would also fix my issue.

Thanks for sharing your thoughts.

oyvindln commented 3 years ago

Hm, re-using the buffer without zeroing it could in theory pose a security risk if there were a bug somewhere in the decompression code (not that we have found anything like that through fuzzing). It's probably possible to use some structure to statically guarantee that old content is never read accidentally, though.

A simple temporary solution would be to add an extra reset function that doesn't zero out the buffer, for uses like this, with an accompanying warning. In any case, the streaming API needs a serious rework.

It is also possible that the compiler isn't optimizing the zeroing of the array well and is calling bzero more often than it needs to, but I haven't looked into what code that line generates.

Byron commented 3 years ago

A simple temporary solution would be to add an extra reset function that doesn't zero out the buffer, for uses like this, with an accompanying warning. In any case, the streaming API needs a serious rework.

I agree, that seems like the easiest course of action, and I am happy to submit a PR for it if you give me the go-ahead along with a name for that method :D.

On another note, can you think, off the top of your head, of any other overhead/cost associated with starting to decompress a stream? I had the feeling that even with this issue fixed, gitoxide could only maintain about 25 MB/s when decompressing many, many small streams, whereas git keeps up at ~50 MB/s.

In any case, thanks a lot for your help!

Byron commented 3 years ago

I took a closer look at this issue and did my best to ensure that the slowdown at the end of the pack (where there are many, many small streams) is not caused by anything outside of actually decompressing the streams.

To achieve that, I split the producing code that decompresses small objects into its own thread and avoided making any allocations there. The outcomes (entries) are sent to a consumer thread in chunks big enough to minimize synchronization overhead, and the consumer integrates them into a tree. That way, it's easy to see whether, for instance, the consumer is the bottleneck. Fortunately, one core can integrate millions of entries, and I have seen it receive peaks of close to 400k entries per second.

I have also applied the patch in the linked PR to miniz_oxide to shave off several seconds.

Below is how this looks after profiling the decompression of the linux kernel pack.

Screenshot 2020-08-11 at 15 08 56

Most of the time is spent in the producer thread decompressing entries; some time is spent on IO as well, which is expected.

Now I get these times, which are quite respectable: 52.1 MB/s vs. 53.7 MB/s for git itself.

asciicast

asciicast

Once we get the PR to a mergeable state and a new patch release, I think this issue can be closed as well; it's good enough :).