ebiggers / libdeflate

Heavily optimized library for DEFLATE/zlib/gzip compression and decompression
MIT License

How about a paired web minifier? #65

Open. JoeUX opened this issue 4 years ago

JoeUX commented 4 years ago

Enhancement request/idea: Consider writing a minifier to pair with libdeflate, since a major use case of libdeflate is probably to compress web content – HTML, JS, and CSS files.

It might be possible to reuse libdeflate's parsing logic in a web minifier's parser. Relatedly, if you knew that the input to the gzip compressor was minified HTML, CSS, and JS, could you accelerate the compression? Maybe in the parser or the match finder? Or, what if you knew that you weren't going to have any matches longer than, say, 40 bytes? (See the API sketch below.)
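For context, libdeflate's public compression API today exposes a single tuning knob, the compression level; there is no way to hint "no match will exceed N bytes". A minimal sketch of the existing API (the functions named are real; the input string and level are illustrative):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libdeflate.h>

int main(void)
{
	const char *in = "<p>already-minified html</p>";
	size_t in_size = strlen(in);

	/* Level is the only compression-tuning parameter in the public API. */
	struct libdeflate_compressor *c = libdeflate_alloc_compressor(6);
	if (c == NULL)
		return 1;

	/* Worst-case output size, so compression never fails for lack of space. */
	size_t out_cap = libdeflate_gzip_compress_bound(c, in_size);
	void *out = malloc(out_cap);
	if (out == NULL) {
		libdeflate_free_compressor(c);
		return 1;
	}

	/* Returns the compressed size, or 0 if the output buffer was too small. */
	size_t out_size = libdeflate_gzip_compress(c, in, in_size, out, out_cap);
	printf("%zu -> %zu bytes\n", in_size, out_size);

	free(out);
	libdeflate_free_compressor(c);
	return 0;
}
```

So a "maximum expected match length" hint would have to be added as a new compressor option.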

Some of this might be easier if you knew that the input wasn't just any minified web content, but specifically content minified by your own minifier, according to its rules, standards, or spec. The input would then have certain guaranteed features or patterns: line endings would be normalized, certain forms of whitespace would never appear in code areas, and so on. We could even attach metadata to these HTML, JS, and CSS files reporting the length of the file, the length of the longest repeated string, the maximum number of repeats, etc. Could that metadata significantly help libdeflate?

I'm not aware of a minifier written in C. It would probably be the fastest minifier on earth, by a wide margin, especially if it used the same SIMD techniques for parsing and matching that libdeflate uses, and it would be an interesting and popular project in its own right. I'm not good with C, but I could write a spec for the minifier and/or its output. Some minifiers are unsafe and break websites, so it would be nice to have one that actually had a spec and was safe.

HansBrende commented 1 year ago

@ebiggers I am also curious to know whether there is a possible performance enhancement if you know in advance that there will be no matches longer than some configurable number of bytes (use case: I'm compressing streams of JSON objects, and each object contains a UUID, so no match could possibly be longer than the maximum distance between UUIDs). A concrete illustration is sketched below.
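To make the bound concrete, with made-up records: because each UUID value occurs exactly once in the stream, no repeated substring (i.e., no LZ77 match) can contain a complete UUID, so every match is confined to the literal text between UUIDs.

```c
/* Illustrative records only. Each "id" is unique, so the longest repeated
 * substrings are the shared JSON skeleton around the UUIDs, e.g.
 * {"id":" and ","type":"click"} , both far below DEFLATE's 258-byte
 * maximum match length. */
const char *rec1 = "{\"id\":\"5f0c3e9a-1d2b-4c8e-9f00-3a7d41e6b2c9\",\"type\":\"click\"}";
const char *rec2 = "{\"id\":\"7b41d0cc-9e55-47a1-8d3e-02c6f19ab574\",\"type\":\"click\"}";
```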

ebiggers commented 1 year ago

I am also curious to know whether there is a possible performance enhancement if you know in advance that there will be no matches longer than some configurable number of bytes

In principle, sure. In practice, the gain would be small, and it would be hard to implement, considering that the limit would presumably need to be dynamic rather than known at compile time.
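To illustrate why (a minimal sketch, not libdeflate's actual code): DEFLATE caps match length at the constant 258, and match finders lean on that constant in their hottest loop, so honoring a user-supplied cap would turn it into a runtime variable there.

```c
#include <stddef.h>

/* Hypothetical sketch of match extension. Real DEFLATE match finders bound
 * this loop by the compile-time constant 258 (the format's maximum match
 * length), which compilers can exploit. With a user-configurable cap,
 * max_len becomes min(258, user_cap), a runtime value threaded through
 * one of the most performance-critical loops in the compressor. */
static size_t
extend_match(const unsigned char *cur, const unsigned char *prev, size_t max_len)
{
	size_t len = 0;
	while (len < max_len && cur[len] == prev[len])
		len++;
	return len;
}
```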