cenotelie / hime


[Rust] Automaton File Compression support #85

Open stevefan1999-personal opened 11 months ago

stevefan1999-personal commented 11 months ago

We can leverage SOF3/include-flate, a variant of include_bytes!/include_str! with compile-time deflation and lazy runtime inflation, for this.

From my preliminary test, the C# grammar shrinks by more than 90% (7.57 MB to 230 KB), at the cost of a larger runtime memory allocation.

I'm not sure if this could be a new research direction haha, but I'm curious what the automata would look like after trie compression and Huffman coding.

stevefan1999-personal commented 11 months ago

Yeeeeeeeeeepppppppp...savings are pretty huge

stevefan1999-personal commented 11 months ago

I think I realized the reason why. 

We are using a lot of u32s, but we definitely don't have anywhere near that many states, so most of the bits in each value are never set.

Simply put, the data is full of zero bits with only sparse non-zero values in between, and text compression handles exactly this kind of pattern well!
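
A rough way to check this intuition is to count how many bytes are zero once small state indices are stored as u32s. This is only a sketch: it assumes the tables are serialized as little-endian u32 values, which the thread does not confirm, and `zero_byte_ratio` is a hypothetical helper, not part of Hime.

```rust
/// Fraction of zero bytes when `values` are serialized as little-endian u32s.
/// Hypothetical helper for illustration only; not part of Hime.
fn zero_byte_ratio(values: &[u32]) -> f64 {
    if values.is_empty() {
        return 0.0;
    }
    let zeros = values
        .iter()
        .flat_map(|v| v.to_le_bytes()) // 4 bytes per value
        .filter(|b| *b == 0)
        .count();
    zeros as f64 / (values.len() * 4) as f64
}
```

For example, for the made-up values 1, 2, 300 and 65535, 10 of the 16 serialized bytes are zero. The larger the share of zero bytes, the more a general-purpose compressor such as deflate has to work with.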

woutersl commented 11 months ago

Wow, this looks great. I'll try this out. The advantage is that the compression is done at compile time of the generated code, so this does not require support in .NET and Java.

SOF3 commented 11 months ago

just curious, was it really a well-thought-out decision to use include_flate? while it significantly reduces the size of the static files, both the compressed data and the decompressed data (lazily allocated) remain in process memory and are never dropped. is it really that meaningful to produce a small binary but use more runtime memory?
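
The trade-off described here can be sketched with the standard library alone. This is not how include_flate is implemented: the toy zero-run codec below stands in for deflate, and `PACKED`, `INFLATED`, `decode_zero_runs` and `automaton_bytes` are hypothetical names used only for illustration.

```rust
use std::sync::OnceLock;

/// Toy stand-in for deflate: a zero byte followed by a count `n` expands to
/// `n` zero bytes; every other byte is copied verbatim. A trailing lone zero
/// is also copied verbatim.
fn decode_zero_runs(packed: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < packed.len() {
        if packed[i] == 0 && i + 1 < packed.len() {
            out.extend(std::iter::repeat(0u8).take(packed[i + 1] as usize));
            i += 2;
        } else {
            out.push(packed[i]);
            i += 1;
        }
    }
    out
}

// Compressed form, embedded in the binary (made-up sample data).
static PACKED: &[u8] = &[1, 0, 3, 2, 0, 3];

// Filled on first access and then kept alive until the process exits.
static INFLATED: OnceLock<Vec<u8>> = OnceLock::new();

/// Returns the inflated bytes, decompressing only on the first call.
fn automaton_bytes() -> &'static [u8] {
    INFLATED.get_or_init(|| decode_zero_runs(PACKED)).as_slice()
}
```

After the first call to `automaton_bytes()`, both `PACKED` (in the binary image) and the inflated `Vec` (on the heap) stay resident for the rest of the process, which is the cost being questioned.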

woutersl commented 11 months ago

To clarify a little bit, this feature is not there yet. In addition, it will be gated behind a flag and disabled by default, so the current behavior does not change, but users who care about binary size (at the expense of runtime memory, indeed) can take advantage of it.