PRJosh / lz4

Automatically exported from code.google.com/p/lz4

Streaming format enhancements #123

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Version 1.4 of the LZ4 Streaming format provides a sound stream interface, yet the 
following enhancements would provide more flexibility for both client and server.

A reserved bit in the 'header' (and/or a new version number) could, when set (1), 
enable an extended block format, allowing:

 a) optional block independence:
     permits the server to dynamically enable/disable block independence.

 b) decompression size:
     permits client memory optimisations, since the decompressed size
     would be known in advance.

Extended block format

 [block-size] <data> [source-size] [crc]

o source-size

  Only present if the associated header flag is set.

  This field uses 4-bytes, format is little-endian.

  The block-dependent bit permits the server to dynamically select the
  block-dependent/block-independent mode.

  - The highest bit is “1” if the data is dependent on 
       previous data; upon change to “1”, the prefix buffer is assumed 
       to be related to the previous dependent frame (if any).
  - The highest bit is “0” if the data in the block is compressed by LZ4.

  All other bits give the decompressed size, in bytes. The reported 
  size shall never be larger than 'Block Maximum Size'.

  Note, this field is included within the optional block crc if present.
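
To make the proposed field concrete, here is a minimal sketch of how a client might decode it, assuming the layout described above (4 bytes, little-endian, highest bit as the dependence flag, remaining 31 bits as the decompressed size). The function and constant names are hypothetical and not part of any LZ4 API.

```python
import struct

DEPENDENT_FLAG = 0x80000000  # highest bit of the 32-bit field (proposed meaning)
SIZE_MASK = 0x7FFFFFFF       # all other bits: decompressed size in bytes


def build_source_size(dependent: bool, size: int) -> bytes:
    """Encode the proposed [source-size] field (hypothetical helper)."""
    if not 0 <= size <= SIZE_MASK:
        raise ValueError("size does not fit in 31 bits")
    raw = size | (DEPENDENT_FLAG if dependent else 0)
    return struct.pack("<I", raw)  # 4 bytes, little-endian


def parse_source_size(field: bytes, block_max_size: int):
    """Decode the field and enforce the 'Block Maximum Size' rule."""
    if len(field) != 4:
        raise ValueError("source-size field must be exactly 4 bytes")
    (raw,) = struct.unpack("<I", field)
    dependent = bool(raw & DEPENDENT_FLAG)
    size = raw & SIZE_MASK
    if size > block_max_size:
        raise ValueError("decompressed size exceeds Block Maximum Size")
    return dependent, size
```

A client could then call `parse_source_size(field, block_max_size)` before touching the block data and size its output buffer from the returned value.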

Original issue reported on code.google.com by ada...@gmail.com on 21 Mar 2014 at 9:14

GoogleCodeExporter commented 9 years ago
These are good proposals.
And indeed, the current version of the streaming format allows such an extension.

- Regarding custom decompression size :

I guess you are interested in the decompressed size of a single block, as opposed 
to the full size of the stream, which is already covered by the spec.
I've been considering this evolution for quite some time, but lacked a concrete 
use case to move it forward. 
Typically, the "block size Id" value '0' could be used for that.
A minor point is that the decompressed block size must have a hard limit, to 
avoid side-effects, and I was considering 8 MB for that limit.
I was expecting to place the [source-size] field at the beginning, rather than 
the end, of the block, since I don't see what could be the benefit of having it 
placed at the end.
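
As a rough sketch of why placing the size first helps, a reader could size its output buffer before reading any block data. This assumes a hypothetical layout `[block-size] [source-size] <data>` with both fields as 4-byte little-endian integers, ignores flag bits and the crc for brevity, and uses the 8 MB hard limit mentioned above. None of these names come from the LZ4 library.

```python
import io
import struct

HARD_LIMIT = 8 * 1024 * 1024  # 8 MB limit under consideration


def read_block(stream):
    """Read one block whose [source-size] precedes the data (assumed layout)."""
    header = stream.read(8)
    if len(header) != 8:
        raise EOFError("truncated block header")
    block_size, source_size = struct.unpack("<II", header)
    if source_size > HARD_LIMIT:
        raise ValueError("decompressed size exceeds the hard limit")
    # The output buffer can be sized exactly before the payload is read.
    output = bytearray(source_size)
    payload = stream.read(block_size)
    if len(payload) != block_size:
        raise EOFError("truncated block data")
    return payload, output


# Example: a 3-byte payload that claims to decompress to 10 bytes.
demo = io.BytesIO(struct.pack("<II", 3, 10) + b"abc")
payload, output = read_block(demo)
```

With the field at the end, the same reader would have to consume the whole payload before learning how much memory to reserve.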

So yes, we can discuss the details, the main idea is that it is achievable, 
although I would like to get a scenario in which this flexibility is useful.

- Regarding optional block independence :

It reminds me that Mark Adler suggested something similar, which he called 
"super-blocks", where the blocks would be dependent only while within the same 
super-block.
Your suggestion basically encompasses this use case.

This suggestion will require the use of a reserved bit. It's not pretty but I 
don't see a better way.
It will also reserve a 2nd bit within [block-size].
It looks like your suggestion is to use the upper bit of [source-size] instead, 
but that would make 'optional block independence' totally reliant on 'custom 
decompression size', which seems unnecessary : both ideas look useful on their 
own.

Reserving a 2nd bit within [block-size] is unlikely to cause any problem, since 
[block-size] <= [source-size] <= 8 MB, which only uses 24 bits. So we basically 
have 8 bits to transfer any kind of message.
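
The bit budget can be checked with a short sketch. It assumes a 32-bit [block-size] field, the 8 MB hard limit, and the highest bit already taken by the existing "uncompressed" flag. The position chosen for the second bit (bit 30) is purely an assumption for illustration.

```python
import struct

HARD_LIMIT = 8 * 1024 * 1024          # 8 MB
FIELD_BITS = 32                       # [block-size] is a 4-byte field

size_bits = HARD_LIMIT.bit_length()   # bits needed to store values up to 8 MB
spare_bits = FIELD_BITS - size_bits   # bits left over for flags

UNCOMPRESSED_FLAG = 1 << 31           # existing use of the highest bit
DEPENDENT_FLAG = 1 << 30              # hypothetical position for the 2nd bit
SIZE_MASK = (1 << size_bits) - 1


def pack_block_size(size, uncompressed=False, dependent=False) -> bytes:
    """Pack size and flags into a 4-byte little-endian field (sketch)."""
    if not 0 <= size <= HARD_LIMIT:
        raise ValueError("block size out of range")
    raw = size
    if uncompressed:
        raw |= UNCOMPRESSED_FLAG
    if dependent:
        raw |= DEPENDENT_FLAG
    return struct.pack("<I", raw)


def unpack_block_size(field: bytes):
    """Return (size, uncompressed, dependent) from the packed field."""
    (raw,) = struct.unpack("<I", field)
    return (raw & SIZE_MASK,
            bool(raw & UNCOMPRESSED_FLAG),
            bool(raw & DEPENDENT_FLAG))
```

Here `size_bits` comes out as 24 and `spare_bits` as 8, matching the figures above.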

In conclusion, this feature also seems completely achievable.
The question, though, is the same: is there a scenario in which this extended 
flexibility is useful?
The only one I could find seems quite limited (use very small block size for 
very small messages on low-capacity CPU/Memory configurations). But that could 
be a good enough reason nonetheless.

Original comment by yann.col...@gmail.com on 21 Mar 2014 at 6:31

GoogleCodeExporter commented 9 years ago
Hi.
I think there is a benefit.

Optional block independence makes parallel block decompression possible.
When the decompressor reads an independent block, it has a chance to create a new 
task (thread).

But I'm not sure about real performance gain.
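
A minimal sketch of that idea, assuming each block carries a per-block "dependent" flag as proposed: every independent block starts a new run, and each run can be handed to its own task. `decode_run` is a hypothetical callback standing in for the actual decompression of the blocks in one run.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_runs(dependent_flags):
    """Group block indices into runs; each independent block starts a new run."""
    runs = []
    for index, dependent in enumerate(dependent_flags):
        if dependent and runs:
            runs[-1].append(index)   # continues the current run
        else:
            runs.append([index])     # independent block: new run, new task
    return runs


def process_runs(runs, decode_run, max_workers=4):
    """Submit one task per run; results come back in stream order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(decode_run, runs))


# Example: blocks 0 and 3 and 4 are independent, the others depend on
# the block before them.
flags = [False, True, True, False, False, True]
runs = split_into_runs(flags)
```

Blocks inside a run still have to be decoded in order, so the available parallelism depends on how many runs there are and how long each one is.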

Original comment by takayuki...@gmail.com on 30 Mar 2014 at 1:57

GoogleCodeExporter commented 9 years ago
Yes, indeed, that's one possible benefit, 
although it's really difficult to take proper advantage of it if the 
"independence" flag does not occur at regular intervals. 
In such a case, real-world performance gains might be difficult to perceive.

Original comment by yann.col...@gmail.com on 31 Mar 2014 at 11:27

GoogleCodeExporter commented 9 years ago

Original comment by yann.col...@gmail.com on 22 Apr 2014 at 11:07