chenall / grub4dos

External command and tool source code: https://github.com/chenall/grubutils Download:
http://grub4dos.chenall.net
GNU General Public License v2.0

Why LZ4 decompression not being considered? #40

Open Oxtie opened 9 years ago

Oxtie commented 9 years ago

Is there no one who can look into replacing LZMA with LZ4?

EDIT: LZ4 is now implemented... :100:

chenall commented 9 years ago

LZMA was written by @karyonix; I'm not familiar with it.

Oxtie commented 9 years ago

So... LZ4 will never get implemented by anyone else? I guess only Yann Collet can do this now, since karyonix has long abandoned grub4dos.

http://fastcompression.blogspot.com/2011/05/lz4-explained.html http://fastcompression.blogspot.com/p/lz4.html

Oxtie commented 9 years ago

Chenall, can you check your email?

karyonix commented 9 years ago

LZ4 speed is interesting. But the LZ4 frame format does not carry the uncompressed data size in its header, so it is difficult to allocate memory properly. You may have to decompress in two passes: the first pass just calculates the size before allocation, which lowers decompression speed. Alternatively, you could define a new container file format which contains the uncompressed size, but then you would have to provide a tool for users to convert LZ4 files to the new container format.
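For illustration, here is a minimal sketch of that two-pass idea written against the current lz4frame API (LZ4F_decompress and its context functions are real lz4 calls; the scratch-buffer size and error handling are illustrative choices, not grub4dos code):

#include <stddef.h>
#include <lz4frame.h>

/* Pass 1: stream the frame through a small scratch buffer, counting
 * output bytes without keeping them. Pass 2 (not shown) would allocate
 * that many bytes and run LZ4F_decompress again into the real buffer. */
static size_t measure_decompressed_size(const char *src, size_t srcSize)
{
    LZ4F_dctx *dctx;
    if (LZ4F_isError(LZ4F_createDecompressionContext(&dctx, LZ4F_VERSION)))
        return 0;

    char scratch[16 * 1024];
    size_t total = 0, pos = 0;

    while (pos < srcSize) {
        size_t dstSize = sizeof(scratch);          /* in: capacity, out: produced */
        size_t consumed = srcSize - pos;           /* in: available, out: consumed */
        size_t ret = LZ4F_decompress(dctx, scratch, &dstSize,
                                     src + pos, &consumed, NULL);
        if (LZ4F_isError(ret)) { total = 0; break; }
        total += dstSize;
        pos   += consumed;
        if (ret == 0) break;                       /* frame fully decoded */
        if (consumed == 0 && dstSize == 0) break;  /* no progress: truncated input */
    }
    LZ4F_freeDecompressionContext(dctx);
    return total;
}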

Cyan4973 commented 9 years ago

Interesting. Uncompressed data size is a recurring request for the LZ4 frame format, so it could be a good idea to tackle this one.

Uncompressed data size is a proposed, but unimplemented, feature of the LZ4 frame format. See https://docs.google.com/document/d/1Tdxmn5_2e5p1y4PtXkatLndWVb0R8QARJFe6JI4Keuo/edit, page 7, "Content Size".

The reasons this feature is absent from the current library are as follows:

1) General lack of interest

Most framing-format users are looking at either a file or a stream compression scenario. In both cases, the uncompressed size is irrelevant: either it is not known and will only be discovered at the end (sometimes well after decompression has already started at the other end of the communication channel), or it doesn't matter because the file is potentially very big, so what we want is to bound memory usage, and hence we only tag the size of a single "block".

2) More complex streaming process

It's a little thing, but streaming is made slightly more complex by variable-size headers, which would be the case when implementing content size. More importantly, this complexity cannot be completely hidden from the user. So it's not an impossible feat; it's just that adding complexity is something I frown upon, and only consider when it buys worthwhile features.

3) Target size

The initial objective when the specification was drafted was file compression. The "content size" was assumed to be potentially arbitrarily large. Hence, as a simplification, it was defined as an 8-byte length field.

Afterwards, a few requests for this feature came in, insisting that its use case was rather the allocation of small memory blocks. As a consequence, 8 bytes looked overblown.

This is the main reason why the feature remained unimplemented: it seemed it had been designed with the wrong goal in mind, and deserved a new specification round.

4) Encoding

The way I see it today is that it's likely a good idea to allow 4 different sizes: for example, 0 (no size provided), 2, 4 and 8 bytes. This would allow field-length selection through a 2-bit flag. Another interesting proposition could be 0, 1, 2 and 8 bytes (which would benefit very small (<256 bytes) packets more).

This idea introduces a small (surmountable) complexity: a 2-bit flag is best read as a 0-3 value. Unfortunately, the current choice between 0 and 8 bytes is made using bit 3 of the FLG header byte, and there is no free room around it. The best follow-up would be to use the last remaining bit of the FLG header, which is bit 1. Nothing horrible, but now the decoding of the "content size field length" becomes a bit more complex:

bit3  bit1  length
 0     0      0
 0     1      2 (or 1)
 1     1      4 (or 2)
 1     0      8

It looks a bit complex, but the main property of such a scheme is that it would remain compatible with the current format.

Fortunately, this complexity could remain hidden from users, handled directly by the lz4frame API. The cost would fall on users who manually encode/decode the frame header. That's not a good thing, but it looks manageable.
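For illustration, here is how a decoder might read that proposed field-length flag. Note that this layout is hypothetical (it was never adopted as described); the bit positions and the size table come only from the sketch above:

/* Hypothetical: map bits 3 and 1 of the FLG header byte to the length
 * in bytes of the proposed content-size field, per the table above.
 * Index = (bit3 << 1) | bit1, so the Gray-code-like ordering gives
 * lengths {0, 2, 8, 4}. */
static unsigned content_size_field_length(unsigned char flg)
{
    static const unsigned lengths[4] = { 0, 2, 8, 4 };
    unsigned idx = (((flg >> 3) & 1) << 1) | ((flg >> 1) & 1);
    return lengths[idx];
}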

Well, that's my current thought about this requirement. If you feel it would be interesting for your use case, please drop me a word. Implementing a new feature is only worthwhile if it serves some purpose.

Oxtie commented 9 years ago

For a frequent FiraDisk user like me, the decompression speed of "used space" and "empty space" in a huge compressed image really matters. For example, mapping a 12 GiB image (where the used space is only 4 GiB and the rest is all free) is a painful process with the formats grub4dos currently supports.

If my guess is right, even with all the complex LZ4 encoding, the decoding speed surely serves my purpose (even at half the decoding speed, due to the complexities, it still looks better than gzip's or LZMA's speed).

From my perspective, both of you have done amazing work. My system is extremely stable and quick with FiraDisk, and LZ4 has been looking very, very interesting. It is amazing to imagine lightning-fast boot-ups:

Power Button > Grub4dos > Mapping LZ4'ed image > FiraDisk > OS > Happy User :) :+1:

Cyan4973 commented 9 years ago

OK, so it seems that, in your case, "uncompressed size" is expected to be very large ?

Oxtie commented 9 years ago

Yes, large and also very large.

I have many images: one of 10 GiB (3 GiB free), one of 4 GiB (2.5 GiB free), one of 2 GiB (0.7 GiB free). They were all created with different purposes in mind and they are all expected to get bigger. The free GiBs eventually shrink with new apps and updates, while some free space in a system also serves quick file manipulation :)

Without compression, mapping a 3 GiB+ image is very slow on a standard HDD and USB 2.0; without compression, mapping a 7 GiB+ image is very slow even on a SATA SSD.

gzip decodes "empty space" fast but decodes "used space" like a tortoise (very, very slow), and if the image contents are fragmented I see a "gear shifting" effect (it starts slow, then becomes fast, then slow and fast again according to the fragmented content's location).

I am guessing that with your LZ4 decoding there will be some kind of cool "pressure spring burst" effect? I read some of your comments related to the "SSE implementation"; those MB/s numbers looked great (http://fastcompression.blogspot.com/2011/05/lz4-explained.html). It's beautiful to imagine those numbers with grub4dos :)

Cyan4973 commented 9 years ago

Latest "dev" version of LZ4 now supports frame content size (original size). https://github.com/Cyan4973/lz4/tree/dev

The way it works: there is a new field within LZ4F_frameInfo_t:

unsigned long long contentSize; /* Size of uncompressed (original) content; 0 == unknown */

It uses some free space kept available in the LZ4F_frameInfo_t structure, and is therefore backward compatible.

When the value of this field is zero, the behavior is unchanged : original size is not stored into compressed frame.

When the value is set to anything > 0, it is accepted as the total input size to come when calling LZ4F_compressBegin(). The value is then checked against successive calls to LZ4F_compressUpdate(), and verified when calling LZ4F_compressEnd(). If it's incorrect, an error code is issued.
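To make the round trip concrete, here is a minimal sketch using the lz4frame API as later released (LZ4F_compressFrame, LZ4F_compressFrameBound and LZ4F_getFrameInfo are real lz4 calls; the allocation and error handling are illustrative):

#include <stdlib.h>
#include <string.h>
#include <lz4frame.h>

/* Writer side: record the original size in the frame header. */
static size_t compress_with_size(const void *src, size_t srcSize, void **dstOut)
{
    LZ4F_preferences_t prefs;
    memset(&prefs, 0, sizeof(prefs));
    prefs.frameInfo.contentSize = srcSize;      /* 0 == unknown */

    size_t cap = LZ4F_compressFrameBound(srcSize, &prefs);
    void *dst = malloc(cap);
    if (!dst) return 0;

    size_t n = LZ4F_compressFrame(dst, cap, src, srcSize, &prefs);
    if (LZ4F_isError(n)) { free(dst); return 0; }
    *dstOut = dst;
    return n;
}

/* Reader side: recover the original size from the frame header before
 * allocating, which is exactly what a loader like grub4dos needs. */
static unsigned long long read_content_size(const void *frame, size_t frameSize)
{
    LZ4F_dctx *dctx;
    LZ4F_frameInfo_t info;
    if (LZ4F_isError(LZ4F_createDecompressionContext(&dctx, LZ4F_VERSION)))
        return 0;

    size_t srcSize = frameSize;                 /* in: available, out: header bytes read */
    size_t ret = LZ4F_getFrameInfo(dctx, &info, frame, &srcSize);
    LZ4F_freeDecompressionContext(dctx);
    return LZ4F_isError(ret) ? 0 : info.contentSize;
}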

Oxtie commented 9 years ago

edit: Either developers like to surprise or they like to hibernate? indefinitely??

chenall commented 9 years ago

I hope someone can do it!

Cyan4973 commented 9 years ago

As I stated in the past, I can offer my support for LZ4, but I can't really help with grub4dos itself. Such a patch will have to be produced by a member of the grub4dos community with enough knowledge of its code base.

chenall commented 9 years ago

Thanks @karyonix, this version seems to support LZ4:

http://reboot.pro/topic/20518-extract-dynamic-vhd-to-ram/

karyonix commented 9 years ago

It can decompress LZ4 with frame content size, but it is still not useful because of its low speed; I need to do some more optimization work. This version can load a dynamic VHD faster than LZ4. @chenall It may have a problem mapping a fixed VHD directly; I have not tested that.

chenall commented 9 years ago

@karyonix Thanks, looking forward to it.

Can I add this function into branch 0.4.6a with your code?

karyonix commented 9 years ago

Yes, if you think it is appropriate. But I think my dec_vhd.c line 173 needs some editing:

if (diskType != VHD_DISKTYPE_FIXED && diskType != VHD_DISKTYPE_DYNAMIC) {
    /* Differencing disk and unknown diskType are not supported */
    // grub_printf("diskType %d not supported\n", diskType);
    return 0;
}

VHD_DISKTYPE_FIXED should return 0 too.

chenall commented 9 years ago

Change it to:

if (diskType != VHD_DISKTYPE_DYNAMIC) {
    /* Fixed disks, differencing disks and unknown diskTypes are not supported */
    // grub_printf("diskType %d not supported\n", diskType);
    return 0;
}

Oxtie commented 9 years ago

Niceeeee! Can't wait to imagine trying out an optimized version...

btw, I didn't understand half of the things in the following link (which talks about optimizations): https://code.google.com/p/lz4/issues/detail?id=56 ... but I found this updated 2015 PDF nice, if it ever helps: http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

chenall commented 9 years ago

@karyonix The current lz4 code has a bug; the command below will loop endlessly in the lz4 code:

map --mem /xxxx.lz4 (hdx)

karyonix commented 9 years ago

I can't compile branch 0.4.6a

asm.S:10330: Error: can't resolve `.text' {.text section} - `USB2DRI' {UND section}

chenall commented 9 years ago

Please compile it on Linux; this version (http://reboot.pro/topic/20518-extract-dynamic-vhd-to-ram/) has the same issue.

karyonix commented 9 years ago

@chenall In branch 0.4.6a dec_lz4.c, apply this fix:

Line 248
  Buggy: while (lz4dec.nextBlockSize && lz4dec.dicPos + lz4dec.blockMaxSize <= LZ4_DICBUFSIZE) {
  Fixed: {

Line 271
  Buggy: lz4dec.dicFilePos -= dicPosSrc;
  Fixed: lz4dec.dicFilePos += dicPosSrc;

Line 355
  Buggy: }
  Fixed: }

chenall commented 9 years ago

Fixed! Thanks! LZ4 is faster than LZMA.

Oxtie commented 9 years ago

@chenall, questions:

  1. Where can I find the latest LZ4 Windows binaries?
  2. Is it faster than *.gz?
  3. Is the decoding speed constant? (Is there a 'gear shifting' effect?)
  4. Is there any option to show the decoding speed in MiB/s during the boot process?

chenall commented 9 years ago

  1. Sources: https://github.com/Cyan4973/lz4 Windows binary: http://dl.grub4dos.chenall.net/LZ4.7z
  2. In my test it's faster than gz.
  3. I am not sure.
  4. No.

Oxtie commented 9 years ago

Hi. I tried lz4.exe but I get the error "not a valid 32bit application". I hope it's not an x64 app, because I don't have an x64 version of Windows.

edit: I tried lz4.exe again from another computer and it compressed OK, BUT grub4dos shows an error: http://www.datafilehost.com/d/91963d64

chenall commented 9 years ago

Please try this: http://dl.grub4dos.chenall.net/lz4_xp.7z

Usage: lz4 -9 --content-size xxx.img xxx.lz4

The --content-size option is required.
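To sanity-check an image outside grub4dos, the standard lz4 command-line tool can decompress it back with -d (the file names here are just examples):

lz4 -d xxx.lz4 xxx.out.img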

Oxtie commented 9 years ago

EDIT: (JULY UPDATES) https://github.com/Cyan4973/lz4/issues/108

Thanks, it works! (Tested 4GiB and 7GiB.) It requires msvcr110.dll though (on XP). (DLL: http://www.datafilehost.com/d/f4d782c1)

lz4.exe uses only one thread for compression (slow), but never mind. (EDIT, related: https://github.com/Cyan4973/lz4/issues/116)