Closed GoogleCodeExporter closed 9 years ago
> I'm just a bit curious : is fseek() guaranteed to always produce zero-filled
space, no risk of "garbage-filled" space instead ?
Strictly, no. For major platforms, yes. Practically, even if we'll support
sparse-file by fseek(), we should remain current conservative code.
# C Standard
I think standard (C89,99,11) says nothing about it. Standard only guarantees
ftell()/fseek() pair (See N1570 / 7.21.9.2 The fseek function).
So maybe it's platform/filesystem dependent.
# Major platforms
But the following major platforms/libraries supports 'zero-filled' behaviour.
POSIX : http://pubs.opengroup.org/onlinepubs/009695399/functions/fseek.html
> The fseek() function shall allow the file-position indicator to be set beyond
the end of existing data in the file. If data is later written at this point,
subsequent reads of data in the gap shall return bytes with the value 0 until
data is actually written into the gap.
glibc :
http://www.gnu.org/software/libc/manual/html_node/File-Position-Primitive.html
> If you set the position past the current end, and actually write data, you
will extend the file with zeros up to that position.
I belive many embedded OS/platforms use gcc/glibc pair, or mimic glibc's
behaviour.
FreeBSD : http://www.freebsd.org/cgi/man.cgi?query=fseek
> The fgetpos(), fsetpos(), fseek(), ftell(), and rewind() functions con-form
to ISO/IEC 9899:1990 (``ISO C90'').
> The fseeko() and ftello() functions conform to IEEE Std 1003.1-2001
(``POSIX.1'').
FreeBSD's fseek() internally uses fseeko(). So it also conforms POSIX.
http://svnweb.freebsd.org/base/head/lib/libc/stdio/fseek.c?view=markup
MacOSX / iOS :
https://developer.apple.com/library/ios/documentation/System/Conceptual/ManPages
_iPhoneOS/man3/fseek.3.html
> The fseeko() and ftello() functions conform to IEEE Std 1003.1-2001
(``POSIX.1'').
BSD family.
Windows / MSVC : https://msdn.microsoft.com/en-us/library/75yw9bf3.aspx
> The pointer can also be positioned beyond the end of the file.
It says nothing about what will happen when we set it beyond EOF. But since
MSVC-crt just uses Win32 APIs, NTFS sparse file (or zero-filled file for old
filesystems) will be created.
(See %ProgramFiles(x86)%\Microsoft Visual Studio 12.0\VC\crt\src\fseek.c :
fseek() -> _lseek() -> SetFilePointerEx())
# Solution
Since C standard say nothing about fseek() beyond EOF, we should say it's
platform dependent. But major platforms support zero-filled behaviour and
sparse file.
So it would be nice to implement sparse file support as default but
configurable part.
Original comment by takayuki...@gmail.com
on 11 Feb 2015 at 2:01
Neil's benchmarks
https://groups.google.com/d/msg/lz4c/ExYLfHCY-UY/uXCG0R98IR8J
Based on Neil's benchmarks, we can estimate effect of native sparse file
support.
(A) 215 sec : lz4 native
(B) 314 sec : lz4 -cd | cp
(C) 202 sec : lz4 -cd | cp --sparse=always
(B)-(A) = 99 sec : native vs pipe cost
(B)-(C) = 112 sec : --sparse=always gain
(A)-((B)-(C)) = 103 sec : (Estimated) lz4 native with sparse-file support
My impressions :
- Neil was trying to compress 80GiB files in few minutes. Aside from using sparse file or not, I believe it's a good problem domain for lz4 cli.
- Pipe is too heavy :( But its 80GiB !
- Almost 50% gain is huge. It's worth trying.
Original comment by takayuki...@gmail.com
on 11 Feb 2015 at 2:07
@Yann, aside from this issue.
In MSVC crt, fseek(x, SEEK_CUR) seems quite problematic to use with large
(>2GiB) file.
So we should change fseek() to _fseeki64() in lz4io.c
----
Notes for Windows / MSVC
(1) We must use DeviceIoControl() to set FSCTL_SET_SPARSE attributes explicitly.
(2) We must use _fseeki64(). MSVC's fseek() has 2GiB barrier.
in fseek.c : fseek() -> _fseek_nolock(), offset is converted SEEK_CUR to
SEET_SET.
But during this calculation, offset is long (32bit signed int). It is 2GiB
barrier.
int _fseek_nolock (FILE *str, long offset, int whence) {
...
if (whence == SEEK_CUR) {
offset += _ftell_nolock(stream); // 2GiB barrier
whence = SEEK_SET;
}
...
}
Same function in fseek64.c : _fseeki64() -> _fseeki64_nolock(). In this case,
offset is __int64 (64bit signed int).
_fseeki64() is supported since Visual Studio 2005
https://msdn.microsoft.com/en-us/library/75yw9bf3%28v=vs.80%29.aspx
In Visual Studio 2005, _MSC_VER == 1400
http://stackoverflow.com/a/70630/2132223
Original comment by takayuki...@gmail.com
on 11 Feb 2015 at 3:46
Attachments:
This issue is greatly documented.
Many thanks Takayuki !
Original comment by yann.col...@gmail.com
on 11 Feb 2015 at 5:59
It's worth noting from the xz code that the fiddly bit is dealing with zeros at
the end of a file. You generally have to write at least the last byte to make
sure that the length is correct.
Additionally the trick only works for regular seek able files, so you have to
check that the output is the right sort first (but can include stdout which may
just be a regular file redirect rather than a pipe).
However I would like a switch that enables the behaviour on any seek able
device - since my use case targets block devices directly and I can easily
guarantee they are zero filled before hand.
Original comment by n...@aldur.co.uk
on 11 Feb 2015 at 7:31
Useful discussion here on testing a buffer for zero:
http://stackoverflow.com/questions/1493936/faster-approach-to-checking-for-an-al
l-zero-buffer-in-c
Fastest is to use the memcmp trick with 64 bit integers - assuming you can get
around the alignment and non-integer buffer size issues.
Original comment by n...@aldur.co.uk
on 11 Feb 2015 at 7:33
I've implemented experimental version :
https://github.com/t-mat/lz4/tree/gc-issue/155
# Build
To build this experimental version, define 'LZ4IO_ENABLE_SPARSE_FILE' like the
following command :
make lz4programs 'CFLAGS=-O3 -DLZ4IO_ENABLE_SPARSE_FILE=1'
./programs/lz4 -h
You will see "EXPERIMENTAL_SPARSE_FILE" as lz4 revision :
"*** LZ4 command line interface 64-bits EXPERIMENTAL_SPARSE_FILE, by Yann Collet (...) ***"
# Experiment
This experimental version adds option "-x" for sparse file for decompression.
You can use this option like this :
./programs/lz4 -9 -f my-file
./programs/lz4 -d -f -x my-file.lz4 my-file.lz4.out
cmp my-file my-file.lz4.out
Original comment by takayuki...@gmail.com
on 12 Feb 2015 at 6:49
@Neil
Since my implementation is just a proof-of-concept, I'm glad to see yours :)
Original comment by takayuki...@gmail.com
on 12 Feb 2015 at 7:36
If I may I will take yours and enhance it. I prefer to stand on the
shoulders of giants.
Original comment by n...@aldur.co.uk
on 12 Feb 2015 at 8:25
Great work Takayuki !
Original comment by yann.col...@gmail.com
on 12 Feb 2015 at 8:26
Benchmark results : https://gist.github.com/t-mat/6babbbc9edf726fbaf9c
Perhaps it's compiler dependent, but GNU coreutil's is_nul() is surprisingly
good !
Original comment by takayuki...@gmail.com
on 12 Feb 2015 at 12:07
Attachments:
is_nul needs a sentinel after the buffer to work properly. Not that
that is difficult to create.
Original comment by n...@aldur.co.uk
on 12 Feb 2015 at 1:03
@Neil
Thanks. I've fixed it :
https://github.com/t-mat/lz4/commit/0fc157798abca49f5bcdc2c34fdaab013038e474
Could you confirm my benchmark ? I'm still not sure about my result is my
envionment dependent or not.
Original comment by takayuki...@gmail.com
on 12 Feb 2015 at 1:47
@Yann,
First of all, thanks for the effort. I've read recent "sparseFile" branch.
Maybe the following issues are known. I wrote this as future check list.
(1) For Windows platform, additional DeviceIoControl() is needed.
https://github.com/t-mat/lz4/blob/gc-issue/155/programs/lz4io.c#L102
https://github.com/t-mat/lz4/blob/gc-issue/155/programs/lz4io.c#L112
https://github.com/t-mat/lz4/blob/gc-issue/155/programs/lz4io.c#L789
(2) Currently, we can't "concatenate" sparse area larger than block size.
https://github.com/Cyan4973/lz4/compare/ceec6fa849...e38c268b5a#diff-abb30b1425d
c9e3e5441d294b2f7e909R611
(3) It seems slower than GNU coreutil's is_nul(). Some benchmarks are needed.
https://github.com/Cyan4973/lz4/compare/ceec6fa849...e38c268b5a#diff-abb30b1425d
c9e3e5441d294b2f7e909R609
Original comment by takayuki...@gmail.com
on 11 Mar 2015 at 9:26
Great review, thanks Takayuki !
I plan to work on point 2, so it's likely going to be improved in a next
release.
Thanks for pointing DeviceIoControl() requirement on Windows, it totally evaded
my reading.
Point 3 looks a bit difficult to measure. I currently have no idea on how to
bench such impact reliably.
There is also a small difference in scope :
is_nul() expects the whole input buffer to be null,
but the little loop inside current work version can stop anytime within the
block, meaning we are still going to skip whatever amount of zero data has been
detected, even if the buffer is not entirely null...
Original comment by yann.col...@gmail.com
on 11 Mar 2015 at 10:25
The new version at https://github.com/Cyan4973/lz4/tree/sparseFile has improved
sparse support, featuring better performance, by eliminating restriction 2.
Using 64KB blocks, the new version generates files more than twice smaller than
the previous one.
The funny thing is that, the new version is now much better than cp
--sparse=always
Here is an example.
The current chain of test uses datagen to generate sparse data.
So when doing :
./datagen -g50M -P100 > sparse.bin
cp --sparse=always sparse.bin sparsecp
./lz4 -B4 sparse.bin | ./lz4 -dX > sparselz4
ls -ls sparse*
51200 -rw-r--r-- 1 yann yann 52428800 mars 13 14:53 sparse.bin
5636 -rw-r--r-- 1 yann yann 52428800 mars 13 14:53 sparsecp
1796 -rw-r--r-- 1 yann yann 52428800 mars 13 14:54 sparselz4
As can be seen, the number of allocated KB is 3x smaller using lz4 -dX as
compared with cp --sparse=always. I'm not sure why, but it's good news.
(yes, I checked with diff, and there is no problem with sparselz4 :
diff -s sparse.bin sparselz4
Files sparse.bin and sparselz4 are identical
)
The current algorithm limitation is that it only tests zero-filled segment at
the beginning of each block. Consequently, larger block produce larger files,
because later zero-filled segments are not tested :
ls -ls sparse*
51200 -rw-r--r-- 1 yann yann 52428800 mars 13 14:53 sparse.bin
1796 -rw-r--r-- 1 yann yann 52428800 mars 13 14:54 sparselz4B4
9640 -rw-r--r-- 1 yann yann 52428800 mars 13 15:05 sparselz4B5
27988 -rw-r--r-- 1 yann yann 52428800 mars 13 15:05 sparselz4B6
46104 -rw-r--r-- 1 yann yann 52428800 mars 13 15:05 sparselz4B7
The difference is fairly noticeable.
It pushes in favor of using B4 (or B4D) with sparse support enabled.
Note however that current default mode is B7.
Which means, there are 2 solutions :
- B4D will become default mode (which I plan to do in a later version, so
basically in conjunction with making sparse support enabled by default), and
the inefficiency at B7 setting is not considered important anymore, because
basically no one will continue using B7
- The algorithm needs to be refined to address this weakness. There is a risk
of increased CPU load in this process, so a fine balance must be found.
For your comments
Original comment by yann.col...@gmail.com
on 13 Mar 2015 at 2:13
I finally decided to tackle the issue with difference in performance depending
on block size, by simply defining sub-blocks.
The result is now pretty good (uploaded at sparseFile branch) :
ls -ls sparse*
51200 -rw-r--r-- 1 yann yann 52428800 mars 13 20:09 sparse.bin
552 -rw-r--r-- 1 yann yann 52428800 mars 13 20:28 sparselz4B4
552 -rw-r--r-- 1 yann yann 52428800 mars 13 20:29 sparselz4B5
552 -rw-r--r-- 1 yann yann 52428800 mars 13 20:29 sparselz4B6
552 -rw-r--r-- 1 yann yann 52428800 mars 13 20:29 sparselz4B7
Identical result with all block sizes, and improved performance.
Obviously, the performance is excellent here because the generated file is
*very* sparse.
Nonetheless, at 10x the performance of cp --sparse=always, I figure it's good
enough.
It's even able to get some small parse segments from larger archives by
accident, such as, for example, silesia.tar :
206984 -r--r--r-- 1 yann yann 211948032 juin 13 2013 silesia.tar
206932 -rw-r--r-- 1 yann yann 211948032 mars 13 20:38 silesia.tar.sparse
It's quite minor, but worth noting, especially as cp --sparse=always doesn't
get same success.
Anyway, it still leaves on the table point 1 and 3 :
1) Windows support
Should be straightforward, it's just necessary to follow Takayuki's
instructions.
2) Performance measurement
This one is much more tricky.
It's correct to say I've not been able to get something meaningful so far.
Everything is so fast.
At best, I can merely tell that I'm unable to witness any slowdown introduced
by enabling sparse support, even on non-sparse files. But that's not a great
statement.
Original comment by yann.col...@gmail.com
on 13 Mar 2015 at 7:49
minor detail, but I note that tar is using -S for sparse file support.
Problem being : I'm using -Sx for stream checksum disabling...
Original comment by yann.col...@gmail.com
on 13 Mar 2015 at 8:15
Windows sparse file support has just been added, following your instructions
Takayuki.
It works.
For some reason though, it's less efficient than Linux.
The 50MB sparse file I'm using for test goes down to 6.18MB,
which is respectably less, but still a lot more than the 550KB of Linux.
I was surprised because NTFS block size is 4 KB, so it should be able to use
that kind of granularity.
It seems not. My understanding is that Windows will only "not allocated" clean,
aligned blocks of 64 KB zeros, leaving a ton of smaller opportunities on the
table. Hence inflated results compared to Linux ext4.
But well, on the plus side, that's nonetheless an improvement.
Original comment by yann.col...@gmail.com
on 13 Mar 2015 at 9:20
# Performance measurement
Revamped my benchmark : https://gist.github.com/t-mat/47890e581bc624a4fa3f
New dev branch (8a87769af8) is fastest :)
# cp --sparseAlways
I've investigated it. And I learned :
GNU cp's is_nul() returns boolean value.
Since it just checks "Buffer is entirely zero or not", spase area is always
quantized to buffer size.
Call tree is :
coreutils/src/cp.c : main() -> do_copy()
-> copy.c : copy() -> copy_internal() -> extent_copy() or copy_reg() -> sparse_copy()
-> system.h : is_nul()
# Option symbols
Perhaps, it is good time to introduce long options like "--sparse" for advanced
or minor switches.
Currently lz4cli has 18 options and it is still increasing, we'll soon see
uniqueness problem.
So define only long options for now, and wait for increasing of demand to short
option.
Original comment by takayuki...@gmail.com
on 15 Mar 2015 at 3:44
Thanks for testings Takayuki.
It gives some good reference to compare lz4 implementation against.
Regarding option symbols :
I feel you are right.
Maybe it's time to introduce long symbols.
I'll open an entry for this objective.
Original comment by yann.col...@gmail.com
on 15 Mar 2015 at 7:11
Integrated into r128
Original comment by yann.col...@gmail.com
on 31 Mar 2015 at 1:37
Original issue reported on code.google.com by
yann.col...@gmail.com
on 10 Feb 2015 at 2:33