Closed GoogleCodeExporter closed 9 years ago
Original comment by yann.col...@gmail.com
on 22 Jan 2015 at 4:05
Hello Kyle
Indeed, the first behavior you are observing is considered "correct", which
means as intended by the developer.
"inFile1 outFile2" behavior is so entrenched within the lz4 command line tool that
it will be difficult to modify that choice now without creating some massive
problems for the installed user base.
Should a "multiple input file" process exist, it will need its own command, to
ensure compatibility with existing scripts.
And since one command must be added, it will not work "as is".
On top of that, multiple input files is a deliberately avoided feature so far,
because it introduces a huge corpus of File Systems related issues to handle.
And this is a workload not to be underestimated.
As a consequence, the preferred way to handle "multiple input files" scenario
is to couple lz4 with tar, using pipe. The advantage is that tar is a well
known and well supported aggregation tool, compatible with a large base of file
systems.
That's how I handle such use case myself, and how I recommend it to everyone.
I believe it would answer your use case too, even if it makes the line of
script slightly longer. For some useful examples (should you ever need them):
http://www.computerhope.com/unix/utar.htm
Original comment by yann.col...@gmail.com
on 22 Jan 2015 at 4:30
> Expect: cli program (lz4) to remove input_filename sources like gzip, bzip2,
and xz, with an option to skip this (-k, for keep).
The answer to this second behavior is a bit different.
We had some discussions about it quite some time ago, when settling lz4 command
line behavior.
The conclusion was that the "delete by default" behavior was probably a good
design choice back in the days when storage space was scarce and expensive. But
it is no longer appropriate today.
It is scary to imagine an administrator could lose some file just because he
forgot to add "-k" to his script or his command line. And the too-typical
answer "RTFM", which translates to "it's his fault, he should have paid
attention", is no longer acceptable either.
So it was a deliberate choice to keep the original file by default.
I understand it makes lz4 behavior slightly different from gzip. But I really
believe it is the right thing to do today.
Deleting a file is a separate command line operation. If there was some
standard command switch to delete a file after compression, lz4 would have
supported it, but there is none, as far as I know. So it's not necessary to
make the list of commands longer.
Original comment by yann.col...@gmail.com
on 22 Jan 2015 at 4:45
Hello. Thanks for the quick updates.
On comment #2: I agree it would probably be a new switch or a different cli
program altogether to allow the existing user base to maintain functionality.
The use of tar is reasonable for cases when all files can go into a single
archive, but it doesn't permit the creation of multiple .lz4 files at once.
Granted, the latter is a less-often use-case, but it does exist (which prompted
me to file an issue heh).
Also, you mentioned there would be a large number of File Systems related
issues with using multiple inputs. Forgive my ignorance, but could we not copy
the logic from, for example, existing GNU programs which already handle the
typical: some_prog -a -b -c file1 file2 ... ? My apologies if I'm off base,
but it seems like we could get the file list in a similar manner and then loop
over the existing logic near the end of main() to run the
DEFAULT_COMPRESS/DECOMPRESS actions.
On comment #3: Yea, having a delete-by-default action is purely subjective. I
see the merits and consequences. Having the lz4 cli do it by default is
perhaps overkill; though a switch to auto-delete would be useful. That said,
the lzop program doesn't do this and the world keeps spinning :) This was a
very minor consideration in my mind when filing the issue; it just seemed
appropriate to bring up at the same time.
Original comment by KyleJHar...@gmail.com
on 22 Jan 2015 at 5:05
> it doesn't permit the creation of multiple .lz4 files at once. Granted, the
latter is a less-often use-case, but it does exist
Good point. I did not think about it when answering this issue.
Maybe it does deserve a switch after all...
> Also, you mentioned there would be a large number of File Systems related
issues with using multiple inputs. Forgive my ignorance, but could we not copy
the logic from, for example, existing GNU programs which already handle the
typical: some_prog -a -b -c file1 file2 ... ?
This was related to my previous answer, when I wrongly understood that you
wanted to compress several files into a single archive.
In that case, preserving file attributes, storing filenames (Cyrillic? Chinese?
Arabic? etc.), directory structure, links, etc. is a serious job.
But if the point is just to loop over multiple filenames to compress them one
by one, then it's more trivial to implement.
Original comment by yann.col...@gmail.com
on 22 Jan 2015 at 5:16
Ahh, sorry for the confusion. Yes my intention is to just loop over multiple
files and compress them one by one. Such that:
lz4 file1 file2 file3 becomes...
file1.lz4
file2.lz4
file3.lz4
The main benefits are as listed above: parallelization (find/xargs) and
avoiding the expense of spawning (and then deconstructing) a new process for
each file.
I actually spent a couple hours last night implementing such a loop. It worked
to an extent, but my knowledge of C is weak. I kept reaching segfaults and I
don't know how to debug C very well. I'm pretty sure it was because
I was misusing the *input_filename, *output_filename, and *dynNameSpace
pointers; treating them like a char[] when trying to clear and assign the next
filename to process. Someone with experience writing C would have a much
easier time than I did heh.
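For readers hitting the same pointer trouble, here is a minimal sketch (not the lz4 source) of the loop being discussed: each output name is built in a freshly allocated buffer instead of clearing and reusing a shared one. The names make_output_name, compress_each and print_only are hypothetical, and print_only only prints what a real compression routine would do.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a newly allocated "<input><suffix>" string, or NULL if the
 * allocation fails. The caller owns the result and must free() it. */
char* make_output_name(const char* input, const char* suffix)
{
    size_t const inLen  = strlen(input);
    size_t const sufLen = strlen(suffix);
    char* const out = malloc(inLen + sufLen + 1);   /* +1 for the final '\0' */
    if (out == NULL) return NULL;
    memcpy(out, input, inLen);
    memcpy(out + inLen, suffix, sufLen + 1);        /* also copies the '\0' */
    return out;
}

/* Hypothetical callback type standing in for a per-file compression routine.
 * It must return 0 on success. */
typedef int (*compress_fn)(const char* in, const char* out);

/* Stand-in callback: prints the planned operation, compresses nothing. */
int print_only(const char* in, const char* out)
{
    printf("%s -> %s\n", in, out);
    return 0;
}

/* Calls `compress` once per input name. Returns the number of failures. */
int compress_each(const char** names, int count, const char* suffix,
                  compress_fn compress)
{
    int failures = 0;
    int i;
    for (i = 0; i < count; i++) {
        char* const outName = make_output_name(names[i], suffix);
        if (outName == NULL) { failures++; continue; }
        if (compress(names[i], outName) != 0) failures++;
        free(outName);   /* fresh buffer per file, nothing to clear or reuse */
    }
    return failures;
}
```

Calling compress_each(names, 3, ".lz4", print_only) with file1, file2 and file3 would print one "fileN -> fileN.lz4" line per input. Allocating per file costs a malloc/free pair each time, but it avoids the fixed-size char[] assumptions that tend to cause the segfaults described above.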
Original comment by KyleJHar...@gmail.com
on 22 Jan 2015 at 5:32
OK
Let's suppose a new switch is created for this use case, for example -M,
would that be fine for your use case ?
Original comment by yann.col...@gmail.com
on 22 Jan 2015 at 5:44
Absolutely. That would give a big boost to compatibility with other
compressors (as well as GNU-style cli programs).
I am happy to help any way I can if you need anything. Thanks again!
Original comment by KyleJHar...@gmail.com
on 22 Jan 2015 at 5:47
OK, I'll handle it
Original comment by yann.col...@gmail.com
on 22 Jan 2015 at 6:59
Hey, I went back and looked at the code. Not sure why it gave me such an issue
the first time.
Anyway, I checked out a copy from git (github: Cyan4973/lz4/master) and made
the patch. I sent a pull request for it. As mentioned I'm not an expert in C,
but I believe the patch should be acceptable with little to no modification.
If it looks good, please merge and feel free to close this issue.
Thanks.
Original comment by KyleJHar...@gmail.com
on 15 Feb 2015 at 5:07
The dev branch at :
https://github.com/Cyan4973/lz4/tree/dev
has received a significant update with regard to issue 151.
The code no longer depends on "VLA" (Variable Length Array),
as nicely described by Takayuki at
https://github.com/Cyan4973/lz4/pull/52#issuecomment-76760185,
which makes it more portable.
Also important: the logic has been moved, and is now handled by LZ4IO instead.
It now relies on a new function :
int LZ4IO_compressMultipleFilenames(const char** inFileNamesTable, int ifntSize, const char* suffix, int compressionlevel);
which seems a better responsibility distribution.
The scheme has even been extended to benchmark, so that it can bench multiple
files in a row (which was already possible, but would crash if a command was
added in the middle or end of the filename list).
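As an illustration only (this is not the code from the dev branch), a filename table of the kind passed as inFileNamesTable could be built on the heap rather than as a VLA, while tolerating options placed anywhere among the filenames. The function name collect_filenames is hypothetical, and the sketch assumes that any argument starting with '-' is an option handled elsewhere.

```c
#include <stdlib.h>

/* Copies every argument that does not start with '-' into a heap-allocated
 * table. argv[0] (the program name) is skipped. Returns the table, or NULL if
 * the allocation fails; *countPtr receives the number of filenames found.
 * The caller must free() the table (the strings still belong to argv). */
const char** collect_filenames(int argc, const char** argv, int* countPtr)
{
    /* argc is an upper bound on the number of filenames; keep at least 1 slot */
    size_t const slots = (size_t)(argc > 0 ? argc : 1);
    const char** const table = malloc(slots * sizeof(*table));
    int count = 0;
    int i;
    *countPtr = 0;
    if (table == NULL) return NULL;
    for (i = 1; i < argc; i++) {
        if (argv[i][0] == '-') continue;   /* assumed: options are parsed elsewhere */
        table[count++] = argv[i];
    }
    *countPtr = count;
    return table;
}
```

The resulting table and count have the shape of the inFileNamesTable and ifntSize parameters quoted above. Because the size comes from malloc rather than from an array declared with a runtime length, the code does not rely on compiler support for VLAs.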
A remaining significant difference with gzip is that the process stops when it
reaches an unavailable filename.
For example, trying :
./lz4 -m file1 nothere file2
it would output :
compress XX bytes into YY bytes ==> PP%
impossible to open file nothere
while gzip would output a warning message, and continue on compressing file2.
It's a matter of error vs warning status.
If you have a preference, please tell me.
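The two policies can be compared in a small sketch. It is an assumption-laden illustration, not the LZ4IO implementation: visit_files and its stopOnError flag are hypothetical names, and the function only opens and closes each file where a real tool would compress it.

```c
#include <stdio.h>

/* Tries to open each name for reading and returns how many could not be
 * opened. *visitedPtr receives the number of files successfully opened.
 * If stopOnError is nonzero, the loop ends at the first failure (the "error"
 * policy described above for lz4 -m). Otherwise a message is printed and the
 * loop moves on to the next name (the gzip-like "warning" policy). */
int visit_files(const char** names, int count, int stopOnError, int* visitedPtr)
{
    int missing = 0;
    int i;
    *visitedPtr = 0;
    for (i = 0; i < count; i++) {
        FILE* const f = fopen(names[i], "rb");
        if (f == NULL) {
            fprintf(stderr, "cannot open %s\n", names[i]);
            missing++;
            if (stopOnError) break;   /* error policy: give up here */
            continue;                 /* warning policy: try the next name */
        }
        /* a real tool would compress the stream here */
        fclose(f);
        (*visitedPtr)++;
    }
    return missing;
}
```

With the list file1, nothere, file2, the error policy handles only file1, while the warning policy also reaches file2. In both cases the return value is nonzero, so a caller could still report failure through the exit status.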
Original comment by yann.col...@gmail.com
on 7 Mar 2015 at 12:31
Integrated into r128
Original comment by yann.col...@gmail.com
on 31 Mar 2015 at 1:37
Original issue reported on code.google.com by
KyleJHar...@gmail.com
on 22 Jan 2015 at 3:56