fingltd / 4mc

4mc - splittable lz4 and zstd in hadoop/spark/flink

how to use 4mc tool for linux fs batch processing and logging #43

Open gabrieljames opened 5 years ago

gabrieljames commented 5 years ago

I am having difficulty using the command-line tool (Linux) to process a directory of uncompressed text files into a directory of compressed files, with stdout redirected to a log file.

4mc [input] seems fine with wildcards; however, I cannot get wildcards working with [output] names, and cannot get stdout to log to a file.

4mc -vz2 ./*.txt ./* >> log.txt

It is good to have a command-line tool to test with, but it seems very limited for batch workloads on the local filesystem before uploading to HDFS. I was expecting bash- or gzip-style basic input, output, and logging operations to work with a similar command syntax. If these operations are supported, could some documentation be added to describe the syntax, ideally in the -h help?
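One likely factor in the wildcard behaviour described above: the shell, not the program, expands `./*.txt` and `./*` before the command starts, so the tool receives a flat list of file names rather than an input pattern and an output pattern. The sketch below shows this with a stand-in function; `show_args` and the file names `a.txt` and `b.txt` are hypothetical and only illustrate what any command, 4mc included, would be handed.

```shell
# Create a scratch directory with two hypothetical input files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a.txt b.txt

# show_args prints each argument it receives on its own line.
# It stands in for any program that is given the two globs.
show_args() {
    printf '%s\n' "$@"
}

# The shell expands both globs before show_args runs,
# so it receives four separate file names.
expanded=$(show_args ./*.txt ./*)
printf '%s\n' "$expanded"
```

With only those two files present, both patterns match the same names, so a command invoked this way cannot tell where the inputs end and the outputs begin.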

carlomedas commented 5 years ago

The work mode and syntax should be similar to most standard compression tools (e.g. tar and gzip), which can take multiple inputs but then compress them to a single file?

gabrieljames commented 5 years ago

The main issues were with using paths and wildcards for input and output operations, and with logging verbose output to a log file.

`4mc -vz2 /inputdir/inputfiles.txt /outputdir/ >> logfilename.txt`

I was unable to get custom output directories or logging working.
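A possible workaround is to call the tool once per file from a shell loop and redirect both output streams. This is a minimal sketch, not documented 4mc behaviour: it assumes the tool accepts the `-vz2 INPUT OUTPUT` form used in this thread with exactly one input and one output, the `.4mc` suffix is an assumption, and `compress_dir` is a hypothetical helper name. Many compression CLIs write verbose messages to stderr rather than stdout; whether 4mc does is also an assumption, so the sketch captures both with `2>&1`.

```shell
# compress_dir COMPRESSOR INPUT_DIR OUTPUT_DIR LOG_FILE
# Runs COMPRESSOR once per *.txt file, one input and one output per call.
# Assumption: COMPRESSOR accepts "-vz2 INPUT OUTPUT", the form used in this thread.
compress_dir() {
    compressor=$1
    in_dir=$2
    out_dir=$3
    log_file=$4

    mkdir -p "$out_dir" || return 1

    for src in "$in_dir"/*.txt; do
        # Skip the literal pattern when no file matches.
        [ -e "$src" ] || continue

        # Output name: input base name plus ".4mc" (the suffix is an assumption).
        dst="$out_dir/$(basename "$src").4mc"

        # Append both stdout and stderr to the log file.
        if ! "$compressor" -vz2 "$src" "$dst" >>"$log_file" 2>&1; then
            printf 'FAILED: %s\n' "$src" >>"$log_file"
        fi
    done
}
```

It could then be invoked as `compress_dir 4mc /inputdir /outputdir logfilename.txt`. Recording a `FAILED:` line per file makes it easier to spot inputs that may have left an empty or partial output behind.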

I found it a bit light on error checking, and it created unwanted blank files under some conditions. Overall, it didn't feel like a robust tool, and I got frustrated after several hours of trying to get stdout working.

Can post some examples if helpful.