emk closed this issue 7 years ago.
gzip supports this flag, so I'd be in favor of adding it. I will note, though, that I wonder why you want this: `find ./ -type f -print0 | xargs -0 -P8 szip` should work just fine and do it in parallel.
Oh, thank you! I'd forgotten about the `-P` flag to `xargs`. I always reach for one of the versions of `parallel`, and I always conclude, "No, not worth sorting out all the issues."

I'll try this on our next cluster job and see how it goes. If it works well, we should just close this issue.
OK, I've tried `xargs -P` and it seems to work much more gracefully than other common parallelization tools. I'm OK with closing this issue, and thank you for your help! :-)
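A minimal sketch of the `find`/`xargs` approach from this thread, wrapped in a shell function. `compress_dir`, `COMPRESS`, and `JOBS` are hypothetical names for illustration, not part of szip, and the sketch assumes the compressor (szip by default) accepts file names as arguments, as in the command above. One detail worth noting: without `-n`, `xargs` packs as many names as fit into each command line, so a few hundred files may all land in a single process and `-P` has little to parallelize. Passing `-n 1` hands each process one file.

```shell
#!/bin/sh
# Sketch: compress every matching regular file under a directory in parallel.
# COMPRESS, JOBS, and compress_dir are hypothetical names for this example.
COMPRESS="${COMPRESS:-szip}"  # compressor command; assumed to take file names as arguments
JOBS="${JOBS:-8}"             # number of compressor processes to run at once

# Usage: compress_dir DIR [PATTERN]
compress_dir() {
    # -print0 / -0 keep names containing spaces or newlines intact.
    # -n 1 gives each process one file, so -P can actually run JOBS at once;
    # without -n, xargs may put every name on a single command line.
    # Narrow PATTERN (e.g. '*.csv') so that output files written while find
    # is still walking the tree are not picked up and compressed again.
    find "$1" -type f -name "${2:-*}" -print0 |
        xargs -0 -n 1 -P "$JOBS" "$COMPRESS"
}
```

For the `xsv split` case in the issue, something like `JOBS=16 compress_dir ./chunks '*.csv'` would compress every chunk before upload; setting `COMPRESS=gzip` swaps in a different compressor without changing the function.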
I'm really loving `szip` for data-munging tasks! I keep hitting a use case where I need to `szip` all the files in a directory. For example, I might have a directory of hundreds of giant `*.csv` files output by `xsv split`, and I need them compressed before uploading them to S3. I'd love to be able to run:
...and have it do a high-performance directory walk and parallel `szip`.
But I admit this might be too specialized a use case for `szip`, so I'm happy to go ahead and build a separate `szipdir` tool to handle it if that makes more sense. What do you think?