BurntSushi / rust-snappy

Snappy compression implemented in Rust (including the Snappy frame format).
BSD 3-Clause "New" or "Revised" License

szip all files in a directory recursively? #8

Closed: emk closed this issue 7 years ago

emk commented 7 years ago

I'm really loving szip for data-munging tasks!

I keep hitting a use case where I need to szip all the files in a directory. For example, I might have a directory of hundreds of giant *.csv files output by xsv split, and I need them compressed before uploading them to S3.

I'd love to be able to run:

szip -r /path/to/dir/

...and get a high-performance directory walk and a parallel szip of every file.

But I admit this might be too specialized a use case for szip, so I'm happy to build a separate szipdir tool instead if that makes more sense. What do you think?

BurntSushi commented 7 years ago

gzip supports this flag, so I'd be in favor of adding it.

BurntSushi commented 7 years ago

I will note, though, that I wonder why you want this. find ./ -type f -print0 | xargs -0 -P8 szip should work just fine and do it in parallel.
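To unpack that pipeline: -print0 and -0 pass NUL-delimited paths, so filenames containing spaces or newlines survive intact, and -P8 lets xargs run up to eight szip processes at once. One caveat with GNU xargs: without -n it may pack every path into a single szip invocation, in which case only one process ever runs. A variant like the following (the batch size of 16 is illustrative) forces the input to be split into enough batches to keep all eight workers busy:

find ./ -type f -print0 | xargs -0 -n16 -P8 szip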

emk commented 7 years ago

Oh, thank you! I'd forgotten about the -P flag to xargs. I always reach for one of the versions of parallel, and I always conclude, "No, not worth sorting out all the issues."
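(For comparison, the GNU parallel equivalent of that pipeline would be roughly the following, with the worker count again illustrative:

find ./ -type f -print0 | parallel -0 -j8 szip {}

One point in xargs's favor is that it ships with findutils on virtually every Unix system, whereas parallel usually has to be installed separately.)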

I'll try this on our next cluster job and see how it goes. If it works well, we should just close this issue.

emk commented 7 years ago

OK, I've tried xargs -P and it seems to work much more gracefully than other common parallelization tools. I'm OK with closing this issue, and thank you for your help! :-)