AQUAOSOTech / tarsplitter

A multithreaded tar utility. Archive huge numbers of files, or split massive tar archives into smaller chunks.
MIT License
31 stars 2 forks

Add support for specifying part sizes and stdin as input #3

Open imiric opened 5 years ago

imiric commented 5 years ago

Hi, thanks for this great tool.

I took a stab at addressing #1: when the input is specified as '-', tarsplitter now reads from stdin, so you can pipe directly from tar itself and avoid the need for a large intermediate tar file. For example:

tar -cvf - . | tarsplitter -i - -s 1G -o /tmp/archive-

This creates tar files of at most 1 GB each, though individual sizes will vary depending on the input files and how they're sorted.

This comes at the cost of an external dependency (https://github.com/c2h5oh/datasize), but I hope you'll agree that it's best not to reinvent the wheel for parsing human-readable sizes.
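For readers unfamiliar with what parsing a human-readable size involves, here is a deliberately simplified, standard-library-only sketch. It is not the datasize API and not what the PR uses. The function name parseSize is hypothetical, and it assumes whole numbers with binary multiples (1K = 1024 bytes).

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// parseSize converts strings such as "512", "100MB" or "1G" to a byte count.
// Assumptions of this sketch: whole numbers only, units are powers of 1024,
// and an optional trailing "B" is ignored.
func parseSize(input string) (int64, error) {
	s := strings.ToUpper(strings.TrimSpace(input))
	s = strings.TrimSuffix(s, "B")
	if s == "" {
		return 0, fmt.Errorf("invalid size %q", input)
	}
	var shift uint
	switch s[len(s)-1] {
	case 'K':
		shift = 10
	case 'M':
		shift = 20
	case 'G':
		shift = 30
	case 'T':
		shift = 40
	}
	if shift > 0 {
		s = s[:len(s)-1] // drop the unit letter
	}
	n, err := strconv.ParseInt(strings.TrimSpace(s), 10, 64)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid size %q", input)
	}
	if n > math.MaxInt64>>shift {
		return 0, fmt.Errorf("size %q overflows int64", input)
	}
	return n << shift, nil
}

func main() {
	for _, in := range []string{"512", "100MB", "1G", "abc"} {
		n, err := parseSize(in)
		fmt.Println(in, n, err)
	}
}
```

Even this small version has to handle case, optional suffixes, negative values and overflow, which is part of the argument for relying on an existing library instead.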

The -p option doesn't make sense when the input is stdin, since we can't know the total input size in advance to calculate the part size. Similarly, if -s is not provided when the input is stdin, no splitting will occur. In both cases only a single tar file is created, which defeats the purpose of using tarsplitter, so the user should always specify -s with -i -. Maybe we should enforce this explicitly, but I didn't think it was necessary.
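If we did decide to enforce this, the check could be as small as the sketch below. This is hypothetical and not part of the PR: the options struct, its field names, and the convention that a zero value means "flag not given" are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// options is a hypothetical view of the parsed flags.
type options struct {
	Input    string // value of -i; "-" means stdin
	Parts    int    // value of -p; 0 is assumed to mean "not given"
	PartSize int64  // value of -s in bytes; 0 is assumed to mean "not given"
}

// validate rejects flag combinations that cannot split a stdin stream.
func validate(o options) error {
	if o.Input != "-" {
		return nil // regular files: total size is known, nothing to enforce here
	}
	if o.Parts > 0 {
		return errors.New("-p cannot be used with -i -: total input size is unknown")
	}
	if o.PartSize <= 0 {
		return errors.New("-s is required with -i -")
	}
	return nil
}

func main() {
	fmt.Println(validate(options{Input: "-", PartSize: 1 << 30}))
	fmt.Println(validate(options{Input: "-", Parts: 4}))
	fmt.Println(validate(options{Input: "-"}))
}
```

Failing early with a clear message seems friendlier than silently producing a single unsplit archive, but it does change behavior for anyone already piping without -s.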

On a separate note, I didn't test this with -m archive, which I think should be removed from tarsplitter, leaving this functionality to tar itself now that stdin is supported. I would also consider deprecating -p since the user usually wants control over the part size and not the quantity of files produced.

Ideally we should have unit/functional tests for all this, but I'll leave that for another PR. :)

Cheers,

Ivan