Hi, thanks for this great tool.
I took a stab at addressing #1 to allow piping directly to tarsplitter from tar itself when input is specified as `-`, which avoids the need for a large intermediary tar file. For example:
`tar -cvf - . | tarsplitter -i - -s 1G -o /tmp/archive-`
This will create tar files that are at most 1GB in size, though individual sizes will vary depending on the input files and how they're sorted.
This comes at a cost of an external dependency (https://github.com/c2h5oh/datasize), but I hope you'll agree that it's best to not reinvent the wheel for parsing human-readable sizes.
The `-p` option doesn't make sense when input is stdin, since we can't know the total input size in advance to calculate the part size. Similarly, if `-s` is not provided when input is stdin, no splitting will occur. In both cases only a single tar file will be created, which defeats the purpose of using tarsplitter, so the user should always specify `-s` with `-i -`. Maybe we should enforce this explicitly, but I didn't think it was necessary.
On a separate note, I didn't test this with `-m archive`, which I think should be removed from tarsplitter, leaving this functionality to tar itself now that stdin is supported. I would also consider deprecating `-p`, since the user usually wants control over the part size and not the quantity of files produced.
Ideally we should have unit/functional tests for all this, but I'll leave that for another PR. :)
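
If we did want to enforce the `-s` requirement for stdin explicitly, the check could look something like the sketch below. This is not part of the PR: `validateInput` is a hypothetical helper, and the way unset flags are represented (`0` for `-p`, an empty string for `-s`) is an assumption made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// validateInput is a hypothetical pre-flight check for the stdin case.
// input is the -i value, parts the -p value (0 = not set, assumed),
// size the raw -s value ("" = not set, assumed).
func validateInput(input string, parts int, size string) error {
	if input != "-" {
		// Regular file input: both -p and -s remain usable.
		return nil
	}
	if parts > 0 {
		return errors.New("-p cannot be used with -i -: total input size is unknown for stdin")
	}
	if size == "" {
		return errors.New("-s is required with -i -: otherwise only a single tar file is created")
	}
	return nil
}

func main() {
	fmt.Println(validateInput("-", 0, "1G"))       // accepted
	fmt.Println(validateInput("-", 0, ""))         // rejected: missing -s
	fmt.Println(validateInput("-", 4, ""))         // rejected: -p with stdin
	fmt.Println(validateInput("in.tar", 4, ""))    // accepted
}
```

Failing early like this would trade a little flexibility for a clearer error than silently producing one large archive.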
Cheers,
Ivan