AgentD / squashfs-tools-ng

A new set of tools and libraries for working with SquashFS images

Read from pipe (tar2sqfs, sqfs2tar) #24

Closed by ulziibuyan 4 years ago

ulziibuyan commented 4 years ago

Do you have plans to support reading from stdin in those tools? I'm working on a feature in LXD to be able to import/export containers in SquashFS format. It would be nice if tar2sqfs and sqfs2tar tools could do that (actually it would be very efficient).

If you plan to support this, I might be able to work on a pull request. Some guidelines would be awesome too. Thanks!

AgentD commented 4 years ago

Regarding tarballs: tar2sqfs reads tarballs from stdin, sqfs2tar writes to stdout. In both cases this may be a pipe.

If you mean the SquashFS image itself, that isn't as easy. If you read the filesystem from stdin, you have to sift through the data area first, but you don't know the structure of the data, i.e. which blocks are parts of which files, what the files are called and so on. Your only option would be to splice the data out to some temporary location until you get to the inode table, which basically amounts to making a temporary copy of almost the entire image.
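To illustrate the cost described above, here is a minimal sketch of what "splicing the data out to a temporary location" looks like. It is not code from squashfs-tools-ng: the helper name `spool_to_seekable` and the toy payload (some data followed by a trailing `INDEX` marker) are made up for the example and say nothing about the real SquashFS layout.

```python
import os
import shutil
import tempfile


def spool_to_seekable(stream, chunk_size=64 * 1024):
    """Copy a non-seekable stream into a temporary file and rewind it.

    The whole input ends up on disk, which is exactly the cost
    described above. (Hypothetical helper, for illustration only.)
    """
    tmp = tempfile.TemporaryFile()
    shutil.copyfileobj(stream, tmp, chunk_size)
    tmp.seek(0)
    return tmp


def demo():
    # Stand-in for an image arriving on stdin: a pipe with some bytes in it.
    read_fd, write_fd = os.pipe()
    payload = b"DATA" * 100 + b"INDEX"  # toy payload, not a SquashFS image
    with os.fdopen(write_fd, "wb") as writer:
        writer.write(payload)

    with os.fdopen(read_fd, "rb") as reader:
        pipe_seekable = reader.seekable()  # a pipe cannot seek
        with spool_to_seekable(reader) as tmp:
            # Only after the full copy can we jump to the trailing metadata.
            tmp.seek(-5, os.SEEK_END)
            tail = tmp.read()
            size = tmp.tell()
    return pipe_seekable, tail, size
```

The reader only gets random access after every byte has been copied, so nothing is gained over saving the image to a file first.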

If you try to generate a SquashFS image on stdout, you have the problem that you can't seek back later to update the super block once you know the exact layout, and you can't know the layout in advance if the data comes from a tarball on stdin. This is a problem even if you do know the filesystem tree up front but do deduplication afterwards.
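The seek-back problem can be shown with a toy format. The sketch below assumes a made-up 12-byte header (a magic string plus the offset of a trailing index); it is not the real SquashFS super block, and the names `HEADER`, `write_image` and `works_on` are hypothetical. The writer emits a placeholder header, writes the data, and then seeks back to patch in the offset, which works on a seekable target and fails on a pipe.

```python
import io
import os
import struct

# Hypothetical toy header: 4-byte magic + 8-byte offset of the index.
HEADER = struct.Struct("<4sQ")
MAGIC = b"TOY0"


def write_image(out, blocks):
    """Write blocks and an index, then go back and patch the header."""
    out.write(HEADER.pack(MAGIC, 0))  # placeholder, offset not known yet
    offset = HEADER.size
    for block in blocks:
        out.write(block)
        offset += len(block)
    out.write(b"INDEX")
    out.seek(0)  # the step a pipe cannot do
    out.write(HEADER.pack(MAGIC, offset))


def works_on(out, blocks):
    try:
        write_image(out, blocks)
    except OSError:  # io.UnsupportedOperation is a subclass of OSError
        return False
    return True


def demo():
    blocks = [b"a" * 10, b"b" * 20]

    # Seekable target: the header can be patched after the fact.
    buf = io.BytesIO()
    file_ok = works_on(buf, blocks)
    _, index_offset = HEADER.unpack_from(buf.getvalue())

    # Pipe target: the data goes out, but the header can never be fixed.
    read_fd, write_fd = os.pipe()
    try:
        with os.fdopen(write_fd, "wb") as pipe_out:
            pipe_ok = works_on(pipe_out, blocks)
    finally:
        os.close(read_fd)
    return file_ok, index_offset, pipe_ok
```

On the pipe, the placeholder header has already left the process by the time the real offset is known, so the consumer is left with an image whose header points nowhere.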

ulziibuyan commented 4 years ago

Thanks. Now that you've described it, implementing such a feature directly in the tools sounds somewhat unnecessary. The reason I'm trying to make your tools work this way is that I'm looking at them as compression tools in line with gzip, bzip2, xz etc. Those support reading from and writing to pipes at the same time.

What would you think if I implemented this as a script that wraps the two tools so that it can be supplied to tar via its -I switch? In compression mode, it would give tar2sqfs a temporary file to write to and then replay that file to stdout. In decompression mode, it would read the entire pipe into a temporary file and give that to sqfs2tar.

Would you accept the script? If not, I'll have to implement it on LXD side.

AgentD commented 4 years ago

gzip, bzip2 and xz are pure compression programs. They compress opaque data blobs that they know nothing about. They just feed everything that comes in through the compressor and add a simple header (or do the reverse when unpacking). For such a program it is trivial to read from and write to a pipe.
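As a rough illustration of why this is trivial for a pure compressor, here is a minimal sketch using Python's `zlib` module. It produces a raw zlib stream rather than gzip's file format, and the function names are made up for the example. Each chunk is compressed and written out as soon as it arrives, so memory use stays bounded and neither side ever needs to seek.

```python
import io
import zlib


def stream_compress(src, dst, chunk_size=64 * 1024):
    """Compress src to dst chunk by chunk, never seeking on either side."""
    comp = zlib.compressobj()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(comp.compress(chunk))
    dst.write(comp.flush())  # emit whatever the compressor still buffers


def stream_decompress(src, dst, chunk_size=64 * 1024):
    """Reverse of stream_compress, with the same streaming behaviour."""
    decomp = zlib.decompressobj()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decomp.decompress(chunk))
    dst.write(decomp.flush())


def roundtrip(data):
    packed = io.BytesIO()
    stream_compress(io.BytesIO(data), packed, chunk_size=16)
    packed.seek(0)  # only the test harness rewinds; the codecs never do
    unpacked = io.BytesIO()
    stream_decompress(packed, unpacked, chunk_size=16)
    return unpacked.getvalue()
```

The same two functions would work unchanged with `sys.stdin.buffer` and `sys.stdout.buffer`, because they only ever call `read` and `write`.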

Making tar use xz or gzip means that tar simply feeds its own output through those programs. They don't know it's a tarball and they don't know what's inside. They simply see a blob of data and try to stream-compress it.

First of all: the whole point of a program streaming input and output at the same time is that the data never has to be on disk as a whole and can potentially be unbounded in size. IMO what your script would do is replace two lines of shell script with one, while giving any user of the script the dangerous impression that it does stream processing.

Second, making tar somehow transparently feed its output through tar2sqfs, or feed its input through sqfs2tar to read SquashFS images, IMO makes little sense, since SquashFS isn't just a compressor for tar but an entirely different format. Both formats have their own (very different) notion of arranging data and metadata.

OT: FreeBSD tar does support tons of other archive formats, but that's because it internally uses libarchive which actually understands all of those formats instead of using expensive format-converting filters. It might be of interest to add a libsquashfs based plugin to libarchive at some point.

ulziibuyan commented 4 years ago

Thanks for the elaborate explanation.