AgentD / squashfs-tools-ng

A new set of tools and libraries for working with SquashFS images

Couple of issues round-tripping sqfs -> tar -> sqfs #74

Closed bifferos closed 2 years ago

bifferos commented 3 years ago

First off, I expected no differences with these commands:

sqfs2tar -r . -s linuxfs > out.tar
tar2sqfs -s -f newfs < out.tar

However when diffing I saw this printed:

$ sqfsdiff -a linuxfs -b newfs
/usr/bin has a basic type
/usr/bin/x86_64-linux-gnu-as has an extended type
/usr/lib/firmware/amdgpu has a basic type
/usr/lib/firmware/amdgpu/navi10_ce.bin has an extended type

It's not very clear to me what I'm being told here. Is it file 'a' that has the basic type? Is it file 'b'? Or am I just being informed that this file has a basic type in both cases? Also, which options should I use to avoid this difference, if possible?

Thanks!

AgentD commented 3 years ago

The output that sqfsdiff produces here is from the perspective of the 'b' filesystem. I agree that the output is a little bit non-obvious. Grandiose plans for binary patching aside, sqfsdiff was originally a debug tool that I pieced together early on for testing, and it has been a bit neglected since.

Basic type means that the SquashFS image uses basic inodes which cannot store extended attributes and have some further limitations depending on the type (basic files must be within 4G of the image, at most 4G in size and cannot account for sparse blocks; basic directories have a similar size restriction and don't have an index).
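As a rough illustration of the limits just listed, here is a minimal Python sketch of a "does this entry need an extended inode?" check. It is not code from libsquashfs: `FileInfo`, `file_needs_extended`, `dir_needs_extended` and `basic_size_limit` are hypothetical names, and treating "4G" as exactly 2**32 bytes is an assumption. The directory size limit is left as a parameter because its value is not stated here.

```python
from dataclasses import dataclass

# The "4G" limit; treating it as exactly 2**32 bytes is an assumption.
FOUR_GIB = 1 << 32


@dataclass
class FileInfo:  # hypothetical record, not a libsquashfs type
    data_offset: int          # byte offset of the file data inside the image
    size: int                 # file size in bytes
    sparse_bytes: int = 0     # bytes covered by sparse blocks
    has_xattrs: bool = False  # whether extended attributes are attached


def file_needs_extended(f: FileInfo) -> bool:
    """Return True if any of the basic-file limits is exceeded."""
    return (
        f.has_xattrs
        or f.sparse_bytes > 0
        or f.data_offset >= FOUR_GIB
        or f.size >= FOUR_GIB
    )


def dir_needs_extended(has_xattrs: bool, has_index: bool,
                       size: int, basic_size_limit: int) -> bool:
    """Same idea for directories; the size limit is supplied by the caller."""
    return has_xattrs or has_index or size >= basic_size_limit
```

A packer that prefers basic inodes would emit an extended inode only when one of these checks returns True.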

In this case, /usr/bin has a basic type in newfs and an extended type in linuxfs. It probably had a directory index in the original, but no extended attributes, and got "demoted". Same for /usr/lib/firmware/amdgpu. For /usr/bin/x86_64-linux-gnu-as and /usr/lib/firmware/amdgpu/navi10_ce.bin it is the other way around: they have an extended type in newfs and a basic type in linuxfs. If the results are correct, they can't possibly have had extended attributes in the original, but they probably have sparse blocks that the original didn't account for, so they got "promoted".
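Until the output is made clearer, a small wrapper can spell out both sides. The following is a minimal Python sketch, assuming the two line shapes shown in the report ("has a basic type" / "has an extended type") are the only ones of interest, and that a reported line means image 'a' has the opposite type. `explain` and its parameters are hypothetical names, not part of sqfsdiff.

```python
import re

# Matches lines such as "/usr/bin has a basic type".
LINE_RE = re.compile(r"^(?P<path>.+) has an? (?P<kind>basic|extended) type$")


def explain(line, a_name="a", b_name="b"):
    """Rewrite one inode-type line so that both images are named.

    The reported type belongs to image 'b'; image 'a' is assumed to
    have the other type, since the line is only printed on a mismatch.
    Returns None for lines that do not match.
    """
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    kind_b = m.group("kind")
    kind_a = "extended" if kind_b == "basic" else "basic"
    return f"{m.group('path')}: {kind_a} in {a_name}, {kind_b} in {b_name}"


if __name__ == "__main__":
    print(explain("/usr/bin has a basic type", "linuxfs", "newfs"))
```

Feeding each line of the sqfsdiff output through `explain` with the two image names gives one self-describing line per mismatch.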

I assume linuxfs was generated using mksquashfs? tar2sqfs and libsquashfs try to use basic inodes whenever possible, whereas mksquashfs has a tendency to sweep sparse block accounting under the rug. I'm not sure what metric it uses to decide whether to generate a directory index.

Hope that helps!

bifferos commented 3 years ago

Thanks for that detailed explanation. I am trying to re-pack a squashfs with as few changes as possible, and it seemed like the tar route was the best way to achieve that.

I was playing around with some code to turn the output of unsquashfs into pseudo file definitions for consumption by mksquashfs. That code is here: https://github.com/bifferos/squash_pseudo/blob/main/mkpseudo.py (I searched high and low for something that did that, but failed to find anything, so I wrote my own). Unfortunately mksquashfs doesn't seem to allow me to set the date on the pseudo files, and using this mechanism requires that I execute a 'cat' command to fill the file contents dynamically, so it is really slow for large file systems. sqfs2tar seemed to overcome some of those limitations; at least it dealt with file timestamps properly. But there are the problems you described above with round-tripping: you lose some metadata.

As an aside, another project I did was to generate a spec file for initrd, which decomposes an initrd into a series of definitions to be consumed by gen_init_cpio.c: https://github.com/bifferos/initrd/blob/master/initrd.py (again, I was really surprised I couldn't find existing code that does that).

I guess I want the squashfs equivalent of that, one that produces byte-for-byte identical output on a simple extract->recreate round trip, but it doesn't exist yet. I can't decide whether it's best to start with the original mksquashfs, start with your code, or look at the Python module: https://github.com/matteomattei/PySquashfsImage

Or perhaps just go my own way and roll my own.

AgentD commented 2 years ago

Not sure anymore why this is still open. IIRC I thought the Python scripts and links might be interesting for others.

Squashfs patching has been on the TODO list for a while now and there are tickets for it, so I'm closing this one for now.