AgentD / squashfs-tools-ng

A new set of tools and libraries for working with SquashFS images
Other
194 stars 30 forks source link

squashfs image created by gensquashfs cannot be opened by 7zip #85

Closed gocmil closed 2 years ago

gocmil commented 3 years ago

squashfs image file is created using command:

gensquashfs -D /dir-with-files/ -d mtime=date "+%s" -f -T -e -x --all-root -c gzip gensquashfs.img

It can be analyzed using unsquash, but cannot be opened with 7zip with error message: "Cannot open file gensquashfs.img as archive"

7zip version: 18.05

$ gensquashfs --version gensquashfs (squashfs-tools-ng) 1.1.0 Copyright (c) 2019 David Oberhollenzer et al License GPLv3+: GNU GPL version 3 or later https://gnu.org/licenses/gpl.html. This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.

AgentD commented 3 years ago

Hi,

thanks for reporting this. Can you provide a little more detail? Is it possible to mount the image? Can the image be read by sqfs2tar, rdsquashfs or unsquashfs? Is it somehow possible to get a more useful error message out of 7zip?

gocmil commented 3 years ago

It is possible to mount the image. It can be read by unsquashfs, I didn't try other two. I didn't get anything more from 7zip, I tried it from console with verbose output but it didn't give any more info:

$ 7z l gensquashfs.img -bb3

7-Zip [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21 p7zip Version 16.02 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,64 bits,32 CPUs AMD Ryzen Threadripper 1950X 16-Core Processor (800F11),ASM,AES-NI)

Scanning the drive for archives: 1 file, 3280896 bytes (3204 KiB)

Listing archive: gensquashfs.img

ERROR: gensquashfs.img : Can not open the file as archive

Errors: 1

If the squashfs image is generated using mksquashfs with the same options as used with gensquashfs, the size of the files differ and 7zip can open the one created by mksquashfs.

Example:

$ mksquashfs ${BUILDDIR}/${TEST_IMAGE_DIR} ${BUILDDIR}/${TEST_IMAGE_NAME} -noappend -all-root -no-xattrs -always-use-fragments Size: 3279319 bytes

$ gensquashfs -D ${BUILDDIR}/${TEST_IMAGE_DIR} -d mtime=date "+%s"-f -T -e -x -c gzip --all-root ${BUILDDIR}/${TEST_IMAGE_NAME} Size: 3278335 bytes

gocmil commented 3 years ago

Hi, any update for this?

AgentD commented 3 years ago

Hi,

sorry for the long delay. I got a bit swamped with some other stuff on my end. I do have a reproducible case myself now, for an archive, generated with squashfs-tools that 7zip happily accepts, but no longer recognizes as SquashFS after passing it through an ./sqfs2tar in.sqfs | ./tar2sqfs out.sqfs pipeline. Well actually, 7zip doesn't seem to recognize any images that tar2sqfs and gensquashfs generate.

I went through both images with a hex editor and tried coming up with ideas how the programs behave differently. My initial idea was to add an export table (mksquashfs always generates one, unless told not to, while the squashfs-tools-ng programs behave the opposite way). This didn't help either.

Inode allocation order is also different. I haven't looked at it in detail yet, but both programs emit the root inode somewhere towards the end of the inode table.

I have a copy of the 7zip source, but haven't managed to build it myself yet. I found the squashfs parser (CPP/7zip/Archive/SquashfsHandler.cpp), a single file with ~2.3k lines of C++-ish source that seems to be quite a tangle.

All in all, I would say that this is a problem in 7zip, but I cannot point at anything concrete yet.

gocmil commented 2 years ago

Hi,

do you plan to file 7Zip issue if you are positive the problem is there?

AgentD commented 2 years ago

There is a now ticket in the 7-zip bug tracker on sourceforge, also linking back here:

https://sourceforge.net/p/sevenzip/bugs/2302/

AgentD commented 2 years ago

Update:

It appears, the problem is related to mksquashfs storing directory sizes off-by-3: https://github.com/plougher/squashfs-tools/blob/master/squashfs-tools/mksquashfs.c#L1287

Which so far hasn't caused any problems as the kernel side directory iterator appears to trust the value stored in the actual directory header. This will be fixed on the squashfs-tools-ng side, and I will make an effort in patching existing documentation, after looking a bit more into this (particularly also into the differing link count).

Dr-Emann commented 2 years ago

The "file size" is supposed to be the uncompressed size of all directory headers and their entries: How does this work in both cases (exact and +3) in all these readers?

The "file_size" in the directory inode is the only way to know if there is another header following, no? I expected something which boils down to something like this pseudo-code?

# dir_file_size -= 3
while dir_file_size > 0:
  dir_file_size -= sizeof(dir_header)
  if (dir_file_size < 0) error
  read dir_header
  for _ in range(dir_header.count):
    dir_file_size -= sizeof(dir_entry)
    if (dir_file_size < 0) error
    read dir_entry
    dir_file_size -= dir_entry.name_size
    if (dir_file_size < 0) error
    read name_size bytes

How are these readers not either expecting another header (which won't fit) for mksquashfs files if they don't subtract 3, or complaining that the name of the last entry extends beyond the bounds of the directory info for gensquashfs files if they do subtract 3?

AgentD commented 2 years ago

@Dr-Emann The squashfs kernel code does basically what your pseudo code does, but it doesn't check the dir_file_size in the inner loop and just reads ahead if the local header says it should. So when it reaches the last entry, it simply reads 3 bytes more than it theoretically should (if it respected the file size).

On the other hand, the squashfs-tools-ng libsquashfs checks if the size is enough to read another header. Since both the dir header and the entry header are > 3, it simply aborts early with 3 spare bytes left when given an image where the size is 3 bytes too big. It would only fail if the stored size was too small.

On the third hand, (as I understood it so far) 7-zip expects the size to be off-by-3, subtracts the 3, and then bails because reading the last entry (using the size indicated by the header) would read beyond the size indicated by the inode.

Dr-Emann commented 2 years ago

I'm assuming I'm mis-reading the kernel code. It seems like if the size were +3, it would read the last entry, go to the top of the while loop, and length would be 3 less than the expected size, so it would read another full header and any entries specified by that "header" which extends past the end of the logical buffer.

Dr-Emann commented 2 years ago

Ah, I believe I understand, length will basically start as 3 because of the above ctx->pos manipulation.

So it sounds like the format is total size of the headers + the entries + 1 byte for "." + 2 bytes for "..", and 7-zip was correctly complaining that the archive was malformed, the kernel is arguably being too lenient.

AgentD commented 2 years ago

Commit 248494992442fbde7eb6ca3979a778d82fa86016 should fix this for the master and fixes-1.1.0 branches.

The cherry picked version on fixes-1.0.0 is c2aac9df8b5b29780bfc4a32fd38facf6a9b84a6.

The raw patches are here for 1.1.2 and here for 1.0.5.

gocmil commented 2 years ago

Many thanks!