AgentD / squashfs-tools-ng

A new set of tools and libraries for working with SquashFS images
Other
196 stars 30 forks source link

Corrupt OOL xattrs in custom writer #60

Closed showermat closed 4 years ago

showermat commented 4 years ago

First of all, thanks for the great library! I've been frustrated for a long time that there was no userspace library for interacting with SquashFS files, so finding this project really made my day.

I'm trying to write an archive using the libsquashfs API, but there's something wrong with the output file. When I retrieve xattrs stored out-of-line, the values I get back are corrupted. Each one is eight bytes, consisting the length of the value I originally wrote as a 32-bit value, followed by its first four bytes. In other words, it looks like the reference to the OOL value is pointing 4 bytes too early and has a fixed lenth of 8 bytes. Xattr values stored in-line are fine.

I'm basing my work on the implementation of gensquashfs, and as hard as I look I can't find the difference that accounts for this. I'm at the end of my rope, so I was just wondering if anything about this scenario stands out to you before I give up and put this aside for a few months.

I have minimal examples of an affected archive and the code that created it if you feel inclined to take a look. Thanks for any advice you can give!

AgentD commented 4 years ago

Hi,

I'm trying to write an archive using the libsquashfs API, but there's something wrong with the output file. When I retrieve xattrs stored out-of-line, the values I get back are corrupted. Each one is eight bytes, consisting the length of the value I originally wrote as a 32-bit value, followed by its first four bytes. In other words, it looks like the reference to the OOL value is pointing 4 bytes too early and has a fixed lenth of 8 bytes. Xattr values stored in-line are fine.

The OOL xattr records store an 8 byte reference to another value record instead of the actual value.

I just double checked with the kernel code (fs/squashfs/xattr.c). If an OOL xattr is encountered, it interprets the record value as a location and then reads another value record (header and everything) from that other location. My reader implementation looks very similar.

I also double checked the writer implementation which stores the location where it writes the header in a lookup table that it uses to store OOL locations.

Do you get the reference back instead of the value, or the value and part of the header? The former could mean that the OOL flag is missing.

Does this problem occur when you try to read the xattrs back with libsquashfs, or when you mount the filesystem?

I'm basing my work on the implementation of gensquashfs, and as hard as I look I can't find the difference that accounts for this. I'm at the end of my rope, so I was just wondering if anything about this scenario stands out to you before I give up and put this aside for a few months.

The interface itself should be pretty straight forward to use: start key/value pairs, add pair, end pairs ... and finally flush it all to disk. The management of the on-disk format, including the decision whether to store something directly or OOL deduplicated is all hidden away in the internals.

If there's actually something wrong with the offsets, it is most likely a library internal issue.

I have minimal examples of an affected archive and the code that created it if you feel inclined to take a look. Thanks for any advice you can give!

Yes please, I would like to take a look at that.

showermat commented 4 years ago

Ah! So I thought I wasn't able to reproduce this with archives written with gensquashfs, but I realized I was missing a step and it turns out I actually am. On the other hand, when I mount such an archive natively, the xattrs are fine, so I'm pretty sure it's an issue with your reader code. Here's what I did to reproduce:

# Generate some files
$ for i in {1..10}; do echo $i $i $i > $i; done
# Create an xattr that will be stored out-of-line
$ for i in {1..10}; do setfattr -n user.test -v "MY HOVERCRAFT IS FULL OF EELS" $i; done
# Add another one that's unique to prevent deduplication -- I forgot to do this earlier
$ for i in {1..10}; do setfattr -n user.num -v $i $i; done
$ gensquashfs test.sfs -D . -x
# ...
$ rdsquashfs -x 1 test.sfs
user.test=MY HOVERCRAFT IS FULL OF EELS  # Okay, because the first occurrence is stored in-line
user.num=1
$ rdsquashfs -x 2 test.sfs
user.test=0x1D0000004D592048  # Oops!
user.num=2

You can see the corrupted xattr in the second case is 8 bytes: 29 (the length of the value), 0, 0, 0, "MY H".

AgentD commented 4 years ago

I managed to reproduce the problem using the steps you provided.

It looks like the returned value contains the header of the target value and is a bit short in size, but it works when mounting the image.

It looks like the reader code missed to re-read the actual header after seeking to it. The following patch fixes it on my end:

0001-Fix-xattr-reader-read-the-header-after-seaking-to-an.patch.gz

I added the patch to the fixes-1.0.0 branch and uploaded the gpg signed patch here: https://infraroot.at/pub/squashfs/patches-1.0.0/

Thanks!

showermat commented 4 years ago

Yup, looks like that fixed it. Thanks for your quick response!