diskfs / go-diskfs

MIT License
494 stars 112 forks source link

squashfs: fix received X bytes instead of expected minimal 40 on listing #201

Closed ncw closed 8 months ago

ncw commented 8 months ago

When unpacking large squashfs archives you occasionally get this error:

received 38 bytes instead of expected minimal 40

This is due to the inode changing type (ie it is actual an extended inode) when read from the directory but the size not being updated. Most of the time there is enough data for the larger inode but not always.


I couldn't think of a way to test this.

I have an archive which demonstrates the problem you can download here: tensorflow.sqfs if you need it.

It was made with this script

#! /usr/bin/bash
ORG=${ORG:-tensorflow}
IMG=${IMG:-tensorflow}
TAG=${TAG:-latest-gpu-jupyter}
docker export $(docker create $ORG/$IMG:$TAG) -o $IMG.tar.gz
mkdir -p $IMG && tar xf $IMG.tar.gz -C $IMG
[ -f $IMG.sqfs ] && rm $IMG.sqfs
mksquashfs $IMG $IMG.sqfs  -comp zstd -Xcompression-level 3 -b 1M -no-xattrs -all-root
deitch commented 8 months ago

The following is the current process:

  1. call getInode(), passing it the inode type
  2. get the size from the passed type, uncompress it, get the header
  3. If the actual type is not the expected type, use the actual type

I think this change of yours says:

  1. OK, but if the actual type is different, then not only should the type be different, but the size should be different, too, so reread the inode using the newly-discovered size from the actual header.

Is that correct?

I couldn't think of a way to test this. I have an archive It was made with this script

Something in the directory structure must have triggered it. What exactly was it? Could we isolate that?

ncw commented 8 months ago

Is that correct?

Yes that is correct, though in step 4 we may have read enough data already. I'll see if I can create a squashfs which causes the problem - watch this space!

ncw commented 8 months ago

I managed to make this squashfs which demonstrates the problem.

tensorflow-nolinks.sqfs.zip

(I had to zip it for upload).

It is the tensorflow docker image but with all the files truncated to 0 size and all the symlinks removed.

It is 816K so not too big!

I could write a test which does a directory traversal of it and check that we can see all the files.

What do you think? A good enough test?

I didn't manage to make a synthetic test!

deitch commented 8 months ago

It is 816K so not too big!

It isn't, although I suspect we could make it smaller. The real question is what is triggering it.

Wouldn't a simple image with just one file (and therefore 1 inode, or maybe 2 if you count the directory), but where we call getInode() with the basic type work?

I am trying to understand how this is triggered. getInode() is called only from 3 places:

When you run into this problem with your sample above? What actual path is it taking? I assume it is not the first, but the second or third? What is the actual case triggering it?

ncw commented 8 months ago

Wouldn't a simple image with just one file (and therefore 1 inode, or maybe 2 if you count the directory), but where we call getInode() with the basic type work?

I managed to make a test image eventually! I think it only triggers when the directory reading is

I managed to get this to trigger by using 4k page sizes and 300 files with xattrs (to force them to be extended types).

I've added this to the commit.

In the process I discovered another bug! There was a small typo in the xattr parsing which I've also fixed - the new test doesn't pass without the fix.

deitch commented 8 months ago

I managed to get this to trigger by using 4k page sizes and 300 files with xattrs (to force them to be extended types).

All about the right combination to trigger the problem. Thank you, indeed, for tracking it down.

You have done so much for this PR, I hate to ask for more. Can I ask to expand the comment in buildtestsqs.sh to explain the corner case we are dealing with? No other requests, then we can merge this in.

ncw commented 8 months ago

You have done so much for this PR, I hate to ask for more. Can I ask to expand the comment in buildtestsqs.sh to explain the corner case we are dealing with? No other requests, then we can merge this in.

I've done that now :-) Let me know if you need any other changes.

deitch commented 8 months ago

Nope, looks great. Thank you!