diskfs / go-diskfs

MIT License
494 stars 112 forks source link

squashfs: fix directory listings #195

Closed ncw closed 8 months ago

ncw commented 8 months ago

This fixes directory listings sometimes giving the error

could not parse directory header: header was 3 bytes, less than minimum 12

This appears to be because empty directories can indeed have no entries despite what the spec says.

This introduces a test checking that we can recurse the test squashfs file correctly - this fails without this patch.

ncw commented 8 months ago

Note that the test introduced here won't pass until https://github.com/diskfs/go-diskfs/pull/193 is merged. Maybe I should have put both commits in the same PR.

deitch commented 8 months ago

Maybe I should have put both commits in the same PR.

Don't worry about it. We can merge them in order, and when that one is in, rebase this.

ncw commented 8 months ago

Note that I'm working on a larger patch to fix up the reading of files out of squashfs which is quite broken in places. Its looking a lot better but it isn't there quite yet!

deitch commented 8 months ago

Do you need to rebase this on #193 now that it is merged in?

deitch commented 8 months ago

Note that I'm working on a larger patch to fix up the reading of files out of squashfs which is quite broken in places. It's looking a lot better but it isn't there quite yet!

Thank you, I am looking forward to it. Also to how the tests change and see what they missed (or just weren't implemented 😆 )

ncw commented 8 months ago

Do you need to rebase this on #193 now that it is merged in?

I've rebased this and I've also fixed the test I broke!

Note that I'm working on a larger patch to fix up the reading of files out of squashfs which is quite broken in places. It's looking a lot better but it isn't there quite yet!

Thank you, I am looking forward to it. Also to how the tests change and see what they missed (or just weren't implemented 😆 )

I think the tests are very well implemented :-) I've been working on more end-to-end tests so unpacking actual archives created by mksquashfs and there are a lot of corner cases!

I'm hoping to get this all passing shortly but if I get stuck I may push it anyway so you can see the progress and maybe give hints!

deitch commented 8 months ago

I am confused. Don't we need to parse the header? You are saying it is possible to have an empty directory without any header at all (even though the spec says otherwise). Then ho would we know if the directory is valid?

ncw commented 8 months ago

I am confused. Don't we need to parse the header?

I think that the lines of code I deleted 57-60 are effectively repeated at 63-66 when pos == 0 so this bit of code is just a check to see whether there is a header or not.

https://github.com/diskfs/go-diskfs/blob/e5b494094885d06b3a1ef8fd9334ed62480aa376/filesystem/squashfs/directory.go#L56-L66

You are saying it is possible to have an empty directory without any header at all (even though the spec says otherwise). Then ho would we know if the directory is valid?

If you look in the testdata/file.sqs you'll find such an empty directory at /a/b/c/d/ . If you undo my fix by putting those lines back in then the new test fails with:

squashfs_test.go:340: error reading directory /a/b/c/d: could not read directory at path /a/b/c/d: could not get entries: could not get entries: could not get entries: could not get entries: unable to read directory from table: could not parse directory header: header was 3 bytes, less than minimum 12

I don't know for sure whether that fix is correct, there could be a problem somewhere else, but that is what I came up with to fix the error.

The spec I've been looking at says

A header must have at least one entry. A header must not be followed by more than 256 entries. If there are more entries, a new header is emitted.

Which I took to mean that the directory shouldn't be empty, but looking at it at face value it says if a header exists then it must have at least one entry. Where the case under discussion here is if the header does not exist.

A check for "A header must have at least one entry" would look something like this I think

diff --git a/filesystem/squashfs/directory.go b/filesystem/squashfs/directory.go
index b559af5..1e8628f 100644
--- a/filesystem/squashfs/directory.go
+++ b/filesystem/squashfs/directory.go
@@ -60,6 +60,9 @@ func parseDirectory(b []byte) (*directory, error) {
        if err != nil {
            return nil, fmt.Errorf("could not parse directory header: %v", err)
        }
+       if directoryHeader.count == 0 {
+           return nil, fmt.Errorf("corrupted directory, must have at least one entry")
+       }
        if directoryHeader.count > maxDirEntries {
            return nil, fmt.Errorf("corrupted directory, had %d entries instead of max %d", directoryHeader.count, maxDirEntries)
        }

However that is almost pointless since directoryHeader.count is a uint32 that gets 1 added to it here but I guess it checks something useful so I could add it to the patch if you want.

https://github.com/diskfs/go-diskfs/blob/e5b494094885d06b3a1ef8fd9334ed62480aa376/filesystem/squashfs/directory.go#L147

I think the directoryHeader being missing is the only way to encode an empty directory otherwise (which squashfs clearly can as the /a/b/c/d/ directory unpacks as an empty directory) - what do you think?

deitch commented 8 months ago

I think that the lines of code I deleted 57-60 are effectively repeated at 63-66 when pos == 0 so this bit of code is just a check to see whether there is a header or not.

Rereading the code, I see that makes sense. Our only code after the (deleted) check is looping through the dir entries, for pos := 0; pos+dirHeaderSize < len(b); but if len(b) is less than 12, it will have no loops, and then just return. So this works.

So, yeah, I think you are right. Let's say, "this is the fault of unclear spec writers, we did our best, and now @ncw fixed it for us." 😄

Is this ready to go in?

ncw commented 8 months ago

So, yeah, I think you are right. Let's say, "this is the fault of unclear spec writers, we did our best, and now @ncw fixed it for us." 😄

:-)

Is this ready to go in?

I've updated the patch to add the extra check in and add a test for that check. While doing that I noticed that the for loop range needed fixing too! I updated the commit message with a summary of what we learnt above (I hadn't thought it through properly until writing the above so I thought it deserved to go in the commit message!

It is ready to go now if you are happy :-)

deitch commented 8 months ago

Looks great. Thanks!