diskfs / go-diskfs

MIT License
515 stars 116 forks source link

added test for file tree creation with large files #258

Closed mrbojangles3 closed 1 month ago

mrbojangles3 commented 1 month ago

This is the PR alluded to in #256. I have added a test case and some supporting functions to trigger an issue I am seeing on my system. TestCreateFileTree is the case I am adding. This creates a 6GiB fat32 file system, so hopefully the CI doesn't fall over because of it.

This is a great project, and you have been very responsive. Thank you.

lblyth@host:~/repos/go-diskfs/filesystem/fat32$ go test                                                             
--- FAIL: TestCreateFileTree (8.11s)                                                                                 
panic: could not read directory entries for /b/sub50/blob [recovered]
        panic: could not read directory entries for /b/sub50/blob

goroutine 263 [running]:                                                                                             
testing.tRunner.func1.2({0x6b7a80, 0xc00002e190})
        /home/lblyth/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.0.linux-amd64/src/testing/testing.go:1632 +0x230
testing.tRunner.func1()
        /home/lblyth/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.0.linux-amd64/src/testing/testing.go:1635 +0x35e
panic({0x6b7a80?, 0xc00002e190?})                                                                                    
        /home/lblyth/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.0.linux-amd64/src/runtime/panic.go:785 +0x132
github.com/diskfs/go-diskfs/filesystem/fat32_test.mkGigFile({0x778b38?, 0xc0000f6630?}, {0x70d9bd?, 0x721bcd?})
        /home/lblyth/repos/go-diskfs/filesystem/fat32/fat32_test.go:1134 +0xa5
github.com/diskfs/go-diskfs/filesystem/fat32_test.TestCreateFileTree(0xc0000dc340?)
        /home/lblyth/repos/go-diskfs/filesystem/fat32/fat32_test.go:1162 +0x328
testing.tRunner(0xc0000dc340, 0x7227e0)
        /home/lblyth/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.0.linux-amd64/src/testing/testing.go:1690 +0xf4
created by testing.(*T).Run in goroutine 1                                                                           
        /home/lblyth/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.23.0.linux-amd64/src/testing/testing.go:1743 +0x390
exit status 2
FAIL    github.com/diskfs/go-diskfs/filesystem/fat32    8.574s  
deitch commented 1 month ago

Heh, it didn't even get to your test, because listing failed. Please check that.

deitch commented 1 month ago

Lint still is complaining. Run make lint locally?

mrbojangles3 commented 1 month ago

Well, the thing is, I have been. But I am getting lint errors in code I didn't touch (in the standard library) and the errors I am seeing here aren't the ones I have had show up in my local runs of make lint. I am on x86 running fedora.

deitch commented 1 month ago

What go version are you using?

deitch commented 1 month ago

Just 3 lint errors left.

mrbojangles3 commented 1 month ago

go version go1.23.0 linux/amd64

deitch commented 1 month ago

So that should do it. Maybe you have a different version of golangci-lint. Either way, you can see the last 3 lint errors here

deitch commented 1 month ago

Lint is all clean. Huzzah!

deitch commented 1 month ago

Is the error what you expected?

mrbojangles3 commented 1 month ago

I am glad I got my local env matching the CI so this will hopefully NEVER happen again.

This is the "right" error.

deitch commented 1 month ago

"never" is such a strong word. 😆

Can you squash the commits to a single commit?

Do you understand the cause of the error that this test triggers?

mrbojangles3 commented 1 month ago

"never" is such a strong word. 😆

lol, yep

Squashed!

I do not understand what is causing this error. It was odd to see it pop up in our env. Based on our use case (making a boot-able image with two partitions) it only happened with the big files.

deitch commented 1 month ago

Well, with an actual test case, we can try and solve it now.

mrbojangles3 commented 1 month ago

I am looking forward to learning more about FAT32 because of this.

deitch commented 1 month ago

Do you know why the tests take so long now? On master branch, it takes about 1s to run all of the fat32 tests. With this branch, about a minute.

deitch commented 1 month ago

Do you mind rebasing? I fixed some of the missing error reporting. Does not fix your issue, but does make the errors easier to see.

mrbojangles3 commented 1 month ago

I am not sure why the test time ballooned. I am making a 6GiB file so that might take time depending on whats happening on the local machine.

Good suggestion on the temp file. I assume there will be more changes so if its okay, I will hold off on the squashes until the end.

deitch commented 1 month ago

Better, thanks. Looking at it. If you can squash the commits into a single one, rather than having the merge-from-master commit, that would help.

deitch commented 1 month ago

Sure, we can wait for that.

deitch commented 1 month ago

OK, I have some practical information. Up until you create the first gigfile, everything is fine. I checked the state of the file-tree.img file, and it looks good. For example, here are the starting bytes for cluster 2 (fat32 root):

00000000: 4e4f 204e 414d 4520 2020 2008 0000 8164  NO NAME    ....d
00000010: 3a59 3a59 0000 8164 3a59 0000 0000 0000  :Y:Y...d:Y......
00000020: 4120 2020 2020 2020 2020 2010 0000 8164  A          ....d
00000030: 3a59 3a59 0000 8164 3a59 0300 0000 0000  :Y:Y...d:Y......
00000040: 4162 0000 00ff ffff ffff ff0f 00c0 ffff  Ab..............
00000050: ffff ffff ffff ffff ffff 0000 ffff ffff  ................
00000060: 4220 2020 2020 2020 2020 2010 0000 8164  B          ....d
00000070: 3a59 3a59 0000 8164 3a59 0400 0000 0000  :Y:Y...d:Y......
00000080: 4172 006f 006f 0074 0066 000f 001a 6900  Ar.o.o.t.f....i.
00000090: 6c00 6500 0000 ffff ffff 0000 ffff ffff  l.e.............
000000a0: 524f 4f54 4649 4c45 2020 2000 0000 8164  ROOTFILE   ....d
000000b0: 3a59 3a59 0000 8164 3a59 0500 0b00 0000  :Y:Y...d:Y......
000000c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000d0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000e0: 0000 0000 0000 0000 0000 0000 0000 0000  ................

After the first gigfile:

00000000: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000010: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000020: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000030: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000040: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000050: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000060: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000070: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000080: 0000 0000 0000 0000 0000 0000 0000 0000  ................
00000090: 0000 0000 0000 0000 0000 0000 0000 0000  ................

Clearly, something there is wiping them clean. My first suspicion is that the calculations break down somewhere at a certain size. That should be something we can test.

deitch commented 1 month ago

Next data point. If I skip your whole tree, and just make the gigfile, it works fine. So it is some combination of the two.

mrbojangles3 commented 1 month ago

Can I ask a methodology question? - Are you using delve, breakpoints, then hexdumping the image?

deitch commented 1 month ago

Exactly that. Although hexdumping in a separate window so I can work in parallel.

deitch commented 1 month ago

It is an overflow. We use uint32 for the read offset here, and for the write offset here, but it can be up to 64 bits (filesystem offset calls on go usually take int64). Either way, I managed to recreate it where it gets to a position where it overflows, then starts overwriting back from the beginning.

PR coming soon.

deitch commented 1 month ago

See #260 .

And I incorporated your tests in the. I rewrote them a bit to fit with the style, but they go right on in.

mrbojangles3 commented 1 month ago

Great find! I admit, I wouldn't have suspected that overflow. Since fat32 came about in a time when there wasn't uint64. I figured all the offsets would be okay.

deitch commented 1 month ago

Yeah, but filesystem code does all sorts of things with individual bytes.

Can we close this now?

mrbojangles3 commented 1 month ago

Yeah, but filesystem code does all sorts of things with individual bytes.

Can we close this now?

Would you be willing to tag this version ?

deitch commented 1 month ago

Sure.