Open hayley-leblanc opened 3 years ago
Here's another example we found that exhibits more clearly buggy behavior. Here's a zip file with two programs that reproduce it: programs.zip. fuzzer.c was generated by our fuzzer and simpler.c is a slightly simplified version of the same program.
Here are the contents of the programs in pseudocode. In the programs, buffer is 368 bytes long.
fd0 = open("file0")
fd1 = open("file0")
write(fd1, buffer, 0xffffffadul)
fd2 = open("file0")
fd3 = open("file0")
write(fd3, buffer, 0xffffffadul)
We tried removing some of the extra open calls, but apparently they are all necessary to reproduce the bug.
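For anyone who wants to try this without the zip file, the pseudocode above could be written in C roughly as follows. This is only a sketch and not the contents of fuzzer.c or simpler.c: the helper name run_workload, the open flags, and the file mode are assumptions, and count is a parameter so the same code can also be run with a safe value.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper mirroring the pseudocode: four opens of the same
 * file and two writes of `count` bytes from `buf`. The open flags and
 * mode are assumptions; the pseudocode does not show them.
 * The return values of the two write() calls are stored in res[0] and
 * res[1]. Returns 0 if every open() succeeded, -1 otherwise. */
static int run_workload(const char *path, const void *buf, size_t count,
                        ssize_t res[2])
{
    int fds[4] = { -1, -1, -1, -1 };
    int rc = 0;

    assert(path != NULL && buf != NULL && res != NULL);
    res[0] = res[1] = -1;

    for (int i = 0; i < 4; i++) {
        fds[i] = open(path, O_RDWR | O_CREAT, 0644);
        if (fds[i] < 0) {
            rc = -1;
            break;
        }
        /* The writes go through the second and fourth descriptors
         * (fd1 and fd3 in the pseudocode). */
        if (i == 1)
            res[0] = write(fds[i], buf, count);
        if (i == 3)
            res[1] = write(fds[i], buf, count);
    }

    for (int i = 0; i < 4; i++)
        if (fds[i] >= 0)
            close(fds[i]);
    return rc;
}
```

Calling run_workload("file0", buffer, 0xffffffadul, res) with a 368-byte buffer corresponds to the case described here. A count larger than the buffer asks the kernel to read past the end of buffer, so that variant is best kept to a scratch file system.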
After running this program on NOVA, running df says that the NOVA instance is 100% full, and trying to, e.g., create a directory fails. On a VM with one core, the write calls fail and say there is no space left on disk. For some reason, on more cores, the system calls all succeed but the file system is still full and we can't create new files/directories. We tried it on both 128MB and 1GB of emulated PM and it occurred on both.
Thanks!
It looks to me that since the pwrite() specifies a large count, NOVA allocates the required NVMM data pages and hence runs out of log pages, as they share the NVMM space. Is that the case? Is write/pwrite with a count larger than the buffer size undefined behavior?
Apologies for the delayed response. I believe what you describe is correct, where NOVA allocates the number of requested pages and runs out of space. It's not immediately clear to me whether calling write/pwrite with a count larger than the buffer size is undefined behavior here.
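To put rough numbers on that explanation: the count 0xffffffad is 4294967213 bytes, 83 bytes short of 4 GiB. The sketch below works out how many pages that would be. The 4 KiB page size is an assumption (it is not stated in this thread), as is reading 128MB and 1GB as binary units, and whether NOVA sizes its allocation exactly this way is the open question here.

```c
#include <assert.h>
#include <stdint.h>

/* Number of fixed-size pages needed to hold `bytes` bytes, rounded up. */
static uint64_t pages_needed(uint64_t bytes, uint64_t page_size)
{
    assert(page_size > 0);
    return bytes / page_size + (bytes % page_size != 0);
}

/* With an assumed 4096-byte page:
 *   pages_needed(0xffffffadULL, 4096) == 1048576  (the requested count)
 *   pages_needed(128ULL << 20, 4096)  == 32768    (a 128 MiB device)
 *   pages_needed(1ULL << 30, 4096)    == 262144   (a 1 GiB device)
 */
```

Under those assumptions, a single write of this count asks for four times as many pages as the 1GB device holds in total, which would be consistent with the device filling up on both sizes that were tested.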
POSIX does define what should occur when a write call attempts to exceed the size of the physical medium here: "If a write() requests that more bytes be written than there is room for (for example, the file size limit of the process or the physical end of a medium), only as many bytes as there is room for shall be written." The man page for write basically says that writing beyond the maximum file size is implementation defined. I'm not sure whether the POSIX spec is referring to providing a buffer larger than the size of the storage device, or a greater count value, or both.
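From the caller's side, the POSIX wording quoted above shows up as a short write: write() returns fewer bytes than requested, and a later call fails with an error such as ENOSPC. A minimal sketch of a caller that observes this is below. The helper name write_all is made up for illustration, and it assumes len never exceeds the real size of buf, which is exactly the precondition the reproducer violates.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes up to `len` bytes from `buf`, retrying
 * after short writes and EINTR. Returns the number of bytes actually
 * written. If that is less than `len` and write() reported an error,
 * errno holds it (for example ENOSPC); otherwise errno is 0. */
static size_t write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t done = 0;

    assert(buf != NULL || len == 0);
    errno = 0;
    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);
        if (n < 0) {
            if (errno == EINTR) {
                errno = 0;
                continue;   /* interrupted before writing: retry */
            }
            break;          /* hard error: errno is left for the caller */
        }
        if (n == 0)
            break;          /* no progress: stop instead of spinning */
        done += (size_t)n;
    }
    return done;
}
```

On a file system that follows the quoted passage, filling the device with this loop should end with a return value smaller than len and errno set to ENOSPC, with the file holding exactly the bytes that were accepted.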
I did a bit more experimentation with the example and found that file0 ends up with the same size as the provided buffer, e.g., using a buffer of 15000 bytes makes file0 have size 15000. This seems potentially incorrect based on the POSIX spec combined with the fact that the NVMM device doesn't have any free space. It would make sense to me if the file just ended up being huge and taking up most/all of the NVMM, but that doesn't seem to be what's happening - what's the expected behavior for NOVA in this case? Also, deleting the file does not free up the space; I still can't, e.g., create a new file even after file0 has been deleted.
Anyway, I'd be interested in hearing your thoughts on this if you get a chance. Even if this is undefined behavior and legal by POSIX/Linux standards, it seems undesirable.
Hi,
I've encountered some weird behavior in NOVA that can result in nova error messages popping up. Unlike the other issues I've raised recently, this is not crash related and appears when we just run NOVA normally. We are working on a fuzzer to generate tests for our crash consistency testing framework and came across this while using the fuzzer.
The root of the behavior seems to be calls to pwrite where the count parameter specifying the length is longer than the provided buffer. When we run a test with such a pwrite call, we usually see something like the following in dmesg after running the program:
It's manifested in a couple of different ways, some of which cause errors at unmount time or in subsequent system calls. Here's an example workload where I've been able to get it to manifest reliably on a 1-core VM. I am running this example on an empty instance of NOVA.
If I unmount the system immediately after running this, I see nova error appear on the terminal and get the following messages in dmesg:
Remounting the system seems to go fine and foo is present with the expected contents, but we've noticed in the fuzzer that sometimes it seems to cause other issues, like not being able to create a directory following the pwrite call. We're working on finding some more examples that exhibit more clearly buggy behavior and I'll add them when we do.
Thanks!