Ekleog opened this issue 4 years ago
We use posix_fallocate to create /nix/var/nix/db/reserved, so that file should use actual disk space. However, if you have snapshots, deleting it might not free up any space.
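A minimal Python sketch of the reserve-file idea, for illustration only. This is not Nix's C++ implementation, and the helper names are hypothetical. It contrasts `os.posix_fallocate`, which asks the filesystem to allocate real blocks, with a plain truncate, which usually leaves a sparse file with little or nothing behind it. Releasing the reserve is just a delete, but on btrfs or ZFS a snapshot that still references those blocks keeps them allocated.

```python
import os


def reserve_space(path: str, size: int) -> None:
    """Create `path` with `size` bytes of disk blocks actually allocated."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        # posix_fallocate() asks the filesystem to allocate real blocks;
        # glibc emulates it by writing if the filesystem lacks fallocate().
        os.posix_fallocate(fd, 0, size)
    finally:
        os.close(fd)


def sparse_file(path: str, size: int) -> None:
    """Create `path` with apparent size `size` but (usually) no blocks."""
    with open(path, "wb") as f:
        f.truncate(size)


def allocated_bytes(path: str) -> int:
    """Bytes of disk backing `path` (Linux counts st_blocks in 512-byte units)."""
    return os.stat(path).st_blocks * 512


def release_space(path: str) -> None:
    """Delete the reserve file so its blocks return to the filesystem,
    unless a snapshot still references them."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass
```

Comparing `allocated_bytes` for the two files shows why a truncated file would be a poor reserve: it has the right size on paper, but deleting it frees almost nothing.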
__memset_avx2_erms being the top frame makes me think the problem is an unaligned data access? That should indeed not happen regardless of disk space.
@edolstra Hmm, that's weird. I've had this problem twice, and both times my btrfs was full. The first time I fixed it by allocating a bit more space to /nix/store, and the second time by remounting it rw to remove a file from /nix/store/trash. So I'd assume it has something to do with the disk being full… but probably not with /nix/var/nix/db/reserved, then, I guess.
@Ericson2314 Hmm… I'd also have guessed that, but upon further digging, it looks like it may be a regular segfault caused by a bad pointer instead?
```
(gdb) disas
Dump of assembler code for function __memset_avx2_erms:
0x00007fd2d4b93880 <+0>: vzeroupper
0x00007fd2d4b93883 <+3>: mov %rdx,%rcx
0x00007fd2d4b93886 <+6>: movzbl %sil,%eax
0x00007fd2d4b9388a <+10>: mov %rdi,%rdx
=> 0x00007fd2d4b9388d <+13>: rep stos %al,%es:(%rdi)
0x00007fd2d4b9388f <+15>: mov %rdx,%rax
0x00007fd2d4b93892 <+18>: retq
End of assembler dump.
(gdb) p $al
$1 = 0
(gdb) p $rdi
$2 = 140543001690112
(gdb) p/x $rdi
$3 = 0x7fd2b7b10000
(gdb) p $ecx
$4 = 32768
(gdb) x $rdi
0x7fd2b7b10000: Cannot access memory at address 0x7fd2b7b10000
```
0x7fd2b7b10000 is the start of a page (its low 16 bits are all zero). It could be an mmap'ed file. If writing to the file fails because the disk is full, the kernel could report this via a signal, though I think it should be SIGBUS rather than SIGSEGV. Note that SQLite uses mmap for db.sqlite-shm (https://www.sqlite.org/tempfiles.html#shared_memory_files).
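The page-alignment observation can be checked with a little arithmetic. Below is a small Python sketch that uses the `$rdi` and `$ecx` values from the gdb session. The 4 KiB page size is an assumption (typical for x86-64 Linux), and the helper names are made up for illustration. Because `rep stos` writes `%rcx` bytes starting at `%rdi` and counts `%rcx` down as it goes, `$ecx = 32768` is the number of bytes still left to clear when the fault hit (assuming the upper half of `%rcx` is zero), i.e. eight more 4 KiB pages starting exactly at the inaccessible address.

```python
from typing import Tuple

PAGE_SIZE = 4096              # assumed x86-64 base page size
FAULT_ADDR = 0x7FD2B7B10000   # p/x $rdi at the faulting rep stos
REMAINING = 32768             # p $ecx: bytes rep stos still had to write


def page_split(addr: int, page_size: int = PAGE_SIZE) -> Tuple[int, int]:
    """Return (start of the page containing addr, offset of addr in it)."""
    offset = addr % page_size
    return addr - offset, offset


def pages_touched(addr: int, length: int, page_size: int = PAGE_SIZE) -> int:
    """Number of pages covered by the byte range [addr, addr + length)."""
    first = addr // page_size
    last = (addr + length - 1) // page_size
    return last - first + 1


if __name__ == "__main__":
    start, offset = page_split(FAULT_ADDR)
    print(hex(start), offset)                    # 0x7fd2b7b10000 0
    print(pages_touched(FAULT_ADDR, REMAINING))  # 8
```

An offset of zero fits the mmap theory: a fault caused by a missing or unbacked mapping shows up at the first byte of the page that cannot be touched, rather than somewhere in the middle of one.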
Comment from the man himself: https://bugzilla.mozilla.org/show_bug.cgi?id=1471041#c6
I marked this as stale due to inactivity.
Describe the bug
When the /nix partition (btrfs) is full, the Nix daemon dumps core while trying to collect garbage.
Stack Trace
Hypothesis
As my btrfs partition has compression enabled, I'd guess that /nix/var/nix/db/reserved, which contains all zeros, ends up compressed to nothing, so Nix gains no space by removing it. Maybe filling it with random data would better defend against compressing filesystems?
Either way, I must say I'm surprised to see Nix crash rather than just print an error message, especially in the middle of SQLite… but maybe that should be reported upstream to SQLite?
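To illustrate the compression hypothesis, here is a rough Python sketch that uses zlib as a stand-in for btrfs's compressors (btrfs supports zlib, LZO and zstd; which one is in use here isn't stated). A run of zero bytes shrinks to almost nothing, while bytes from `os.urandom` are effectively incompressible. The `write_random_reserve` helper is hypothetical and not something Nix does, and this sketch cannot show whether btrfs compresses a posix_fallocate'd file in the first place.

```python
import os
import zlib

CHUNK = 1 << 16  # write in 64 KiB pieces to keep memory use small


def compressed_ratio(data: bytes) -> float:
    """Compressed size divided by original size (zlib, default level)."""
    return len(zlib.compress(data)) / len(data)


def write_random_reserve(path: str, size: int) -> None:
    """Hypothetical: fill a reserve file with random bytes so that a
    compressing filesystem cannot shrink it."""
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(CHUNK, remaining)
            f.write(os.urandom(n))
            remaining -= n
        f.flush()
        os.fsync(f.fileno())  # make sure the data actually reaches the disk


if __name__ == "__main__":
    size = 1 << 20
    print(f"zeros:  {compressed_ratio(bytes(size)):.4f}")      # close to 0
    print(f"random: {compressed_ratio(os.urandom(size)):.4f}")  # about 1.0
```

The trade-off is that random filling costs a full write of the reserve and still does nothing about snapshots pinning the old blocks.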
nix-env --version output
nix-env (Nix) 2.3.2