NixOS / nix

Nix, the purely functional package manager
https://nixos.org/
GNU Lesser General Public License v2.1

Nix daemon with /nix on full btrfs partition with compression enabled core dumps somewhere in sqlite #3808

Open Ekleog opened 4 years ago

Ekleog commented 4 years ago

Describe the bug

When the /nix partition (on btrfs) is full, the nix daemon dumps core while trying to collect garbage.

Stack Trace

#0  0x00007fd2d4b9388d in __memset_avx2_erms () from /nix/store/6m2k8kx8h216jlx9dg3lp4m90bz05yck-glibc-2.30/lib/libc.so.6
#1  0x00007fd2d46828e3 in walIndexAppend () from /nix/store/vrzijaqxp5x9zx1wfavzl2c2zfwr863p-sqlite-3.30.1/lib/libsqlite3.so.0
#2  0x00007fd2d4682e5e in walIndexReadHdr () from /nix/store/vrzijaqxp5x9zx1wfavzl2c2zfwr863p-sqlite-3.30.1/lib/libsqlite3.so.0
#3  0x00007fd2d468324b in walTryBeginRead () from /nix/store/vrzijaqxp5x9zx1wfavzl2c2zfwr863p-sqlite-3.30.1/lib/libsqlite3.so.0
#4  0x00007fd2d4695a22 in sqlite3PagerSharedLock () from /nix/store/vrzijaqxp5x9zx1wfavzl2c2zfwr863p-sqlite-3.30.1/lib/libsqlite3.so.0
#5  0x00007fd2d4696598 in sqlite3BtreeBeginTrans () from /nix/store/vrzijaqxp5x9zx1wfavzl2c2zfwr863p-sqlite-3.30.1/lib/libsqlite3.so.0
#6  0x00007fd2d46c9cc1 in sqlite3InitOne () from /nix/store/vrzijaqxp5x9zx1wfavzl2c2zfwr863p-sqlite-3.30.1/lib/libsqlite3.so.0
#7  0x00007fd2d46c9e8c in sqlite3Init () from /nix/store/vrzijaqxp5x9zx1wfavzl2c2zfwr863p-sqlite-3.30.1/lib/libsqlite3.so.0
#8  0x00007fd2d46c9ecf in sqlite3ReadSchema () from /nix/store/vrzijaqxp5x9zx1wfavzl2c2zfwr863p-sqlite-3.30.1/lib/libsqlite3.so.0
#9  0x00007fd2d46d69dd in sqlite3Pragma () from /nix/store/vrzijaqxp5x9zx1wfavzl2c2zfwr863p-sqlite-3.30.1/lib/libsqlite3.so.0
#10 0x00007fd2d46d9733 in sqlite3RunParser () from /nix/store/vrzijaqxp5x9zx1wfavzl2c2zfwr863p-sqlite-3.30.1/lib/libsqlite3.so.0
#11 0x00007fd2d46de047 in sqlite3Prepare () from /nix/store/vrzijaqxp5x9zx1wfavzl2c2zfwr863p-sqlite-3.30.1/lib/libsqlite3.so.0
#12 0x00007fd2d46de59f in sqlite3LockAndPrepare () from /nix/store/vrzijaqxp5x9zx1wfavzl2c2zfwr863p-sqlite-3.30.1/lib/libsqlite3.so.0
#13 0x00007fd2d46de946 in sqlite3_prepare_v2 () from /nix/store/vrzijaqxp5x9zx1wfavzl2c2zfwr863p-sqlite-3.30.1/lib/libsqlite3.so.0
#14 0x00007fd2d46c935c in sqlite3_exec () from /nix/store/vrzijaqxp5x9zx1wfavzl2c2zfwr863p-sqlite-3.30.1/lib/libsqlite3.so.0
#15 0x00007fd2d51dcbdc in nix::SQLite::exec(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)::{lambda()#1}::operator()() const [clone .isra.0] ()
   from /nix/store/zglc8iw1bgd116i4j6z6zrk8yd68xqvh-nix-2.3.2/lib/libnixstore.so
#16 0x00007fd2d51dc6a7 in nix::SQLite::exec(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
   from /nix/store/zglc8iw1bgd116i4j6z6zrk8yd68xqvh-nix-2.3.2/lib/libnixstore.so
#17 0x00007fd2d5189d30 in nix::LocalStore::openDB(nix::LocalStore::State&, bool) () from /nix/store/zglc8iw1bgd116i4j6z6zrk8yd68xqvh-nix-2.3.2/lib/libnixstore.so
#18 0x00007fd2d5194586 in nix::LocalStore::LocalStore(std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&) ()
   from /nix/store/zglc8iw1bgd116i4j6z6zrk8yd68xqvh-nix-2.3.2/lib/libnixstore.so
#19 0x00007fd2d51e79a7 in std::_Function_handler<std::shared_ptr<nix::Store> (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&), nix::{lambda(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&)#2}>::_M_invoke(std::_Any_data const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&) ()
   from /nix/store/zglc8iw1bgd116i4j6z6zrk8yd68xqvh-nix-2.3.2/lib/libnixstore.so
#20 0x00007fd2d51ed6cd in nix::openStore(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > > const&) ()
   from /nix/store/zglc8iw1bgd116i4j6z6zrk8yd68xqvh-nix-2.3.2/lib/libnixstore.so
#21 0x000000000048aac2 in processConnection(bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int) ()
#22 0x000000000048d3e5 in daemonLoop(char**)::{lambda()#1}::operator()() const ()
#23 0x000000000048d40c in std::_Function_handler<void (), daemonLoop(char**)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
#24 0x00007fd2d4fdc45c in std::_Function_handler<void (), nix::startProcess(std::function<void ()>, nix::ProcessOptions const&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
   from /nix/store/zglc8iw1bgd116i4j6z6zrk8yd68xqvh-nix-2.3.2/lib/libnixutil.so
#25 0x00007fd2d4fd9739 in nix::doFork(bool, std::function<void ()>) () from /nix/store/zglc8iw1bgd116i4j6z6zrk8yd68xqvh-nix-2.3.2/lib/libnixutil.so
#26 0x00007fd2d4fdc3b9 in nix::startProcess(std::function<void ()>, nix::ProcessOptions const&) () from /nix/store/zglc8iw1bgd116i4j6z6zrk8yd68xqvh-nix-2.3.2/lib/libnixutil.so
#27 0x000000000048c5e6 in _main(int, char**) ()
#28 0x00000000004ef87a in nix::mainWrapped(int, char**) ()
#29 0x00007fd2d527f0e2 in nix::handleExceptions(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::function<void ()>) ()
   from /nix/store/zglc8iw1bgd116i4j6z6zrk8yd68xqvh-nix-2.3.2/lib/libnixmain.so
#30 0x0000000000447464 in main ()

Hypothesis

As my btrfs partition has compression enabled, I'd guess that /nix/var/nix/db/reserved, which contains only zeros, ends up compressed to almost nothing, and thus removing it frees hardly any space for nix to work with.

Maybe making it contain random data might be better to defend against compressing filesystems?
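
A minimal sketch of that idea, assuming a reserve file filled from /dev/urandom so that a compressing filesystem cannot shrink it. The helper name `fill_with_random` and the way the path and size are passed in are hypothetical; this is not how Nix creates the file.

```c
#include <stdio.h>

/* Hypothetical helper: fill `path` with `size` bytes read from /dev/urandom.
 * Random bytes are effectively incompressible, so the file should occupy
 * roughly `size` bytes on disk even with filesystem compression enabled.
 * Returns 0 on success, -1 on any I/O error. */
int fill_with_random(const char *path, size_t size)
{
    FILE *in = fopen("/dev/urandom", "rb");
    if (!in)
        return -1;
    FILE *out = fopen(path, "wb");
    if (!out) {
        fclose(in);
        return -1;
    }

    unsigned char buf[4096];
    int rc = 0;
    while (size > 0) {
        size_t want = size < sizeof buf ? size : sizeof buf;
        if (fread(buf, 1, want, in) != want ||
            fwrite(buf, 1, want, out) != want) {
            rc = -1;   /* short read or write, e.g. the disk is already full */
            break;
        }
        size -= want;
    }

    fclose(in);
    if (fclose(out) != 0)   /* buffered data may fail to flush on a full disk */
        rc = -1;
    return rc;
}
```

Something like `fill_with_random("/nix/var/nix/db/reserved", size)` would have to run while there is still free space, since the point of the file is to be deleted later to make room.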

Either way, I must say I'm surprised to see nix crashing, and not just outputting an error message, especially in the middle of sqlite… but… maybe that should be reported upstream to sqlite?

nix-env --version output

nix-env (Nix) 2.3.2

edolstra commented 4 years ago

We use posix_fallocate to create /nix/var/nix/db/reserved so that file should use actual disk space. However, if you have snapshots, deleting it might not free up any space.
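
For illustration only, a minimal sketch of reserving space with posix_fallocate and then checking how much the filesystem reports as allocated. The helper names `reserve_space` and `file_usage` are hypothetical, and this is not a copy of the Nix code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: make sure `size` bytes are allocated for `path`.
 * Returns 0 on success, or an error number such as ENOSPC on failure. */
int reserve_space(const char *path, off_t size)
{
    int fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd == -1)
        return errno;
    /* posix_fallocate returns the error number directly; it does not set errno. */
    int err = posix_fallocate(fd, 0, size);
    close(fd);
    return err;
}

/* Hypothetical helper: report the apparent size and the allocated bytes.
 * On Linux, st_blocks counts 512-byte units.
 * Returns 0 on success, or an error number on failure. */
int file_usage(const char *path, off_t *apparent, long long *allocated)
{
    struct stat st;
    if (stat(path, &st) == -1)
        return errno;
    *apparent = st.st_size;
    *allocated = (long long)st.st_blocks * 512;
    return 0;
}
```

Comparing the two values from `file_usage` on the affected machine would show whether the reserved file really occupies disk space there. It would not show whether snapshots still reference its blocks.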

Ericson2314 commented 4 years ago

__memset_avx2_erms being the top frame makes me think the problem is accessing unaligned data? That should indeed not happen regardless of disk space.

Ekleog commented 4 years ago

@edolstra Hmm, that's weird. I've had this problem twice, and both times it was when my btrfs was full. The first time I fixed it by allocating a bit more space to /nix/store, and the second time by remounting it rw to remove a file from /nix/store/trash. So I'd assume it has something to do with the disk being full… but probably not with /nix/var/nix/db/reserved, then, I guess.

@Ericson2314 Hmm… I'd also have guessed that, but upon further digging it looks like it's maybe a regular segfault due to a bad pointer instead?

(gdb) disas
Dump of assembler code for function __memset_avx2_erms:
   0x00007fd2d4b93880 <+0>: vzeroupper
   0x00007fd2d4b93883 <+3>: mov    %rdx,%rcx
   0x00007fd2d4b93886 <+6>: movzbl %sil,%eax
   0x00007fd2d4b9388a <+10>:    mov    %rdi,%rdx
=> 0x00007fd2d4b9388d <+13>:    rep stos %al,%es:(%rdi)
   0x00007fd2d4b9388f <+15>:    mov    %rdx,%rax
   0x00007fd2d4b93892 <+18>:    retq
End of assembler dump.
(gdb) p $al
$1 = 0
(gdb) p $rdi
$2 = 140543001690112
(gdb) p/x $rdi
$3 = 0x7fd2b7b10000
(gdb) p $ecx
$4 = 32768
(gdb) x $rdi
0x7fd2b7b10000: Cannot access memory at address 0x7fd2b7b10000

edolstra commented 4 years ago

0x7fd2b7b10000 is the start of a page. It could be an mmap'ed file. If writing to the file fails because the disk is full, the kernel could report this via a signal, though I think it should be SIGBUS rather than SIGSEGV. Note that SQLite uses mmap for db.sqlite-shm (https://www.sqlite.org/tempfiles.html#shared_memory_files).
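
To make the mechanism concrete, here is a rough sketch of how writing to a shared file mapping can end in a signal instead of an error return. It does not reproduce the disk-full case. It uses the simpler, deterministic case of touching a mapped page that lies beyond the end of the file, which also has no backing store. The names `is_page_aligned` and `probe_mapped_write` are hypothetical, and the 4096-byte page size in the usage note is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

static sigjmp_buf probe_env;
static volatile sig_atomic_t caught_signal;

/* Record which signal arrived and jump back into probe_mapped_write(). */
static void on_fault(int sig)
{
    caught_signal = sig;
    siglongjmp(probe_env, 1);
}

/* Returns 1 if `addr` is a multiple of `page_size`, else 0. */
int is_page_aligned(unsigned long long addr, unsigned long long page_size)
{
    return addr % page_size == 0;
}

/* Hypothetical probe: create `path` with length `file_len`, map `map_len`
 * bytes of it MAP_SHARED, and zero-fill the whole mapping.
 * Returns 0 if the fill completed, the signal number if a fault was caught,
 * or -1 if setting up the file or the mapping failed. */
int probe_mapped_write(const char *path, off_t file_len, size_t map_len)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, file_len) == -1) {
        close(fd);
        unlink(path);
        return -1;
    }
    void *p = mmap(NULL, map_len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);   /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED) {
        unlink(path);
        return -1;
    }

    struct sigaction sa, old_bus, old_segv;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old_bus);
    sigaction(SIGSEGV, &sa, &old_segv);

    caught_signal = 0;
    if (sigsetjmp(probe_env, 1) == 0) {
        /* Same access pattern as the memset in the trace: a plain store. */
        volatile unsigned char *bytes = p;
        for (size_t i = 0; i < map_len; i++)
            bytes[i] = 0;
    }

    /* Restore the previous handlers and clean up. */
    sigaction(SIGBUS, &old_bus, NULL);
    sigaction(SIGSEGV, &old_segv, NULL);
    munmap(p, map_len);
    unlink(path);
    return (int)caught_signal;
}
```

With a 4096-byte page, `is_page_aligned(0x7fd2b7b10000ULL, 4096)` holds for the address in the gdb session. On Linux, `probe_mapped_write(path, 0, 4096)` is expected to report SIGBUS, while a file that is as long as the mapping can be filled normally.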

edolstra commented 4 years ago

Comment from the man himself: https://bugzilla.mozilla.org/show_bug.cgi?id=1471041#c6

stale[bot] commented 3 years ago

I marked this as stale due to inactivity.