Closed: christianbundy closed this issue 2 years ago.
I spent a few hours and compiled Node.js with debug symbols, and now I have a backtrace!
Program terminated with signal SIGSEGV, Segmentation fault.
#0 Database::Del (this=0x55e4a353f410, options=..., key=...) at ../binding.cc:399
399 return db_->Delete(options, key);
[Current thread is 1 (Thread 0x7ff265717700 (LWP 57861))]
(gdb) backtrace
#0 Database::Del (this=0x55e4a353f410, options=..., key=...) at ../binding.cc:399
#1 0x00007ff266f87512 in DelWorker::DoExecute (this=0x55e4a3550dc0) at ../binding.cc:1012
#2 0x00007ff266f85134 in BaseWorker::Execute (env=0x55e4a34b4990, data=0x55e4a3550dc0) at ../binding.cc:291
#3 0x000055e49e77fbeb in (anonymous namespace)::uvimpl::Work::DoThreadPoolWork (this=0x55e4a3544bd0) at ../src/node_api.cc:851
#4 0x000055e49e782d41 in node::ThreadPoolWork::ScheduleWork()::{lambda(uv_work_s*)#1}::operator()(uv_work_s*) const (__closure=0x0, req=0x55e4a3544c08) at ../src/threadpoolwork-inl.h:39
#5 0x000055e49e782d75 in node::ThreadPoolWork::ScheduleWork()::{lambda(uv_work_s*)#1}::_FUN(uv_work_s*) () at ../src/threadpoolwork-inl.h:40
#6 0x000055e49f77273b in uv__queue_work (w=0x55e4a3544c60) at ../deps/uv/src/threadpool.c:321
#7 0x000055e49f771e8a in worker (arg=0x0) at ../deps/uv/src/threadpool.c:122
#8 0x00007ff26e367422 in start_thread () from /usr/lib/libpthread.so.0
#9 0x00007ff26e296bf3 in clone () from /usr/lib/libc.so.6
My gdb knowledge is weak, so this is just the output of backtrace; please let me know if there's anything else I should try!
Weird, we were apparently calling db.close.bind(db)() instead of db.close()?! I'm having trouble reproducing the segfault with a minimal test case, but I can verify that removing the .bind() fixes it. :thinking:
Nope, still broken.
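That matches expectations: Function.prototype.bind shouldn't change behavior here, since binding a method to its own object gives the same `this` as a direct call. A minimal sketch (using a hypothetical Db class, not leveldown itself) illustrating why removing .bind() alone wouldn't be expected to fix anything:

```javascript
// Hypothetical Db class, only to show that fn.bind(obj)() invokes fn with
// the same `this` as obj.fn(), so the .bind() form is behaviorally identical.
class Db {
  constructor () { this.status = 'open' }
  close () { this.status = 'closed'; return this.status }
}

const a = new Db()
a.close()           // direct call

const b = new Db()
b.close.bind(b)()   // bound call, same receiver

console.log(a.status, b.status) // closed closed
```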
One of my problems is that I'm running a bunch of different instances of Leveldown. Is there some way I can add the location of the database to each printf? Here's what I have for debugging now:
diff --git a/binding.cc b/binding.cc
index e938a40..2de67d0 100644
--- a/binding.cc
+++ b/binding.cc
@@ -370,6 +370,8 @@ struct Database {
leveldb::Status Open (const leveldb::Options& options,
const char* location) {
+ printf("Level(%s)->Open\n", location);
+
return leveldb::DB::Open(options, location, &db_);
}
@@ -385,17 +387,20 @@ struct Database {
leveldb::Status Put (const leveldb::WriteOptions& options,
leveldb::Slice key,
leveldb::Slice value) {
+ printf("Level->Put\n");
return db_->Put(options, key, value);
}
leveldb::Status Get (const leveldb::ReadOptions& options,
leveldb::Slice key,
std::string& value) {
+ printf("Level->Get\n");
return db_->Get(options, key, &value);
}
leveldb::Status Del (const leveldb::WriteOptions& options,
leveldb::Slice key) {
+ printf("Level->Delete\n");
return db_->Delete(options, key);
}
@@ -840,6 +845,7 @@ struct CloseWorker final : public BaseWorker {
~CloseWorker () {}
void DoExecute () override {
+ printf("Level->Close\n");
database_->CloseDatabase();
}
};
diff --git a/deps/leveldb/leveldb-1.20/Makefile b/deps/leveldb/leveldb-1.20/Makefile
index f7cc7d7..a76bb75 100755
--- a/deps/leveldb/leveldb-1.20/Makefile
+++ b/deps/leveldb/leveldb-1.20/Makefile
@@ -7,9 +7,9 @@
# to switch between compilation modes.
# (A) Production use (optimized mode)
-OPT ?= -O2 -DNDEBUG
+# OPT ?= -O2 -DNDEBUG
# (B) Debug mode, w/ full line-level debugging symbols
-# OPT ?= -g2
+OPT ?= -g2
# (C) Profiling mode: opt, but w/debugging symbols
# OPT ?= -O2 -g2 -DNDEBUG
#-----------------------------------------------
The output:
Level->Get
Level->Close
Level->Delete
Level->Close
Level->Close
Level->Close
Level->Close
Level->Close
Level->Close
Level->Delete
Level->Close
Level->Delete
Unfortunately since I have a bunch of instances of Leveldown, I can't tell which one is misbehaving.
Maybe do it on the JS side, here and/or in _batch(). Along the lines of:
if (this.status !== 'open') throw new Error(`Delete after close in ${this.location}`)
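That guard can be sketched as follows, using a stand-in object rather than the real LevelDOWN prototype; `status` and `location` mirror the public properties discussed in this thread, and `_del` here is only a mock:

```javascript
// Stand-in for a LevelDOWN-like instance (not the real implementation).
const db = {
  status: 'open',
  location: '/tmp/example-db',
  _del (key) {
    // The suggested guard: refuse operations once the db is no longer open.
    if (this.status !== 'open') {
      throw new Error(`Delete after close in ${this.location}`)
    }
    return true // a real implementation would delete the key here
  }
}

console.log(db._del('a')) // true while open
db.status = 'closed'
try {
  db._del('b')
} catch (err) {
  console.log(err.message) // Delete after close in /tmp/example-db
}
```

This turns a native-side crash into a catchable JS error that names the offending database.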
Good idea! I've added some console.log() statements in leveldown.js too; here's the list of operations and then the error:
open
get
del
del
del
del
del
del
del
del
del
del
close
del
/home/christianbundy/src/leveldown/leveldown.js:62
throw new Error(`Delete after close in ${this.location}`)
^
Error: Delete after close in /tmp/ssb-test-1594655235568-919/flume/query
at LevelDOWN._del (/home/christianbundy/src/leveldown/leveldown.js:62:11)
at /home/christianbundy/src/leveldown/node_modules/abstract-leveldown/abstract-leveldown.js:233:12
at /home/christianbundy/src/leveldown/node_modules/abstract-leveldown/abstract-iterator.js:33:14
at processTicksAndRejections (internal/process/task_queues.js:81:21)
.status is a public property, so as a temporary workaround you could add that check to the responsible flume module.
Thanks! Would that work since we're using clear() rather than del()? My understanding is that we're calling clear() while the database is open and then closing it before the iterator finishes.
That will work for now, because clear() is not optimized yet: it all happens in JS land and uses _del() under the hood.
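The suspected race can be sketched with a toy model (this is not leveldown's actual clear() code): a JS-land clear() built on an async iterator schedules each _del() on a later microtask, so a close() issued mid-iteration lands before the remaining deletes run.

```javascript
// Toy model of the race: clear() iterates asynchronously, close() runs
// synchronously in between, and the next _del() fires after close.
const db = {
  status: 'open',
  keys: ['a', 'b', 'c'],
  _del (key) {
    if (this.status !== 'open') throw new Error('Delete after close')
  },
  async clear () {
    for (const key of this.keys) {
      await Promise.resolve() // simulates waiting on the iterator's next()
      this._del(key)
    }
  },
  close () { this.status = 'closed' }
}

db.clear().catch(err => console.log(err.message)) // Delete after close
db.close() // closes before the first awaited _del runs
```

In the real binding the late _del() reaches native code on a freed handle instead of throwing, which is consistent with the segfault in Database::Del.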
Disregard that, it's an incomplete answer that misses the point. I have no time to give a good answer, unfortunately.
No problem -- I've got a workaround that's 100% fine for my use-case, I just wanted to make sure I was being a responsible citizen and reporting the segfault upstream. Sorry I haven't been able to figure out how to do a minimal repro, I've thrown a few hours into it but my repro code never segfaults. :upside_down_face:
I'll probably take a few more stabs at a minimal repro, but otherwise I'll just leave it in the back of my head. Thanks for your help so far.
I'm also experiencing segmentation faults. I'm using the library in a Node server app (tried v14, v15, and v16 with no difference), in an Alpine (3.12) based Docker container running on 64-bit Ubuntu on a Raspberry Pi 4. The files it accesses are located in a mounted local directory. The segmentation fault seems to occur when the database is opened; my logs indicate that the fault occurs even before the first read! I tried creating a new empty folder: the folder structure gets created (LOG, LOCK, etc.), but then it fails again. No problems whatsoever on my Intel-based development machine...
Closing because this is fixed in classic-level: on any operation it checks whether the db is open and throws a JS error if not. In addition (though this was also already true for leveldown since 6.0.3), it delays closing the db while a clear() operation is in flight.
@reuzel I'm not sure about your case; it sounds like a different issue (platform-related). If you can reproduce it on classic-level, feel free to open a new issue there.
Hi all! Is this segfault coming from leveldown? Sorry for the quick message, just walking out the door. :runner: :dash:
Repro
Env
(Using the latest leveldown)
Our CI is showing this happening on Linux and macOS (not Windows), on Node.js 10, 12, and 14.
Core dump: