cmu-db / peloton

The Self-Driving Database Management System
http://pelotondb.io
Apache License 2.0
2.03k stars 622 forks source link

TPC-C segfault: Double-free in GCManager #1452

Closed mbutrovich closed 6 years ago

mbutrovich commented 6 years ago

I've been working on performance stuff today, and the faster I get the system to run the less stable it is. So I decided to chase down the segfaults in TPC-C. Here are the results of one segfault hunt.

It's a double-free in GCManager's CheckAndReclaimVarlenColumns(). I've not looked at this code closely before as part of our other GC fixes, but will dive into it in the next couple of days. AddressSanitizer log is attached. asan.txt config here: tpcc_config.txt

lmwnshn commented 6 years ago

This is happening on the nightly run too. Attached are the backtraces.

gdb.txt asan.txt

mbutrovich commented 6 years ago

It's looking like this issue started manifesting after #1415 was merged 2 days ago. That commit is effectively current master, and if I go back to the commit before #1415 I can't get it to reproduce.

I don't fully understand how #1415 would introduce this since there were no changes to the GC system, but it's ~5000 lines of changes that I didn't review. It's possible that's now exposing incorrect behavior elsewhere in the system, but I have no root cause yet.

A workaround for the double-free is fairly easy to implement in ephemeral_pool.h, but I'd rather understand why ResetTuple is getting called on the same tuple multiple times since that sounds very wrong. Unfortunately, I don't have a succinct test to reproduce it yet and have to resort to running TPC-C.

tli2 commented 6 years ago

@ksaito7

ksaito7 commented 6 years ago

I check it.

ksaito7 commented 6 years ago

I found this cause. Even if a query updates data in a column with no constraint, my code updates it as if its column has primary key constraint. I didn't figure out why GC manager made that error in this situation actually. But when I fixed that, then that error disappeared.

mbutrovich commented 6 years ago

Wow, nice job finding it so quickly!

If you open up a new PR for your refactor I’ll be sure to run it through the benchmarks as a sanity check.

ksaito7 commented 6 years ago

@mbutrovich Thanks, but I've asked @lmwnshn for the test.

lmwnshn commented 6 years ago

I've run it and it looks good, more details in the PR.

ksaito7 commented 6 years ago

@lmwnshn Thanks a lot!