fnc12 / sqlite_orm

❤️ SQLite ORM light header only library for modern C++
GNU Affero General Public License v3.0

Exception in the destructor crashes the application #718

Closed IvanRibakov closed 3 years ago

IvanRibakov commented 3 years ago
terminate called after throwing an instance of 'std::system_error'
  what():  unable to close due to unfinalized statements or unfinished backups: database is locked
(gdb) bt full
#0  0x00007fd5920e17ff in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007fd5920cbc35 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007fd53b90c09b in __gnu_cxx::__verbose_terminate_handler() [clone .cold.1] () from /lib64/libstdc++.so.6
No symbol table info available.
#3  0x00007fd53b91253c in __cxxabiv1::__terminate(void (*)()) () from /lib64/libstdc++.so.6
No symbol table info available.
#4  0x00007fd53b911559 in __cxa_call_terminate () from /lib64/libstdc++.so.6
No symbol table info available.
#5  0x00007fd53b911ed8 in __gxx_personality_v0 () from /lib64/libstdc++.so.6
No symbol table info available.
#6  0x00007fd53b674b13 in _Unwind_RaiseException_Phase2 () from /lib64/libgcc_s.so.1
No symbol table info available.
#7  0x00007fd53b675081 in _Unwind_RaiseException () from /lib64/libgcc_s.so.1
No symbol table info available.
#8  0x00007fd53b9127eb in __cxa_throw () from /lib64/libstdc++.so.6
No symbol table info available.
#9  0x00007fd53ccf8875 in sqlite_orm::internal::connection_holder::release (this=0x104fb90) at /usr/include/c++/8/system_error:151
        rc = 5
#10 0x00007fd53cda5841 in sqlite_orm::internal::connection_ref::~connection_ref (this=0x7fd4c47f6648, __in_chrg=<optimized out>)
    at .../sqlite_orm/sqlite_orm.h:6842
No locals.
#11 sqlite_orm::internal::prepared_statement_base::~prepared_statement_base (this=this@entry=0x7fd4c47f6640, __in_chrg=<optimized out>)
    at .../sqlite_orm/sqlite_orm.h:6865
No locals.
#12 0x00007fd53ce3fa6b in sqlite_orm::internal::prepared_statement_t<sqlite_orm::internal::get_all_t<zroute::ElementModel> >::~prepared_statement_t (this=0x7fd4c47f6640, __in_chrg=<optimized out>)
    at .../sqlite_orm/sqlite_orm.h:6915
No locals.
#13 sqlite_orm::internal::storage_t<sqlite_orm::internal::table_t<...> (this=0x1040a70) at .../sqlite_orm/sqlite_orm.h:10486
        statement = {<sqlite_orm::internal::prepared_statement_base> = {stmt = 0x0, con = {holder = @0x104fb90}}, t = {conditions = empty std::tuple}}
        statement = <optimized out>
#14 ... more custom app logic to follow ...

I'm using sqlite_orm v1.5 in a multi-threaded environment. During load testing the application crashes because an exception is thrown inside a destructor, as can be seen from the backtrace above.

Is there anything that can be done to prevent the application from crashing in this case?

fnc12 commented 3 years ago

Try the dev branch. I know for sure that some memory leak was fixed either in 1.6 or after 1.6 in the dev branch.

fnc12 commented 3 years ago

@IvanRibakov can you reproduce it in dev branch or master?

IvanRibakov commented 3 years ago

@fnc12 I haven't had time to try your suggestion yet. I'm working on a project that uses sqlite_orm via Conan, and modifying the current build automation to source sqlite_orm from a custom location is going to take more time than I can afford at the moment.

I do expect to be able to get back to this problem soon (by next week at the latest, I hope), so I'd like to keep this issue open for a little while longer if you don't mind.

IvanRibakov commented 3 years ago

@fnc12 I've tried the latest dev branch code but I'm still observing a crash due to an exception being thrown from within a destructor.

Is there anything else that can be done to prevent terminate from being called?

fnc12 commented 3 years ago

holy shield! Can you please provide some code to reproduce this bug?

IvanRibakov commented 3 years ago

@fnc12

Code to reproduce the issue:

```c++
#include <sqlite_orm/sqlite_orm.h>

#include <algorithm>
#include <chrono>
#include <cmath>
#include <iostream>
#include <thread>
#include <vector>

struct TestModel {
    int id;
    int counter;
};

using namespace sqlite_orm;

int main() {
    int connectionTimeout = 60000;  // 1 minute
    int threadCount = 20;
    int loopCount = 10000;
    int reportFrequency = 500;
    int opsPerSecPerThread = 500;

    auto storage = make_storage(
        "/Volumes/data/_toremove/crash.sqlite",
        make_table("TEST",
                   make_column("ID", &TestModel::id, primary_key()),
                   make_column("COUNTER", &TestModel::counter)));
    storage.sync_schema();
    storage.open_forever();
    storage.sync_schema(true);
    storage.busy_timeout(connectionTimeout);
    storage.remove_all<TestModel>();

    TestModel testModel{1, 0};
    storage.insert(testModel);

    std::vector<std::thread> workers;
    for (int i = 0; i < threadCount; i++) {
        workers.push_back(std::thread([&, i]() {
            int threadNo = i;
            for (int j = 0; j < loopCount; j++) {
                auto model = storage.get<TestModel>(testModel.id);
                model.counter++;
                storage.update(model);
                if (j % reportFrequency == 0) {
                    std::cout << "[Thread " << threadNo << "] " << ((float)j / loopCount) * 100 << "%\n";
                }
                std::this_thread::sleep_for(
                    std::chrono::milliseconds((int)ceil((float)1000 / opsPerSecPerThread)));
            }
        }));
    }
    std::for_each(workers.begin(), workers.end(), [](std::thread &t) { t.join(); });
}
```

The above code essentially generates a high read/write load on the DB. I didn't get the same what() message every time (sometimes it was a disk I/O error), but judging by the coredumps, in all cases the common code path is sqlite_orm::internal::connection_ref::~connection_ref calling sqlite_orm::internal::connection_holder::release.

Here is sample output and a coredump backtrace from one of the runs where I managed to get exactly the same what() message:

Console output:

```
[Thread 0] 0%
...
[Thread 15] 95%
[Thread 2] 95%
[Thread 6] 95%
[Thread 1] 95%
terminate called after throwing an instance of 'std::system_error'
  what():  unable to close due to unfinalized statements or unfinished backups: database is locked
Aborted (core dumped)

real    5m35.957s
user    0m6.111s
sys     2m20.467s
```
Coredump backtrace:

```
(gdb) bt full
#0  0x00007f97007b97ff in raise () from /lib64/libc.so.6
No symbol table info available.
#1  0x00007f97007a3c35 in abort () from /lib64/libc.so.6
No symbol table info available.
#2  0x00007f970116f09b in __gnu_cxx::__verbose_terminate_handler() [clone .cold.1] () from /lib64/libstdc++.so.6
No symbol table info available.
#3  0x00007f970117553c in __cxxabiv1::__terminate(void (*)()) () from /lib64/libstdc++.so.6
No symbol table info available.
#4  0x00007f9701174559 in __cxa_call_terminate () from /lib64/libstdc++.so.6
No symbol table info available.
#5  0x00007f9701174ed8 in __gxx_personality_v0 () from /lib64/libstdc++.so.6
No symbol table info available.
#6  0x00007f9700b55b13 in _Unwind_RaiseException_Phase2 () from /lib64/libgcc_s.so.1
No symbol table info available.
#7  0x00007f9700b56081 in _Unwind_RaiseException () from /lib64/libgcc_s.so.1
No symbol table info available.
#8  0x00007f97011757eb in __cxa_throw () from /lib64/libstdc++.so.6
No symbol table info available.
#9  0x00000000004037d8 in sqlite_orm::internal::connection_holder::release (this=0x969f50)
    at /var/tmp/sqlite_load_crash/.conan/data/sqlite_orm/1.6/_/_/package/5ab84d6acfe1f23c4fae0ab88f26e3a396351ac9/include/sqlite_orm/sqlite_orm.h:7109
        rc = <optimized out>
        rc = <optimized out>
        rc = <optimized out>
#10 sqlite_orm::internal::connection_ref::~connection_ref (this=0x7f96c5ffacf8, __in_chrg=<optimized out>)
    at /var/tmp/sqlite_load_crash/.conan/data/sqlite_orm/1.6/_/_/package/5ab84d6acfe1f23c4fae0ab88f26e3a396351ac9/include/sqlite_orm/sqlite_orm.h:7109
No locals.
#11 sqlite_orm::internal::prepared_statement_base::~prepared_statement_base (this=0x7f96c5ffacf0, __in_chrg=<optimized out>)
    at /var/tmp/sqlite_load_crash/.conan/data/sqlite_orm/1.6/_/_/package/5ab84d6acfe1f23c4fae0ab88f26e3a396351ac9/include/sqlite_orm/sqlite_orm.h:7132
No locals.
#12 sqlite_orm::internal::prepared_statement_t<...>::~prepared_statement_t (this=0x7f96c5ffacf0, __in_chrg=<optimized out>)
    at /var/tmp/sqlite_load_crash/.conan/data/sqlite_orm/1.6/_/_/package/5ab84d6acfe1f23c4fae0ab88f26e3a396351ac9/include/sqlite_orm/sqlite_orm.h:7182
No locals.
#13 sqlite_orm::internal::storage_t<...>::get<TestModel>(int) (this=<optimized out>)
    at /var/tmp/sqlite_load_crash/.conan/data/sqlite_orm/1.6/_/_/package/5ab84d6acfe1f23c4fae0ab88f26e3a396351ac9/include/sqlite_orm/sqlite_orm.h:11534
        statement = {<sqlite_orm::internal::prepared_statement_base> = {stmt = 0x0, con = {holder = @0x969f50}}, t = {ids = std::tuple containing = {[1] = 1}}}
        statement = <optimized out>
#14 <lambda()>::operator() (__closure=0x9917e8) at .../main.cpp:50
        model = {id = 1, counter = 93673}
        j = 9984
        threadNo = 19
        i = 19
        loopCount = @0x7fff8cb9f758: 10000
        storage = @0x7fff8cb9fc30: {isOpenedForever = true, inMemory = false, connection = std::unique_ptr = {get() = 0x969f50}, impl = {table = {name = "TEST", columns = {"ID" (primary key), "COUNTER"}}}, ...}
        testModel = @0x7fff8cb9f788: {id = 1, counter = 0}
        reportFrequency = @0x7fff8cb9f75c: 500
        opsPerSecPerThread = @0x7fff8cb9f760: 500
        i = <optimized out>
        loopCount = <optimized out>
        storage = <optimized out>
        testModel = <optimized out>
        reportFrequency = <optimized out>
        opsPerSecPerThread = <optimized out>
        threadNo = <optimized out>
        j = <optimized out>
        model = <optimized out>
#15 std::__invoke_impl<...> (__f=...) at /usr/include/c++/8/bits/invoke.h:60
No locals.
#16 std::__invoke<...> (__fn=...) at /usr/include/c++/8/bits/invoke.h:95
No locals.
#17 std::thread::_Invoker<...>::_M_invoke<0> (this=0x9917e8) at /usr/include/c++/8/thread:244
No locals.
#18 std::thread::_Invoker<...>::operator() (this=0x9917e8) at /usr/include/c++/8/thread:253
No locals.
#19 std::thread::_State_impl<...>::_M_run(void) (this=0x9917e0) at /usr/include/c++/8/thread:196
No locals.
#20 0x00007f97011a1ba3 in execute_native_thread_routine () from /lib64/libstdc++.so.6
No symbol table info available.
#21 0x00007f970168014a in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#22 0x00007f970087ef23 in clone () from /lib64/libc.so.6
No symbol table info available.
```
fnc12 commented 3 years ago

The reason you get this is the line storage.update(model);, which performs a write. SQLite allows only one writer at a time, but you run all threads in parallel. You can get the same error if you use raw libsqlite3 instead of sqlite_orm with the same code. How to fix it? Either make only one thread perform the writes or increase the busy timeout:

storage.pragma.busy_timeout(...);
IvanRibakov commented 3 years ago

You can get the same error if you use raw libsqlite3 instead of sqlite_orm with the same code.

I'm not concerned about the error as such, but rather about the fact that I'm unable to handle it (I would expect to be able to try-catch my way out of it) because of the way sqlite_orm is structured. Is that really the best that can be done in this case?

fnc12 commented 3 years ago

Are you sure that you cannot catch it? Try enclosing storage.update(model); in a try-catch block.

IvanRibakov commented 3 years ago

Are you sure that you cannot catch it?

Yes

Try enclosing storage.update(model); in a try-catch block.

I have tried, but as you can see from the coredump, after being raised in connection_holder::release the exception does not propagate to the application code; it is handled by the runtime library, which calls terminate, and the application can do nothing about that. As I said in the initial issue description, this is the result of an exception being thrown from a destructor (and being allowed to escape unhandled). This behaviour is described here as "all kinds of bad things" and explained in more detail here.
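
For illustration, a minimal standalone example (a hypothetical type, not sqlite_orm code) of why a surrounding try-catch cannot help once a destructor itself throws:

```c++
#include <iostream>
#include <stdexcept>

struct Connection {
    // Destructors are implicitly noexcept in C++11 and later, so an exception
    // thrown here calls std::terminate() instead of propagating to the caller.
    ~Connection() {
        throw std::runtime_error("unable to close");
    }
};

int main() {
    try {
        Connection c;
    }  // c's destructor runs here and the program aborts
    catch (const std::exception &e) {
        std::cout << "never reached: " << e.what() << "\n";
    }
    return 0;
}
```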

fnc12 commented 3 years ago

OK, I see. Looks like a bad thing. Anyway, what would you do if you could handle this exception?

IvanRibakov commented 3 years ago

Anyway, what would you do if you could handle this exception?

Log it, to begin with. We're using sqlite_orm from a REST API, so returning a 5XX error code (maybe 503) to the caller would be the next step. As far as I understood from your earlier comment, this is caused by concurrent writes but is not a catastrophic failure that makes the DB unusable, so I see no reason not to continue with application execution.

fnc12 commented 3 years ago

You can use the storage.busy_handler(...); function to log it.

fnc12 commented 3 years ago

or try to increase busy_timeout anyway

fnc12 commented 3 years ago

Anyway, I'll try to reproduce it. It is strange that the exception is thrown at all: it can be thrown from the dtor only when the last connection is closed, but you asked the storage to keep the database open forever, so that should not happen.

IvanRibakov commented 3 years ago

You can use the storage.busy_handler(...); function to log it.

Is there an example somewhere of customising the busy_handler? I'm mainly interested in the handler interface and what means it offers for error detection/handling.

P.S. Am I right that using a custom busy_handler will disable the built-in timeout logic (configured via storage.busy_timeout), so if I want to keep both (the current timeout behaviour and preventing a crash) I will need to reimplement the "busy timeout" logic in the busy_handler?

fnc12 commented 3 years ago

There are no examples in the repo, but this API forwards the call to sqlite3_busy_handler. More info here: https://sqlite.org/c3ref/busy_handler.html
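
For illustration, a minimal sketch of the sqlite3_busy_handler contract against the raw C API (how exactly storage.busy_handler forwards the callback is an assumption here; check the wrapper's signature in the header):

```c++
#include <sqlite3.h>
#include <iostream>

// sqlite3_busy_handler contract: the callback receives the number of times it
// has already been invoked for the same locked operation; returning non-zero
// asks SQLite to retry, returning 0 gives up and the call fails with SQLITE_BUSY.
static int busyCallback(void * /*userData*/, int timesInvoked) {
    std::cerr << "database is busy, attempt " << timesInvoked << "\n";
    return timesInvoked < 100;  // retry up to 100 times, then give up
}

int main() {
    sqlite3 *db = nullptr;
    if (sqlite3_open("test.sqlite", &db) == SQLITE_OK) {
        sqlite3_busy_handler(db, busyCallback, nullptr);
        // ... execute statements; lock contention is now reported through busyCallback ...
    }
    sqlite3_close(db);
    return 0;
}
```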

IvanRibakov commented 3 years ago

Going down the custom busy_handler route raises some nasty questions which make this approach look very unappealing:

- how to correctly re-implement busy_timeout logic
- how to safely throw and catch exceptions in C++ when a layer of C code is in between exception generation and exception handling

Were you able to reproduce the crash locally? Wondering if you have any thoughts on the "exception inside destructor" situation aside from "Looks like a bad thing".

fnc12 commented 3 years ago

how to correctly re-implement busy_timeout logic

Just set busy_timeout to a value greater than the time required to perform storage.update in your case.

how to safely throw and catch exceptions in C++ when a layer of C code is in between exception generation and exception handling

It works very simply: the C layer never throws any exception, it returns an error code. sqlite_orm translates every non-zero error code into an exception, which is the correct way to surface C errors in C++.
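
To make that concrete, a generic sketch of this translation pattern (illustrative only, not sqlite_orm's actual internals):

```c++
#include <sqlite3.h>
#include <stdexcept>
#include <string>

// The C layer reports failures through return codes; the C++ wrapper turns any
// unexpected code into an exception for the caller to catch.
void step_or_throw(sqlite3 *db, sqlite3_stmt *stmt) {
    int rc = sqlite3_step(stmt);  // never throws, only returns a code
    if (rc != SQLITE_ROW && rc != SQLITE_DONE) {
        throw std::runtime_error(std::string("sqlite3 error: ") + sqlite3_errmsg(db));
    }
}
```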

Were you able to reproduce the crash locally?

I have not started yet. I'll do it soon.

Wondering if you have any thoughts on the "exception inside destructor" situation aside from "Looks like a bad thing".

Looks like a thing to think about

IvanRibakov commented 3 years ago

how to correctly re-implement busy_timeout logic

Just set busy_timeout to a value greater than the time required to perform storage.update in your case.

As far as I understood from the SQLite documentation:

Setting a new busy handler clears any previously set handler. Note that calling sqlite3_busy_timeout() or evaluating PRAGMA busy_timeout=N will change the busy handler and thus clear any previously set busy handler.

There can only be one busy_handler per connection. busy_timeout is also a busy_handler, so it cannot be used at the same time as a custom busy_handler. Hence my initial comment about the need to re-implement the sleep/retry logic that the default busy_timeout handler currently provides (a sketch of that follows below).
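
A minimal sketch of what such a re-implementation could look like against the raw sqlite3_busy_handler API (the sleep interval and total timeout here are assumptions):

```c++
#include <sqlite3.h>
#include <chrono>
#include <thread>

// Emulates the default busy_timeout behaviour inside a custom handler: sleep a
// little between retries and give up once the configured total wait has elapsed,
// leaving room for extra logic such as logging before giving up.
static int busyWithTimeout(void *userData, int timesInvoked) {
    const int totalTimeoutMs = *static_cast<const int *>(userData);  // e.g. 60000
    const int sleepMs = 10;
    if (timesInvoked * sleepMs >= totalTimeoutMs) {
        return 0;  // stop retrying: the pending call fails with SQLITE_BUSY
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(sleepMs));
    return 1;  // ask SQLite to retry the locked operation
}

// Registration against the raw API:
//   static int timeoutMs = 60000;
//   sqlite3_busy_handler(db, busyWithTimeout, &timeoutMs);
```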


how to safely throw and catch exceptions in C++ when a layer of C code is in between exception generation and exception handling

it works very easy. C layer never throws any exception but returns error code. sqlite_orm translates every case of non zero error code to exception making it correct to translate errors from C to C++.

I don't see how the above explanation helps. Currently the problem comes from the part of sqlite_orm that translates every non-zero error code into an exception. I was assuming that by using a custom busy_handler it would be possible to throw an exception and completely bypass the parts of sqlite_orm that could trigger the crash mentioned in this issue. Now that I'm saying it, I'm no longer sure that is a good idea, since sqlite_orm may do some kind of resource cleanup as part of its error handling which would also be skipped in that case. So if throwing an exception from the busy_handler is not an option, I don't see how it helps work around the crash caused by the destructor exception.

IvanRibakov commented 3 years ago

@fnc12 Any thoughts? What are the odds of this problem being fixed in the sqlite_orm code base?

fnc12 commented 3 years ago

@IvanRibakov do you have a sample code to implement this? I cannot find it

IvanRibakov commented 3 years ago

To implement what? Do you mean the code to reproduce the issue? It's attached to this comment: https://github.com/fnc12/sqlite_orm/issues/718#issuecomment-842211808

fnc12 commented 3 years ago

@IvanRibakov sorry I am blind

fnc12 commented 3 years ago

@IvanRibakov I've run your code. It is very easy to reproduce your crash. It happens because of data races. I don't understand how you expect to handle data races, since data races are UB. If you want your software to behave predictably then you have to eliminate all cases of UB. So you'd better add a mutex around the storage.get and storage.update calls (see the sketch below). I did that and everything stopped crashing. Why am I sure that the cause is not located inside sqlite_orm? Because I moved connection closing outside of the d-tor and still got crashes, but in a different place: inside sqlite3_prepare_v2.
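
For reference, a minimal sketch of that workaround, reusing the names from the repro code above (the mutex is an addition, not part of the original program):

```c++
#include <mutex>

std::mutex storageMutex;  // shared by all worker threads

// Inside each worker thread's loop from the repro code, serialize access to the
// shared storage object so only one thread reads or writes at a time:
{
    std::lock_guard<std::mutex> lock(storageMutex);
    auto model = storage.get<TestModel>(testModel.id);
    model.counter++;
    storage.update(model);
}
```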

fnc12 commented 3 years ago

@IvanRibakov are you there?

IvanRibakov commented 3 years ago

Because I moved connection closing outside of the d-tor and still got crashes, but in a different place: inside sqlite3_prepare_v2.

Do you have a stack trace or any other example of what that other crash looked like?

IvanRibakov commented 3 years ago

@fnc12 Are YOU there?

fnc12 commented 3 years ago

@IvanRibakov sorry, I was a little bit busy. I remember your question. I'll post the code and stack trace soon.

fnc12 commented 3 years ago

@IvanRibakov hey hey hey, I am starting to work on your issue!

fnc12 commented 3 years ago

@IvanRibakov I've fixed the dtor exception throwing by moving the connection closing out of the dtor, but it still keeps firing sqlite3 asserts.

(Screenshot, 2021-08-27 at 22:20:12: the sqlite3 assert stack trace)

This is a stack trace you asked for.

Also, you can run this code if you switch to the dtor-exception branch. Sorry for the delay.

fnc12 commented 3 years ago

@IvanRibakov are you there?

IvanRibakov commented 3 years ago

I don't have anything to add here.

I'm off to review the current persistence layer architecture and look for a way to avoid concurrent writes. Perhaps you know a mutex implementation with the following properties:

?

fnc12 commented 3 years ago

Try the simplest std::mutex or std::recursive_mutex. It is slow but effective.