bstarynk / helpcovid

a C++ free software web application (GPLv3+, Linux) to organize people helping other neighbours in Covid pandemics
GNU General Public License v3.0

SIGTERM is not handled properly #35

Open bstarynk opened 4 years ago

bstarynk commented 4 years ago

In commit f5a5e814be41922173f3a77fab8c6 the SIGTERM signal is not handled as it should be by the signalfd(2)-based code in the C++ file hcv_background.cc.

To exercise the bug after suitable configuration, run

./helpcovid --clear-database -D -T2 --write-pid=/tmp/helpcovid.pid

then (in some other terminal) kill $(cat /tmp/helpcovid.pid)

bstarynk commented 4 years ago

It probably is related to signal(7) and multi-threading (see pthreads(7), nptl(7) and clone(2) ...), a topic known to be delicate.

An explanation or workaround could be inspired by the Qt5 documentation on Unix signals; see also signal-safety(7) and sigevent(7).
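To make the interaction between signalfd(2) and threads concrete, here is a minimal Linux-only sketch. It is not the code of hcv_background.cc; the helper names make_sigterm_fd and wait_for_signal are hypothetical. The point it illustrates is that SIGTERM has to be blocked with pthread_sigmask(3) before any other thread is created, so that every thread inherits the mask and the signal is only ever consumed through the file descriptor.

```cpp
#include <cassert>
#include <csignal>
#include <poll.h>
#include <pthread.h>
#include <sys/signalfd.h>
#include <unistd.h>

// Hypothetical helper: block SIGTERM in the calling thread and return a
// signalfd(2) descriptor for it, or -1 on failure. Threads created later
// inherit the blocked mask, so call this before starting any thread.
int make_sigterm_fd() {
  sigset_t mask;
  sigemptyset(&mask);
  sigaddset(&mask, SIGTERM);
  if (pthread_sigmask(SIG_BLOCK, &mask, nullptr) != 0)
    return -1;
  return signalfd(-1, &mask, SFD_CLOEXEC);
}

// Hypothetical helper: wait up to timeout_ms for a signal on sigfd.
// Returns the signal number, 0 on timeout, -1 on error.
int wait_for_signal(int sigfd, int timeout_ms) {
  assert(sigfd >= 0);
  struct pollfd pfd = {sigfd, POLLIN, 0};
  int n = poll(&pfd, 1, timeout_ms);
  if (n < 0)
    return -1;
  if (n == 0)
    return 0;
  struct signalfd_siginfo info;
  ssize_t got = read(sigfd, &info, sizeof info);
  if (got != static_cast<ssize_t>(sizeof info))
    return -1;
  return static_cast<int>(info.ssi_signo);
}
```

A background thread would typically call wait_for_signal in its poll loop. If a thread was created before the mask was set, the kernel may deliver SIGTERM to that thread with the default action instead of queuing it for the descriptor.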

bstarynk commented 4 years ago

Asked on Stack Overflow: https://stackoverflow.com/q/61370635/841108. Still buggy in commit b616defc5e54ba86980f4d9031.

bstarynk commented 4 years ago

Still buggy in commit 2ae9549245a5ac67055, which uses the approach from https://stackoverflow.com/a/61374592/841108

bstarynk commented 4 years ago

Perhaps we need to use pipes.

See https://doc.qt.io/qt-5/unix-signals.html for an explanation or inspiration, along with pipe(7) and signal-safety(7).
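The pipe-based approach mentioned here is often called the self-pipe trick. The following is a minimal sketch of it, not code from helpcovid; the names selfpipe_fds, selfpipe_handler, selfpipe_install and selfpipe_wait are hypothetical. The signal handler only calls write(2), which signal-safety(7) lists as async-signal-safe, and the real work happens in whichever thread polls the read end of the pipe.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

// Hypothetical global: [0] is the read end, [1] is the write end.
static int selfpipe_fds[2] = {-1, -1};

// Signal handler: writes one byte holding the signal number.
// Only async-signal-safe calls are made, and errno is preserved.
extern "C" void selfpipe_handler(int signo) {
  int saved_errno = errno;
  unsigned char byte = static_cast<unsigned char>(signo);
  ssize_t r = write(selfpipe_fds[1], &byte, 1);
  (void)r;  // a full pipe just drops the notification
  errno = saved_errno;
}

// Create the non-blocking pipe and install the handler for signo.
// Returns 0 on success, -1 on failure.
int selfpipe_install(int signo) {
  if (pipe2(selfpipe_fds, O_NONBLOCK | O_CLOEXEC) != 0)
    return -1;
  struct sigaction sa {};
  sa.sa_handler = selfpipe_handler;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_RESTART;
  return sigaction(signo, &sa, nullptr);
}

// Poll the read end for up to timeout_ms.
// Returns the signal number, 0 on timeout, -1 on error.
int selfpipe_wait(int timeout_ms) {
  assert(selfpipe_fds[0] >= 0);
  struct pollfd pfd = {selfpipe_fds[0], POLLIN, 0};
  int n;
  do {
    n = poll(&pfd, 1, timeout_ms);
  } while (n < 0 && errno == EINTR);  // retry if the handler interrupted poll
  if (n <= 0)
    return n;
  unsigned char byte = 0;
  return read(selfpipe_fds[0], &byte, 1) == 1 ? static_cast<int>(byte) : -1;
}
```

Compared with signalfd(2), this variant does not depend on every thread having the signal blocked, since the handler may run in any thread and still only writes to the pipe.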

bstarynk commented 4 years ago

Partly fixed in commit 400aa4a5cb3e4b557fca3d63c7fc1 thanks to https://stackoverflow.com/a/61374592/841108

bstarynk commented 4 years ago

In commit cb8aabd2dde705819acf86d7e we still have this reproducible issue. It seems related to httplib.

/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_database.cc:560▪ 02.79 s‣  hcv_initialize_database: fullreqbuf=DO $$BEGIN RAISE LOG 'starting HelpCovid git cb8aabd2dde705819ac+ (built Sat 25 Apr 2020 12:11:47 PM MEST, md5 76d2c78eea7c95996904...) cleared on rimski pid 3018484'; END;$$;
/home/basile/helpcovid/helpcovid[3018484]: hcv_database.cc:568 -  !! hcv_initialize_database got PostGreSQL version PostgreSQL 12.2 (Debian 12.2-4) on x86_64-pc-linux-gnu, compiled by gcc (Debian 9.3.0-8) 9.3.0, 64-bit(server version 12.2 (Debian 12.2-4))
/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_database.cc:353▪ 02.85 s‣  sql_register_helpcovid_instance starting
/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_database.cc:416▪ 02.87 s‣  sql_register_helpcovid_instance before insertion nowt=1587809561, timelong=1587809507, myuid=12752, myefuid=12752, mygid=4200, myefgid=4200, curexepath=/home/basile/helpcovid/helpcovid, cwdpath=/home/basile/helpcovid
/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_database.cc:464▪ 02.87 s‣  sql_register_helpcovid_instance osqlstr::

INSERT INTO tb_helpcovidinstance
    (hcvinst_host, hcvinst_pid, hcvinst_execelf, 
            hcvinst_startime, hcvinst_buildtime,
            hcvinst_cwd, hcvinst_topdir,
        hcvinst_gitid, hcvinst_lastgitcommit,
        hcvinst_md5sum,
            hcvinst_linuxuid, hcvinst_linuxeuid,
            hcvinst_linuxuser, hcvinst_linuxeffuser,
            hcvinst_linuxgid, hcvinst_linuxegid,
        hcvinst_compiler_version)

 VALUES ('rimski', 3018484, '/home/basile/helpcovid/helpcovid',
 ---  @hcv_database.cc:444
 to_timestamp(1587809561),  to_timestamp(1587809507), '/home/basile/helpcovid', '/home/basile/helpcovid', -- @! hcv_database.cc:448
 'cb8aabd2dde705819acf86d7e36bd34bac59e76a+',
 'cb8aabd2dde7 testing nbs in hcv_initialize_templates',
 '76d2c78eea7c95996904290e893a15cb',
 ---  @hcv_database.cc:452
12752, 12752, '', '',
 ---  @hcv_database.cc:457
4200, 4200, 'g++ (Debian 9.3.0-10) 9.3.0' )

!!!end osqlstr
/home/basile/helpcovid/helpcovid[3018484]: hcv_database.cc:471 -  !! sql_register_helpcovid_instance completed, serial#1
/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_database.cc:588▪ 02.87 s‣  hcv_initialize_database before preparing statements in dbname=helpcovid_db user=helpcovid_usr password=passwd1234 hostaddr=127.0.0.1 port=5432
/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_database.cc:618▪ 02.87 s‣  preparing find_user_by_email_pstm
/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_database.cc:645▪ 02.87 s‣  Registering prepared SQL statement user_create_pstm
/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_database.cc:645▪ 02.87 s‣  Registering prepared SQL statement user_get_password_by_email_pstm
/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_plugins.cc:215▪ 02.87 s‣  hcv_initialize_plugins_for_database starting with 0 plugins
/home/basile/helpcovid/helpcovid[3018484]: hcv_database.cc:591 -  !! PostGreSQL database dbname=helpcovid_db user=helpcovid_usr password=passwd1234 hostaddr=127.0.0.1 port=5432 successfully initialized
/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_main.cc:1017▪ 02.87 s‣  start of hcv_initialize_curlpp
/home/basile/helpcovid/helpcovid[3018484]: hcv_main.cc:1020 -  !! initialized curlpp version libcurl/7.68.0 OpenSSL/1.1.1g zlib/1.2.11 brotli/1.0.7 libidn2/2.3.0 libpsl/0.21.0 (+libidn2/2.3.0) libssh2/1.8.0 nghttp2/1.40.0 librtmp/2.3 - see www.curlpp.org for more.

-: No such file or directory
/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_background.cc:237▪ 02.87 s‣  hcv_start_background_thread sigprocmask done
/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_background.cc:241▪ 02.87 s‣  hcv_start_background_thread hcv_bg_signal_fd=6
/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_background.cc:247▪ 02.87 s‣  hcv_start_background_thread hcv_bg_timer_fd=7
[New Thread 0x7ffff66b9700 (LWP 3019275)]
/home/basile/helpcovid/helpcovid[3018484]: hcv_background.cc:65 -  !! hcv_background_thread_body starting thread hcovibg3018484
/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_background.cc:83▪ 02.87 s‣  hcv_background_thread_body signal mask set:  #1=Hangup; #13=Broken pipe; #15=Terminated; #24=CPU time limit exceeded;
/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_background.cc:96▪ 02.87 s‣  hcv_background_thread_body before poll

Thread 1 "helpcovid" hit Breakpoint 3, std::thread::~thread (this=0x7fffffffdf20, __in_chrg=<optimized out>) at /usr/include/c++/9/thread:138
138       if (joinable())
(gdb) c
Continuing.
/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_background.cc:253▪ 05.97 s‣  hcv_start_background_thread did start hcv_bgthread of id 140737327634176
/home/basile/helpcovid/helpcovid[3018484]: hcv_web.cc:603 -  !! Starting HelpCovid web server hcv_webserver@0x5555557589f0 with hcv_weburl=http://localhost:8089/ and 2 threads and 16777216 maximal payload
/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_web.cc:616▪ 05.97 s‣  helpcovid HTTP webhost='localhost' webport=8089
/home/basile/helpcovid/helpcovid[3018484]: hcv_web.cc:617 -  !! weburl=http://localhost:8089/ listening on webhost=localhost webport=8089
/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_web.cc:649▪ 05.97 s‣  hcv_webserver_run with webport=8089 weburl= 'http://localhost:8089/'..
/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_plugins.cc:189▪ 05.97 s‣  hcv_initialize_plugins_for_web starting with 0 plugins
[New Thread 0x7ffff5eb8700 (LWP 3020004)]
[New Thread 0x7ffff56b7700 (LWP 3020005)]

Thread 1 "helpcovid" hit Breakpoint 3, std::thread::~thread (this=0x55555576d410, __in_chrg=<optimized out>) at /usr/include/c++/9/thread:138
138       if (joinable())
(gdb) c
Continuing.
/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_background.cc:120▪ 12.72 s‣  hcv_background_thread_body: after poll nbfd:1
/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_background.cc:140▪ 12.72 s‣  hcv_background_thread_body pollable hcv_bg_signal_fd=6
/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_background.cc:152▪ 12.72 s‣  hcv_background_thread_body: got signalinfo #15 from hcv_bg_signal_fd=6
/home/basile/helpcovid/helpcovid[3018484]: hcv_background.cc:156 -  !! hcv_background_thread_body got SIGTERM at 12.7174 elapsed seconds
/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_background.cc:277▪ 12.72 s‣  start of hcv_process_SIGTERM_signal
/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_web.cc:129▪ 12.72 s‣  start of hcv_stop_web
/home/basile/helpcovid/helpcovid[3018484]: hcv_web.cc:134 -  !! hcv_stop_web stopped the web service
/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_database.cc:721▪ 12.72 s‣  hcv_close_database start
/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_database.cc:725▪ 12.72 s‣  hcv_close_database dbnamestr=helpcovid_db
/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_database.cc:480▪ 12.72 s‣  sql_unregister_helpcovid_instance starting serial#1
/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_database.cc:485▪ 12.72 s‣  sql_unregister_helpcovid_instance sqlstr:
DELETE FROM tb_helpcovidinstance WHERE hcvinst_id = 1

/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_database.cc:488▪ 12.72 s‣  sql_unregister_helpcovid_instance deleted serial#1
/home/basile/helpcovid/helpcovid[3018484]: ΔBG!hcv_database.cc:731▪ 12.72 s‣  hcv_close_database before resetting database connection
/home/basile/helpcovid/helpcovid[3018484]: hcv_database.cc:733 -  !! closed database helpcovid_db
/home/basile/helpcovid/helpcovid[3018484]: hcv_background.cc:280 -  !! HelpCovid terminating on rimski process 3018484 built Sat 25 Apr 2020 12:11:47 PM MEST
... md5sum 76d2c78eea7c95996904290e893a15cb lastgitcommit cb8aabd2dde7 testing nbs in hcv_initialize_templates
/home/basile/helpcovid/helpcovid[3018484]: hcv_background.cc:211 -  !! hcv_background_thread_body ending thread hcovibg3018484
[Thread 0x7ffff66b9700 (LWP 3019275) exited]
[Thread 0x7ffff56b7700 (LWP 3020005) exited]
[Thread 0x7ffff5eb8700 (LWP 3020004) exited]

Thread 1 "helpcovid" hit Breakpoint 3, std::thread::~thread (this=0x555555785750, __in_chrg=<optimized out>) at /usr/include/c++/9/thread:138
138       if (joinable())
(gdb) bt
#0  std::thread::~thread() (this=0x555555785750, __in_chrg=<optimized out>) at /usr/include/c++/9/thread:138
#1  0x000055555568fd49 in std::_Destroy<std::thread>(std::thread*) (__pointer=0x555555785750) at /usr/include/c++/9/bits/stl_construct.h:98
#2  0x000055555568f2b4 in std::_Destroy_aux<false>::__destroy<std::thread*>(std::thread*, std::thread*)
    (__first=0x555555785750, __last=0x555555785760) at /usr/include/c++/9/bits/stl_construct.h:108
#3  0x000055555568e624 in std::_Destroy<std::thread*>(std::thread*, std::thread*) (__first=0x555555785750, __last=0x555555785760)
    at /usr/include/c++/9/bits/stl_construct.h:137
#4  0x000055555568d165 in std::_Destroy<std::thread*, std::thread>(std::thread*, std::thread*, std::allocator<std::thread>&)
    (__first=0x555555785750, __last=0x555555785760) at /usr/include/c++/9/bits/stl_construct.h:206
#5  0x000055555568c0b1 in std::vector<std::thread, std::allocator<std::thread> >::~vector() (this=0x55555576eb18, __in_chrg=<optimized out>)
    at /usr/include/c++/9/bits/stl_vector.h:677
#6  0x0000555555690ecc in httplib::ThreadPool::~ThreadPool() (this=0x55555576eb10, __in_chrg=<optimized out>) at httplib.h:388
#7  0x0000555555690ef4 in httplib::ThreadPool::~ThreadPool() (this=0x55555576eb10, __in_chrg=<optimized out>) at httplib.h:388
#8  0x000055555568de26 in std::default_delete<httplib::TaskQueue>::operator()(httplib::TaskQueue*) const
    (this=0x7fffffffd8e8, __ptr=0x55555576eb10) at /usr/include/c++/9/bits/unique_ptr.h:81
#9  0x000055555568c9c8 in std::unique_ptr<httplib::TaskQueue, std::default_delete<httplib::TaskQueue> >::~unique_ptr()
    (this=0x7fffffffd8e8, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/unique_ptr.h:292
#10 0x000055555568b753 in httplib::Server::listen_internal() (this=0x5555557589f0) at httplib.h:3481
#11 0x000055555568b290 in httplib::Server::listen(char const*, int, int)
    (this=0x5555557589f0, host=0x7fffffffdb60 "localhost", port=8089, socket_flags=0) at httplib.h:3126
#12 0x00005555556873ad in hcv_webserver_run() () at hcv_web.cc:847
#13 0x00005555556643ed in main(int, char**) (argc=5, argv=0x7fffffffe438) at hcv_main.cc:1212
(gdb) c
Continuing.

Thread 1 "helpcovid" hit Breakpoint 3, std::thread::~thread (this=0x555555785758, __in_chrg=<optimized out>) at /usr/include/c++/9/thread:138
138       if (joinable())
(gdb) bt
#0  std::thread::~thread() (this=0x555555785758, __in_chrg=<optimized out>) at /usr/include/c++/9/thread:138
#1  0x000055555568fd49 in std::_Destroy<std::thread>(std::thread*) (__pointer=0x555555785758) at /usr/include/c++/9/bits/stl_construct.h:98
#2  0x000055555568f2b4 in std::_Destroy_aux<false>::__destroy<std::thread*>(std::thread*, std::thread*)
    (__first=0x555555785758, __last=0x555555785760) at /usr/include/c++/9/bits/stl_construct.h:108
#3  0x000055555568e624 in std::_Destroy<std::thread*>(std::thread*, std::thread*) (__first=0x555555785750, __last=0x555555785760)
    at /usr/include/c++/9/bits/stl_construct.h:137
#4  0x000055555568d165 in std::_Destroy<std::thread*, std::thread>(std::thread*, std::thread*, std::allocator<std::thread>&)
    (__first=0x555555785750, __last=0x555555785760) at /usr/include/c++/9/bits/stl_construct.h:206
#5  0x000055555568c0b1 in std::vector<std::thread, std::allocator<std::thread> >::~vector() (this=0x55555576eb18, __in_chrg=<optimized out>)
    at /usr/include/c++/9/bits/stl_vector.h:677
#6  0x0000555555690ecc in httplib::ThreadPool::~ThreadPool() (this=0x55555576eb10, __in_chrg=<optimized out>) at httplib.h:388
#7  0x0000555555690ef4 in httplib::ThreadPool::~ThreadPool() (this=0x55555576eb10, __in_chrg=<optimized out>) at httplib.h:388
#8  0x000055555568de26 in std::default_delete<httplib::TaskQueue>::operator()(httplib::TaskQueue*) const
    (this=0x7fffffffd8e8, __ptr=0x55555576eb10) at /usr/include/c++/9/bits/unique_ptr.h:81
#9  0x000055555568c9c8 in std::unique_ptr<httplib::TaskQueue, std::default_delete<httplib::TaskQueue> >::~unique_ptr()
    (this=0x7fffffffd8e8, __in_chrg=<optimized out>) at /usr/include/c++/9/bits/unique_ptr.h:292
#10 0x000055555568b753 in httplib::Server::listen_internal() (this=0x5555557589f0) at httplib.h:3481
#11 0x000055555568b290 in httplib::Server::listen(char const*, int, int)
    (this=0x5555557589f0, host=0x7fffffffdb60 "localhost", port=8089, socket_flags=0) at httplib.h:3126
#12 0x00005555556873ad in hcv_webserver_run() () at hcv_web.cc:847
#13 0x00005555556643ed in main(int, char**) (argc=5, argv=0x7fffffffe438) at hcv_main.cc:1212
(gdb) c
Continuing.
/home/basile/helpcovid/helpcovid[3018484]: hcv_web.cc:848 -  !! end hcv_webserver_run webhost=localhost webport=8089
-: Invalid argument
/home/basile/helpcovid/helpcovid[3018484]: hcv_main.cc:1216 -  !! normal end of /home/basile/helpcovid/helpcovid

Thread 1 "helpcovid" hit Breakpoint 3, std::thread::~thread (this=0x5555556fd1a0 <hcv_bgthread>, __in_chrg=<optimized out>)
    at /usr/include/c++/9/thread:138
138       if (joinable())
(gdb) bt
#0  std::thread::~thread() (this=0x5555556fd1a0 <hcv_bgthread>, __in_chrg=<optimized out>) at /usr/include/c++/9/thread:138
#1  0x00007ffff73fce27 in __run_exit_handlers
    (status=0, listp=0x7ffff757b718 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
#2  0x00007ffff73fcfda in __GI_exit (status=<optimized out>) at exit.c:139
#3  0x00007ffff73e5e12 in __libc_start_main (main=
    0x5555556632a7 <main(int, char**)>, argc=5, argv=0x7fffffffe438, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe428) at ../csu/libc-start.c:342
#4  0x000055555560c73a in _start ()
(gdb) c
Continuing.
terminate called without an active exception

Thread 1 "helpcovid" hit Breakpoint 1, __GI_abort () at abort.c:49
49  abort.c: No such file or directory.
(gdb) bt
#0  __GI_abort () at abort.c:49
#1  0x00007ffff779c80c in  () at /lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00007ffff77a78f6 in  () at /lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff77a7961 in  () at /lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x000055555568a359 in std::thread::~thread() (this=0x5555556fd1a0 <hcv_bgthread>, __in_chrg=<optimized out>) at /usr/include/c++/9/thread:139
#5  0x00007ffff73fce27 in __run_exit_handlers
    (status=0, listp=0x7ffff757b718 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
#6  0x00007ffff73fcfda in __GI_exit (status=<optimized out>) at exit.c:139
#7  0x00007ffff73e5e12 in __libc_start_main (main=
    0x5555556632a7 <main(int, char**)>, argc=5, argv=0x7fffffffe438, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe428) at ../csu/libc-start.c:342
#8  0x000055555560c73a in _start ()
(gdb) c
Continuing.

Thread 1 "helpcovid" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50

I am very tempted to give up using httplib.h; for our case, it seems unreliable.

yhirose commented 4 years ago

This problem happens because you don't call join for hcv_bgthread. That's why the log shows the following:

138       if (joinable())
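For context: the destructor of a std::thread that is still joinable calls std::terminate, which matches the "terminate called without an active exception" message in the log above. A thread object stays joinable until join() or detach() is called on it, even when its thread function has already returned. The sketch below shows the usual remedy under simplified assumptions; bg_thread, bg_stop, start_background and stop_background are hypothetical stand-ins, not the actual helpcovid code.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-ins for a global background thread and its stop flag.
static std::thread bg_thread;
static std::atomic<bool> bg_stop{false};

// Start the background loop. Assigning to a joinable std::thread would
// itself call std::terminate, hence the precondition.
void start_background() {
  assert(!bg_thread.joinable());
  bg_stop.store(false);
  bg_thread = std::thread([] {
    while (!bg_stop.load())
      std::this_thread::yield();  // placeholder for the real poll loop
  });
}

// Ask the loop to stop, then join so the global object is no longer
// joinable when its destructor runs during exit().
void stop_background() {
  bg_stop.store(true);
  if (bg_thread.joinable())
    bg_thread.join();
}
```

Note that join() must be called from a different thread than the one being joined, so in a design like the one in the log it would presumably be the main thread that joins after the web server's listen call returns, and before main returns.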