
Data race during shutdown #861

Closed - vaind closed this issue 3 years ago

vaind commented 4 years ago

There's a data race on ctx->stop_flag during shutdown. Didn't see an easy fix so creating an issue instead of a PR.

WARNING: ThreadSanitizer: data race (pid=748655)
  Write of size 4 at 0x7b5c00000028 by main thread:
    #0 mg_stop src/civetweb.c:18842:17 (objectbox-http-server-test+0x26cd7f)
    #1 CivetServer::close() src/CivetServer.cpp:398:3 (objectbox-http-server-test+0x2833e8)
    #2 CivetServer::~CivetServer() src/CivetServer.cpp:329:2 (objectbox-http-server-test+0x2833e8)

  Previous read of size 4 at 0x7b5c00000028 by thread T1:
    #0 master_thread_run src/civetweb.c:18625:14 (objectbox-http-server-test+0x277935)
    #1 master_thread src/civetweb.c:18725:2 (objectbox-http-server-test+0x277935)

  Location is heap block of size 888 at 0x7b5c00000000 allocated by main thread:
    #0 calloc <null> (objectbox-http-server-test+0x7b468)
    #1 mg_calloc src/civetweb.c:1540:9 (objectbox-http-server-test+0x26d151)
    #2 mg_start2 src/civetweb.c:18973:34 (objectbox-http-server-test+0x26d151)
    #3 mg_start src/civetweb.c:19548:9 (objectbox-http-server-test+0x26d151)
    #4 CivetServer::CivetServer(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, CivetCallbacks const*, void const*) src/CivetServer.cpp:321:12 (objectbox-http-server-test+0x2830de)

  Thread T1 'civetweb-master' (tid=748660, running) created by main thread at:
    #0 pthread_create <null> (objectbox-http-server-test+0xb07ce)
    #1 mg_start_thread_with_id src/civetweb.c:6049:11 (objectbox-http-server-test+0x26e920)
    #2 mg_start2 src/civetweb.c:19482:9 (objectbox-http-server-test+0x26e920)
    #3 mg_start src/civetweb.c:19548:9 (objectbox-http-server-test+0x26e920)
    #4 CivetServer::CivetServer(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, CivetCallbacks const*, void const*) src/CivetServer.cpp:321:12 (objectbox-http-server-test+0x2830de)

SUMMARY: ThreadSanitizer: data race src/civetweb.c:18842:17 in mg_stop
bel2125 commented 4 years ago

The line numbers do not seem to match the current head version. Is it the 1.12 release or some other version?

Is the data race just that one thread writes the stop flag and another one reads it, without a lock? For the stop flag, you don't need a lock, since the data size is small enough to be read and written atomically - and even if it were not, it would not matter, because it is written only once, from 0 to 1, and if a thread sees this "1" one loop iteration later, this is not a problem. I think ThreadSanitizer cannot see this higher-level logic, and might complain about something that is not actually a problem when it is used exactly this way.

Or does it complain about some other variable ... I cannot analyze it in detail without knowing the exact symbols and version matching these line numbers.

lday0321 commented 4 years ago

I also met the same issue when using the prometheus Exposer (civetweb 1.12):

WARNING: ThreadSanitizer: data race (pid=21538)
  Write of size 4 at 0x7b5c00000028 by main thread:
    #0 mg_stop /proc/self/cwd/external/civetweb/src/civetweb.c:18842:17 (libexternal_Scivetweb_Sliblibcivetweb.so+0x1288f)
    #1 CivetServer::close() /proc/self/cwd/external/civetweb/src/CivetServer.cpp:398:3 (libexternal_Scivetweb_Slibcivetweb.so+0x5858)
    #2 CivetServer::~CivetServer() /proc/self/cwd/external/civetweb/src/CivetServer.cpp:329 (libexternal_Scivetweb_Slibcivetweb.so+0x5858)
    #3 CivetServer::~CivetServer() /proc/self/cwd/external/civetweb/src/CivetServer.cpp:328 (libexternal_Scivetweb_Slibcivetweb.so+0x5858)
    #4 std::__1::default_delete<CivetServer>::operator()(CivetServer*) const /usr/bin/../include/c++/v1/memory:2339:5 (libexternal_Scom_Ugithub_Ujupp0r_Uprometheus_Ucpp_Spull_Slibpull.so+0x62d4)
    #5 std::__1::unique_ptr<CivetServer, std::__1::default_delete<CivetServer> >::reset(CivetServer*) /usr/bin/../include/c++/v1/memory:2652 (libexternal_Scom_Ugithub_Ujupp0r_Uprometheus_Ucpp_Spull_Slibpull.so+0x62d4)
    #6 std::__1::unique_ptr<CivetServer, std::__1::default_delete<CivetServer> >::~unique_ptr() /usr/bin/../include/c++/v1/memory:2606 (libexternal_Scom_Ugithub_Ujupp0r_Uprometheus_Ucpp_Spull_Slibpull.so+0x62d4)
    #7 prometheus::Exposer::~Exposer() /proc/self/cwd/external/com_github_jupp0r_prometheus_cpp/pull/src/exposer.cc:27 (libexternal_Scom_Ugithub_Ujupp0r_Uprometheus_Ucpp_Spull_Slibpull.so+0x62d4)
    #8 cxa_at_exit_wrapper(void*) <null> (stats_test+0x45c4a3)

  Previous read of size 4 at 0x7b5c00000028 by thread T1:
    #0 master_thread_run /proc/self/cwd/external/civetweb/src/civetweb.c:18625:14 (libexternal_Scivetweb_Sliblibcivetweb.so+0x17c75)
    #1 master_thread /proc/self/cwd/external/civetweb/src/civetweb.c:18725 (libexternal_Scivetweb_Sliblibcivetweb.so+0x17c75)
lday0321 commented 4 years ago

When I change to the master branch (a845979a8e5e95ca6eb4a48def43fce16d46d3c2), I still see the problem:

WARNING: ThreadSanitizer: data race (pid=22785)
  Write of size 4 at 0x7b5c00000028 by main thread:
    #0 mg_stop /proc/self/cwd/external/civetweb/src/civetweb.c:19317:17 (libexternal_Scivetweb_Sliblibcivetweb.so+0x12d4f)
    #1 CivetServer::close() /proc/self/cwd/external/civetweb/src/CivetServer.cpp:398:3 (libexternal_Scivetweb_Slibcivetweb.so+0x5858)
    #2 CivetServer::~CivetServer() /proc/self/cwd/external/civetweb/src/CivetServer.cpp:329 (libexternal_Scivetweb_Slibcivetweb.so+0x5858)
    #3 CivetServer::~CivetServer() /proc/self/cwd/external/civetweb/src/CivetServer.cpp:328 (libexternal_Scivetweb_Slibcivetweb.so+0x5858)
    #4 std::__1::default_delete<CivetServer>::operator()(CivetServer*) const /usr/bin/../include/c++/v1/memory:2339:5 (libexternal_Scom_Ugithub_Ujupp0r_Uprometheus_Ucpp_Spull_Slibpull.so+0x62d4)
    #5 std::__1::unique_ptr<CivetServer, std::__1::default_delete<CivetServer> >::reset(CivetServer*) /usr/bin/../include/c++/v1/memory:2652 (libexternal_Scom_Ugithub_Ujupp0r_Uprometheus_Ucpp_Spull_Slibpull.so+0x62d4)
    #6 std::__1::unique_ptr<CivetServer, std::__1::default_delete<CivetServer> >::~unique_ptr() /usr/bin/../include/c++/v1/memory:2606 (libexternal_Scom_Ugithub_Ujupp0r_Uprometheus_Ucpp_Spull_Slibpull.so+0x62d4)
    #7 prometheus::Exposer::~Exposer() /proc/self/cwd/external/com_github_jupp0r_prometheus_cpp/pull/src/exposer.cc:27 (libexternal_Scom_Ugithub_Ujupp0r_Uprometheus_Ucpp_Spull_Slibpull.so+0x62d4)
    #8 cxa_at_exit_wrapper(void*) <null> (stats_test+0x45c4a3)

  Previous read of size 4 at 0x7b5c00000028 by thread T2 (mutexes: write M100):
    #0 consume_socket /proc/self/cwd/external/civetweb/src/civetweb.c:18676:49 (libexternal_Scivetweb_Sliblibcivetweb.so+0x19465)
    #1 worker_thread_run /proc/self/cwd/external/civetweb/src/civetweb.c:18822 (libexternal_Scivetweb_Sliblibcivetweb.so+0x19465)
    #2 worker_thread /proc/self/cwd/external/civetweb/src/civetweb.c:18961:2 (libexternal_Scivetweb_Sliblibcivetweb.so+0x18060)

  As if synchronized via sleep:
    #0 nanosleep <null> (stats_test+0x476f5a)
    #1 std::__1::this_thread::sleep_for(std::__1::chrono::duration<long long, std::__1::ratio<1l, 1000000000l> > const&) <null> (libc++.so.1+0x8a8cb)
    #2 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2433:10 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0x7b4ff)
    #3 void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /proc/self/cwd/external/com_google_googletest/googletest/src/gtest.cc:2469 (libexternal_Scom_Ugoogle_Ugoogletest_Slibgtest.so+0x7b4ff)

  Location is heap block of size 872 at 0x7b5c00000000 allocated by main thread:
    #0 calloc <null> (stats_test+0x4600d2)
    #1 mg_calloc /proc/self/cwd/external/civetweb/src/civetweb.c:1540:9 (libexternal_Scivetweb_Sliblibcivetweb.so+0x1307a)
    #2 mg_start2 /proc/self/cwd/external/civetweb/src/civetweb.c:19449 (libexternal_Scivetweb_Sliblibcivetweb.so+0x1307a)
    #3 mg_start /proc/self/cwd/external/civetweb/src/civetweb.c:20025 (libexternal_Scivetweb_Sliblibcivetweb.so+0x1307a)
    #4 CivetServer::CivetServer(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >, CivetCallbacks const*, void const*) /proc/self/cwd/external/civetweb/src/CivetServer.cpp:321:12 (libexternal_Scivetweb_Slibcivetweb.so+0x55bd)
    #5 prometheus::Exposer::Exposer(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned long) /proc/self/cwd/external/com_github_jupp0r_prometheus_cpp/pull/src/exposer.cc:15:19 (libexternal_Scom_Ugithub_Ujupp0r_Uprometheus_Ucpp_Spull_Slibpull.so+0x5a55)

    ...

  Mutex M100 (0x7b5c00000030) created at:
    #0 pthread_mutex_init <null> (stats_test+0x46149a)
    #1 mg_start2 /proc/self/cwd/external/civetweb/src/civetweb.c:19476:13 (libexternal_Scivetweb_Sliblibcivetweb.so+0x1322c)
    #2 mg_start /proc/self/cwd/external/civetweb/src/civetweb.c:20025 (libexternal_Scivetweb_Sliblibcivetweb.so+0x1322c)
    #3 CivetServer::CivetServer(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >, CivetCallbacks const*, void const*) /proc/self/cwd/external/civetweb/src/CivetServer.cpp:321:12 (libexternal_Scivetweb_Slibcivetweb.so+0x55bd)
    #4 prometheus::Exposer::Exposer(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned long) /proc/self/cwd/external/com_github_jupp0r_prometheus_cpp/pull/src/exposer.cc:15:19 (libexternal_Scom_Ugithub_Ujupp0r_Uprometheus_Ucpp_Spull_Slibpull.so+0x5a55)

    ...

  Thread T2 'civetweb-worker' (tid=22788, running) created by main thread at:
    #0 pthread_create <null> (stats_test+0x461076)
    #1 mg_start_thread_with_id /proc/self/cwd/external/civetweb/src/civetweb.c:6148:11 (libexternal_Scivetweb_Sliblibcivetweb.so+0x14caa)
    #2 mg_start2 /proc/self/cwd/external/civetweb/src/civetweb.c:19962 (libexternal_Scivetweb_Sliblibcivetweb.so+0x14caa)
    #3 mg_start /proc/self/cwd/external/civetweb/src/civetweb.c:20025 (libexternal_Scivetweb_Sliblibcivetweb.so+0x14caa)
    #4 CivetServer::CivetServer(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >, CivetCallbacks const*, void const*) /proc/self/cwd/external/civetweb/src/CivetServer.cpp:321:12 (libexternal_Scivetweb_Slibcivetweb.so+0x55bd)
    #5 prometheus::Exposer::Exposer(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, unsigned long) /proc/self/cwd/external/com_github_jupp0r_prometheus_cpp/pull/src/exposer.cc:15:19 (libexternal_Scom_Ugithub_Ujupp0r_Uprometheus_Ucpp_Spull_Slibpull.so+0x5a55)

SUMMARY: ThreadSanitizer: data race /proc/self/cwd/external/civetweb/src/civetweb.c:19317:17 in mg_stop
bel2125 commented 4 years ago

So ThreadSanitizer reports that this read of ctx->stop_flag (https://github.com/civetweb/civetweb/blob/a845979/src/civetweb.c#L18676)

    while ((ctx->sq_head == ctx->sq_tail) && (ctx->stop_flag == 0)) {
        pthread_cond_wait(&ctx->sq_full, &ctx->thread_mutex);
    }

has a data race with this write (https://github.com/civetweb/civetweb/blob/a845979/src/civetweb.c#L19317)

    /* Set stop flag, so all threads know they have to exit. */
    ctx->stop_flag = 1;

I think this is exactly the "stop flag" already mentioned above. It looks similar to the "notification" pattern, but in this specific case I am quite sure it works - definitely for GCC, Clang and MSVC, on x86 and ARM. Could you give me some more information about what exactly you did to get these warnings - can it be reproduced with the code in the CivetWeb repository alone, or is some additional code required? Did you encounter any real problems, or just this warning? You could also try with the ALTERNATIVE_QUEUE define set; this will use a different function.

lday0321 commented 4 years ago

After I define ALTERNATIVE_QUEUE, I still get a data race warning, with different line numbers:

L19317 <--> L19100

WARNING: ThreadSanitizer: data race (pid=3477)
  Write of size 4 at 0x7b5800000028 by main thread:
    #0 mg_stop /proc/self/cwd/external/civetweb/src/civetweb.c:19317:17 (libexternal_Scivetweb_Sliblibcivetweb.so+0x12def)
    #1 CivetServer::close() /proc/self/cwd/external/civetweb/src/CivetServer.cpp:398:3 (libexternal_Scivetweb_Slibcivetweb.so+0x5858)
    #2 CivetServer::~CivetServer() /proc/self/cwd/external/civetweb/src/CivetServer.cpp:329 (libexternal_Scivetweb_Slibcivetweb.so+0x5858)
    #3 CivetServer::~CivetServer() /proc/self/cwd/external/civetweb/src/CivetServer.cpp:328 (libexternal_Scivetweb_Slibcivetweb.so+0x5858)
    #4 std::__1::default_delete<CivetServer>::operator()(CivetServer*) const /usr/bin/../include/c++/v1/memory:2339:5 (libexternal_Scom_Ugithub_Ujupp0r_Uprometheus_Ucpp_Spull_Slibpull.so+0x62d4)
    #5 std::__1::unique_ptr<CivetServer, std::__1::default_delete<CivetServer> >::reset(CivetServer*) /usr/bin/../include/c++/v1/memory:2652 (libexternal_Scom_Ugithub_Ujupp0r_Uprometheus_Ucpp_Spull_Slibpull.so+0x62d4)
    #6 std::__1::unique_ptr<CivetServer, std::__1::default_delete<CivetServer> >::~unique_ptr() /usr/bin/../include/c++/v1/memory:2606 (libexternal_Scom_Ugithub_Ujupp0r_Uprometheus_Ucpp_Spull_Slibpull.so+0x62d4)
    #7 prometheus::Exposer::~Exposer() /proc/self/cwd/external/com_github_jupp0r_prometheus_cpp/pull/src/exposer.cc:27 (libexternal_Scom_Ugithub_Ujupp0r_Uprometheus_Ucpp_Spull_Slibpull.so+0x62d4)
    #8 cxa_at_exit_wrapper(void*) <null> (stats_test+0x45c4a3)

  Previous read of size 4 at 0x7b5800000028 by thread T3:
    #0 master_thread_run /proc/self/cwd/external/civetweb/src/civetweb.c:19100:14 (libexternal_Scivetweb_Sliblibcivetweb.so+0x184e5)
    #1 master_thread /proc/self/cwd/external/civetweb/src/civetweb.c:19200 (libexternal_Scivetweb_Sliblibcivetweb.so+0x184e5)
lday0321 commented 4 years ago

@bel2125 I reproduced the TSan warning with the CivetWeb repository only, using the civetweb embedded_cpp example. Here are my commands and output; I hope they are useful:

clang version 8.0.1

embedded_cpp build command:

#make
make -C ../.. clean lib WITH_CPP=1
make[1]: Entering directory '/v/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2'
rm -rf out
rm -rf libcivetweb.a
rm -rf libcivetweb.so
rm -rf libcivetweb.so.1
rm -rf libcivetweb.so.1.13.0
rm -rf civetweb
rm -f civetweb_test
clang -c -fsanitize=thread -DTHREAD_SANITIZER -DDYNAMIC_ANNOTATIONS_ENABLED=1 -O2 -Wall -Wextra -Wshadow -Wformat-security -Winit-self -Wmissing-prototypes -DLINUX -Iinclude  -DUSE_STACK_SIZE=102400 -O2 -DNDEBUG -fPIC -fPIC src/civetweb.c -o out/src/civetweb.o
clang++ -c -fsanitize=thread -DTHREAD_SANITIZER -DDYNAMIC_ANNOTATIONS_ENABLED=1 -O2 -Wall -Wextra -Wshadow -Wformat-security -Winit-self -Wmissing-prototypes -DLINUX -Iinclude  -DUSE_STACK_SIZE=102400 -O2 -DNDEBUG -fPIC -fPIC -std=c++17 -stdlib=libc++ -D_LIBCPP_ENABLE_THREAD_SAFETY_ANNOTATIONS src/CivetServer.cpp -o out/src/CivetServer.o
ar cq libcivetweb.a out/src/civetweb.o out/src/CivetServer.o
make[1]: Leaving directory '/v/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2'
cp ../../libcivetweb.a .
clang++ -o embedded_cpp -I../../include  -std=c++17 -stdlib=libc++ -D_LIBCPP_ENABLE_THREAD_SAFETY_ANNOTATIONS -fsanitize=thread -lc++ embedded_cpp.cpp libcivetweb.a -lpthread -ldl -lrt

I added the definitions below to the top-level Makefile:

#################################################
CC=clang
CXX=clang++
CFLAGS=-fsanitize=thread -DTHREAD_SANITIZER -DDYNAMIC_ANNOTATIONS_ENABLED=1 -g
CXXFLAGS=-std=c++17 -stdlib=libc++ -D_LIBCPP_ENABLE_THREAD_SAFETY_ANNOTATIONS
LDFLAGS=-fsanitize=thread -lc++ -ltsan
#################################################

I also added the definitions below to examples/embedded_cpp/Makefile:

#################################################
CC=clang
CXX=clang++
CFLAGS=-fsanitize=thread -DTHREAD_SANITIZER -DDYNAMIC_ANNOTATIONS_ENABLED=1 -g
#CXXFLAGS=-std=c++17 -stdlib=libc++ -D_LIBCPP_ENABLE_THREAD_SAFETY_ANNOTATIONS
LDFLAGS=-fsanitize=thread -lc++
#################################################

CFLAGS = -I$(TOP)/include $(COPT) -std=c++17 -stdlib=libc++ -D_LIBCPP_ENABLE_THREAD_SAFETY_ANNOTATIONS

After I launched embedded_cpp and used curl http://localhost:8081/exit to stop it, I got the TSan messages below, same as posted above: L19317 <--> L19100

#./embedded_cpp
Browse files at http://localhost:8081/
Run example at http://localhost:8081/example
Exit at http://localhost:8081/exit
==================
WARNING: ThreadSanitizer: data race (pid=5996)
  Write of size 1 at 0x000000fe5f28 by thread T1:
    #0 ExitHandler::handleGet(CivetServer*, mg_connection*) <null> (embedded_cpp+0x4cfef0)
    #1 CivetServer::requestHandler(mg_connection*, void*) /v/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/src/CivetServer.cpp:141:20 (embedded_cpp+0x4f6220)
    #2 handle_request /v/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/src/civetweb.c:14480:8 (embedded_cpp+0x4f2945)
    #3 process_new_connection /v/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/src/civetweb.c:18509:5 (embedded_cpp+0x4ee8ba)
    #4 worker_thread_run /v/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/src/civetweb.c:18906:4 (embedded_cpp+0x4eb727)
    #5 worker_thread /v/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/src/civetweb.c:18961 (embedded_cpp+0x4eb727)

  Previous read of size 1 at 0x000000fe5f28 by main thread:
    #0 main <null> (embedded_cpp+0x4cf0c2)

  Location is global 'exitNow' of size 1 at 0x000000fe5f28 (embedded_cpp+0x000000fe5f28)

  Thread T1 'civetweb-worker' (tid=5998, running) created by main thread at:
    #0 pthread_create <null> (embedded_cpp+0x45dfe6)
    #1 mg_start_thread_with_id /v/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/src/civetweb.c:6148:11 (embedded_cpp+0x4e2848)
    #2 mg_start2 /v/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/src/civetweb.c:19962 (embedded_cpp+0x4e2848)
    #3 mg_start /v/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/src/civetweb.c:20025 (embedded_cpp+0x4e2848)
    #4 CivetServer::CivetServer(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >, CivetCallbacks const*, void const*) /v/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/src/CivetServer.cpp:321:12 (embedded_cpp+0x4f6e02)
    #5 main <null> (embedded_cpp+0x4cee65)

SUMMARY: ThreadSanitizer: data race (/v/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/examples/embedded_cpp/embedded_cpp+0x4cfef0) in ExitHandler::handleGet(CivetServer*, mg_connection*)
==================
Bye!
==================
WARNING: ThreadSanitizer: data race (pid=5996)
  Write of size 4 at 0x7b5c00000028 by main thread:
    #0 mg_stop /v/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/src/civetweb.c:19317:17 (embedded_cpp+0x4e0edb)
    #1 CivetServer::close() /v/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/src/CivetServer.cpp:398:3 (embedded_cpp+0x4f6fa4)
    #2 CivetServer::~CivetServer() /v/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/src/CivetServer.cpp:329 (embedded_cpp+0x4f6fa4)
    #3 main <null> (embedded_cpp+0x4cf299)

  Previous read of size 4 at 0x7b5c00000028 by thread T51:
    #0 master_thread_run /v/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/src/civetweb.c:19100:14 (embedded_cpp+0x4ecbd6)
    #1 master_thread /v/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/src/civetweb.c:19200 (embedded_cpp+0x4ecbd6)

  As if synchronized via sleep:
    #0 sleep <null> (embedded_cpp+0x4821f4)
    #1 main <null> (embedded_cpp+0x4cf0e2)

  Location is heap block of size 896 at 0x7b5c00000000 allocated by main thread:
    #0 calloc <null> (embedded_cpp+0x45d042)
    #1 mg_calloc /v/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/src/civetweb.c:1540:9 (embedded_cpp+0x4e127d)
    #2 mg_start2 /v/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/src/civetweb.c:19449 (embedded_cpp+0x4e127d)
    #3 mg_start /v/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/src/civetweb.c:20025 (embedded_cpp+0x4e127d)
    #4 CivetServer::CivetServer(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >, CivetCallbacks const*, void const*) /v/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/src/CivetServer.cpp:321:12 (embedded_cpp+0x4f6e02)
    #5 main <null> (embedded_cpp+0x4cee65)

  Thread T51 'civetweb-master' (tid=6048, running) created by main thread at:
    #0 pthread_create <null> (embedded_cpp+0x45dfe6)
    #1 mg_start_thread_with_id /v/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/src/civetweb.c:6148:11 (embedded_cpp+0x4e2f6a)
    #2 mg_start2 /v/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/src/civetweb.c:20008 (embedded_cpp+0x4e2f6a)
    #3 mg_start /v/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/src/civetweb.c:20025 (embedded_cpp+0x4e2f6a)
    #4 CivetServer::CivetServer(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >, CivetCallbacks const*, void const*) /v/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/src/CivetServer.cpp:321:12 (embedded_cpp+0x4f6e02)
    #5 main <null> (embedded_cpp+0x4cee65)

SUMMARY: ThreadSanitizer: data race /v/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/civetweb-a845979a8e5e95ca6eb4a48def43fce16d46d3c2/src/civetweb.c:19317:17 in mg_stop
==================
ThreadSanitizer: reported 2 warnings
Oipo commented 4 years ago

@bel2125 The issue here is that volatile is not a proper thread synchronization mechanism; volatile should only be used when the memory can be changed by external (read: hardware) sources. It does not introduce memory barriers, which are also needed for atomic operations. A more detailed explanation of why volatile is not proper synchronization can be found on StackOverflow. ThreadSanitizer is thus correctly reporting that the stop_flag reads/writes lead to data races.

Celix recently ran into the same issues and solved them by using the __atomic functions built into gcc and clang.
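
For illustration, a minimal sketch of that approach applied to a stop flag, using the gcc/clang builtins (the wrapper names are invented for illustration, not existing CivetWeb API):

    /* Hypothetical wrappers around the gcc/clang atomic builtins. */
    static void stop_flag_store(volatile int *flag, int value)
    {
        /* release: writes made before this store become visible to a
         * thread that observes the flag via the acquire load below */
        __atomic_store_n(flag, value, __ATOMIC_RELEASE);
    }

    static int stop_flag_load(volatile int *flag)
    {
        /* acquire: pairs with the release store above */
        return __atomic_load_n(flag, __ATOMIC_ACQUIRE);
    }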

As can be seen in the Celix PR, more than the stop_flag needs to be modified; nslast and the server stats such as totalMemUsed have to be modified as well. I would be willing to write a pull request to insert atomic stores/loads where necessary; however, that would conflict with the currently chosen solution of a global lock. Moreover, it seems that compiling with MSVC is desired, which would mean introducing an abstraction around how to do atomic loads/stores. Although a global lock would work, it has more performance overhead than atomics.

What do you think about switching to atomics @bel2125 ?

bel2125 commented 4 years ago

While I do agree that volatile is not an appropriate sync mechanism in the general case, I still think it is suitable in this special case. ThreadSanitizer and most articles or forum posts deal with the general case, but not the special ones. The code pattern is (simplified):

volatile int exit_flag = 0;

multiple_threads() 
{
    some_init();
    while (!exit_flag) {
        do_something();
    }
    some_cleanup();
}

exit_all()
{
    exit_flag = 1;
    join_all_threads();
}

Here exit_flag is used as a small shared memory (as in point 2) of https://www.geeksforgeeks.org/understanding-volatile-qualifier-in-c/). Once set to 1, it is never set back to 0. An integer should "in practice" be written atomically by all compilers/CPUs, but the C standard does not require this. In this specific case, I do not need the write (or read) to really be atomic - it would still be OK if the compiler created something like:

exit_all()
{
    /* exit_flag = 1; in an absolutely not atomic form */
    int one = 1;
    char *src = (char *)&one;
    volatile char *dst = (volatile char *)&exit_flag;
    for (size_t n = 0; n < sizeof(exit_flag); n++) {
        dst[n] = src[n];
        sleep(1);
    }

    join_all_threads();
}

Using volatile provides one guarantee that we actually need: the exit_flag variable is read in every loop iteration and not optimized into something like

multiple_threads() 
{
    some_init();
    if (!exit_flag) {
        for (;;) {do_something();}
    }
    some_cleanup();
}

Finally, on a multiprocessor architecture, it could be that one of the multiple_threads workers already sees exit_flag=1 while another one still sees exit_flag=0. There is no negative consequence if one thread does one more loop iteration - you have no guarantee how many loop iterations the threads did until you call exit_all anyway.

This is not like the following pattern, which would require ordering guarantees (your StackOverflow link) - probably ThreadSanitizer also implicitly assumes something like this:

int value;
volatile int value_valid;

thread_1()
{
    value = 42;
    value_valid = 1;
}

thread_2()
{
    if (value_valid) {
        func(value);
    }
}

Where precisely do you think there is a problem? I'm not talking about general assumptions on what one might need in multi-threaded programming, but only about the exit_flag pattern above.

I only want to change something here, if there is a real, specific problem, because this code (checking exit_flag) is done very often. Any lock will have performance effects.

I still doubt there is a combination of compiler/OS/CPU that causes an issue with the exit_flag pattern above. In case there is, it would still be possible to use the STOP_FLAG_NEEDS_LOCK define introduced in https://github.com/civetweb/civetweb/commit/486ca49c4982bf2397e7879cc22fa6fde8395cbe

With this define, a global lock is used. Using atomics instead of a global lock would be possible (and smarter), but it must work for GCC, Clang, MinGW, the Intel compiler, Visual C++, vendor-specific compilers of various UNIX flavors, and compilers for embedded systems; for Windows, Linux, MacOS, some *IX, and some embedded operating systems; for x86, ARM, ... I don't know all uses, so this list is certainly incomplete. I did not want to invest too much time, since I'm still not convinced there is a real issue in the use of the exit_flag above. Of course I would be willing to change this if there is a real problem.

By the way, civetweb already uses atomic functions - see mg_atomic_inc. If neither __sync_add_and_fetch nor InterlockedIncrement is available, it uses a global lock.
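
For reference, the shape of that function is roughly the following (a simplified sketch, not the verbatim civetweb.c implementation; the global-lock fallback mentioned above is elided):

    #if defined(_WIN32)
    #include <windows.h>
    #endif

    static int
    mg_atomic_inc_sketch(volatile int *addr)
    {
    #if defined(_WIN32)
        /* InterlockedIncrement returns the incremented value */
        return (int)InterlockedIncrement((volatile LONG *)addr);
    #else
        /* __sync_add_and_fetch returns the incremented value */
        return __sync_add_and_fetch(addr, 1);
    #endif
    }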

Oipo commented 4 years ago

Thank you for taking the time to make an extensive reply. I'll try my best to answer your concerns.

Direct responses

Where precisely do you think there is a problem? I'm not talking about general assumptions on what one might need in multi-threaded programming, but only about the exit_flag pattern above.

Ignoring the nslast, totalMemUsed, maxMemUsed, blockCount, sq_head, sq_tail, sq_blocked cases, there are several issues with stop_flag being volatile:

- Volatile does not guarantee that other cores in the system see the change. There is no guarantee that volatile does a cache flush.
- Legalese-wise, the C11 and later standards specify that concurrent reads/writes from multiple threads of a single memory location is undefined behaviour.
- Although less so for stop_flag and more so for the aforementioned other cases, volatile disables some possible optimizations that the compiler can do.
- Software quality wise, the stop_flag is incredibly brittle and difficult to maintain. Every time the usage of stop_flag is changed, careful analysis has to be done to prevent multi-threading issues, whereas using proper mechanisms such as atomics or locks would inherently communicate this to the developer.

I only want to change something here, if there is a real, specific problem, because this code (checking exit_flag) is done very often. Any lock will have performance effects.

The word 'performance' is usually a red herring. What type of performance are we talking about? Throughput? Latency? Consistency? Moreover, are we comparing volatile with undefined behaviour to a lock with defined behaviour? Atomics vs locks? Volatile vs atomics?

When looking at embedded software, consistency trumps throughput in >99% of the cases. In this case, using atomics would likely increase performance over volatile, since atomics do not suffer from the memory visibility problem outlined above. I would even say that locks would have a similar advantage over volatile, especially on CPUs with software-defined cache coherency, but without any concrete measurements, I cannot know for sure.

Jiawei Huang made some measurements on C++ atomics vs volatile, which show that the difference on his (likely x86) machine is roughly 2.5 nanoseconds per volatile write and 25 nanoseconds per atomic write. On my x86 machine, the difference is 4.5 nanoseconds per volatile write and 10 nanoseconds per atomic write.

In comparison, read() uses a syscall, requiring hundreds of nanoseconds. Not to mention that the kernel has to copy memory from interfaces into the buffer, civetweb itself does some memsets/memcpys, and whenever SSL kicks in, even locks will look like free operations. The performance impact of the implementation of stop_flag on civetweb, regardless of using volatile, atomics or locks, is negligible.

By the way, civetweb already uses atomic functions - see mg_atomic_inc. If there is neither __sync_add_and_fetch nor InterlockedIncrement available, it uses a global lock.

Very nice! However, the most crucial part is missing. Remember that "concurrent reads/writes from multiple threads of a single memory location is undefined behaviour unless all reads/writes use atomic operations" - meaning, if a write is done atomically but a read is not, you have a data race and undefined behaviour. The mg_atomic_load function is crucially missing in action and negating all efforts done with mg_atomic_inc, mg_atomic_dec, and mg_atomic_add.

But what about the other cases?

The other cases are much more severe than stop_flag. Take nslast for example. nslast is an 8-byte static variable, potentially read AND written in multiple threads. On 32-bit architectures, the variable requires two writes to update, leading to word/memory tearing. Completely garbage results can happen (e.g. if 0xAAAAAAAA and 0xBBBBBBBB are written simultaneously, 0xAAAABBBB is a possible outcome, as are 0xBBBBAAAA, 0xAAAAAAAA and 0xBBBBBBBB). Even on 64-bit architectures, one thread can influence the other threads' nslast.
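
To make the tearing scenario concrete, a minimal illustration (the variable is hypothetical; the point is that a 32-bit target has to split the 8-byte store into two instructions):

    #include <stdint.h>

    volatile int64_t shared64; /* hypothetical 8-byte shared variable */

    void writer(int64_t v)
    {
        /* on a 32-bit CPU this compiles to two 32-bit stores (low half,
         * high half); a concurrent reader can observe a mix of old and new */
        shared64 = v;
    }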

totalMemUsed, maxMemUsed, and blockCount all suffer from having atomic writes, but no atomic reads. As mentioned before, this is undefined behaviour. But in the least bad case, this results in stale reads, where the contents on one core can be observed to be old values.

The sq_head, sq_tail, and sq_blocked cases are curious; as far as I can see, most of the reads and writes occur within critical sections of a lock. So then, why are they volatile? This prevents compiler optimizations, even though you noted that performance (regardless of the aforementioned specifics) is important. But in the cases where they are not locked, such as line 20670, this is once again UB and in the least bad case results in stale reads.

Closing words

I hope I have sufficiently motivated the preference for using atomics or locks over volatile in multithreaded applications written in C. If not, there are a myriad of sources talking about the uselessness of volatile in these situations. Please read them at your leisure.

bel2125 commented 4 years ago

Ignoring the nslast, totalMemUsed, maxMemUsed, blockCount, sq_head, sq_tail, sq_blocked cases, there are several issues with stop_flag being volatile:

We have to look at all these cases individually, otherwise this cannot be handled! Probably we should even create separate issues. I will reply in more detail on the "stop flag" now (and only briefly on everything else).

I still think the stop_flag will not cause any issues, since we do have the guarantees we need. The use of stop_flag is also "battle tested" for more than 10 years on multiple CPU architectures and operating systems ... from small embedded systems to large servers.

However, if there is really a problem, I want to understand that in detail, and we can fix it.

Volatile does not guarantee that other cores in the system see the change. There is no guarantee that volatile does a cache flush. ...

This is not a guarantee that I really need. If the cache is flushed some milliseconds later, the system will stop some milliseconds later. Anyway, stopping the server takes a while, since it might be in the middle of processing some request, and since (graceful) TCP shutdown takes some time anyway. Stopping the server is not real-time critical.

I did not check it again, but I have in mind that a write operation will not stay in the L1 cache only, so it could only be the L1 cache of a reader that might still see the old value of zero.

It would be a problem if a change were stuck forever (for seconds) in the L1 cache of one core, and the other core would never see it ... Is this the scenario you have in mind?

Legalese-wise, ... the C11 and later standards specify that concurrent reads/writes from multiple threads of a single memory location is undefined behaviour ...

I think this is the final free draft of C11: http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1570.pdf Section J.2 lists all undefined behaviours. To what undefined behavior exactly are you referring?

Non-normative note 1 in section 5.1.2.4 "Multi-threaded executions and data races" states: "In some cases, there may instead be undefined behavior." But the guarantee we need for exit_flag is given just before: the value will be either the initial value (zero) or the value written by another thread (one, if the other thread called mg_stop). I do not need this to happen "immediately" (see above), and since only one bit is set, there is no need for a particular order.

Or are you referring to another section?

Although less so for stop_flag and more so for the aforementioned other cases, volatile disables some possible optimizations that the compiler can do.

The other cases need to be discussed individually.

Software quality wise, the stop_flag is incredibly brittle and difficult to maintain. Every time the usage of stop_flag is changed, careful analysis has to be done to prevent multi-threading issues, whereas using proper mechanisms such as atomics or locks would inherently communicate this to the developer.

The stop flag was in place at least 7 years ago, in the original "mongoose" code civetweb is forked from. Probably it has been working like this and maintained in this form for much longer, definitely longer than I have known the project. Even the first git commit from 2010, "Initial import - converting from Subversion.", already had the stop_flag (I can't search the history before that). The usage of the stop_flag has always been "set to 1 to stop the server".

The mg_atomic_load function is crucially missing in action and negating all efforts done with mg_atomic_inc, mg_atomic_dec, and mg_atomic_add

atomic inc/dec/add will return the incremented/.. value, so they implicitly do an atomic load (__sync_add_and_fetch). I'm not sure if there is some point in the code where these variables are read without atomic inc/dec (except for debugging purposes) ... but this is again a different topic than "stop_flag".

But what about the other cases?

As already mentioned, they should be looked at individually. So ... without looking into details ... I can only provide a first quick estimate.

Take nslast for example.

nslast is only used in DEBUG_TRACE_FUNC if the DEBUG define is set, and only if the debug/instrumentation code does not provide its own DEBUG_TRACE implementation. This entire function is not meant for production code - all functions only available when DEBUG is set are not meant for production code. It will produce hundreds of trace lines, since every request causes many "printf"s.

Up to now, the sole purpose of "DEBUG_TRACE_FUNC" was for testing new features or analysis of issues - usually done in an isolated environment where exactly one client performs typically one request - followed by manually reading and interpreting the debug log.

The nslast variable is only used for calculating time differences in debug printf output. These time differences are not precise anyway, since turning on DEBUG instrumentation and writing lots of printf lines will have a significant effect on timing.

We could add a library-global mutex around the entire DEBUG_TRACE_FUNC - that's probably a good idea anyway.
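
A sketch of what such a guard could look like (pthread flavor only; the function name and signature are illustrative, the real DEBUG_TRACE_FUNC differs):

    #include <pthread.h>
    #include <stdarg.h>
    #include <stdio.h>

    /* One library-global mutex serializing all debug trace output. */
    static pthread_mutex_t trace_mutex = PTHREAD_MUTEX_INITIALIZER;

    static void
    debug_trace(const char *fmt, ...)
    {
        va_list ap;
        pthread_mutex_lock(&trace_mutex);
        va_start(ap, fmt);
        vfprintf(stderr, fmt, ap);
        va_end(ap);
        pthread_mutex_unlock(&trace_mutex);
    }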

Even after adding a lock, I still don't recommend to anyone to use DEBUG in production.

totalMemUsed, maxMemUsed, and blockCount all suffer from having atomic writes, but no atomic reads.

These three variables are part of the optional USE_SERVER_STATS feature. The correctness of these values is not critical for the core functionality of the server (read: if there were a bug, it would not be critical in the sense of "we need to roll out a bug fix version immediately"). Still, these statistics are more than "DEBUG" features (so they are not labeled "not for production" like DEBUG_TRACE_FUNC above). In particular, mg_get_context_info is a (non-cyclic) diagnosis function that should deliver reasonable values.

#if defined(USE_SERVER_STATS)
struct mg_memory_stat {
    volatile int64_t totalMemUsed;
    volatile int64_t maxMemUsed;
    volatile int blockCount;
};
...

Some functions reading these values do so only inside a MEMORY_DEBUGGING define ... that is only used with "DEBUG" (see above), for memory leak testing/debugging, in cases where tools like AddressSanitizer/LeakSanitizer/... reach their limit. However, since these memory diagnostic tools have become quite good, I was already thinking about removing this code completely.

The function relevant for production code would be mg_get_context_info. While the C standard does not make any statement on atomic loading without __atomic_load / InterlockedOr / ..., the Intel CPU manual does. A 32 bit CPU will read and write a 32 bit value atomically, a 64 bit CPU will read and write a 64 bit value atomically (provided it is properly aligned and in RAM). So x86 and x64 will work without problems (a 32 bit CPU will only use the lower 32 bits of the 64 bit memory-size variables, so we only need size_t). Many years ago I checked this for 32 bit ARM v5 and it was OK as well. According to the Microsoft docs, "Simple reads and writes to properly-aligned 32-bit variables are atomic operations.", so it is probably true for all CPU platforms that could run Windows.

I could test with GCC and clang if it makes a difference ...

uint64_t a, b, c, d;

void load()
{
  a = b;
  c = __atomic_load_n(&d, __ATOMIC_RELAXED); // same with __ATOMIC_ACQUIRE
}

==>

        movq    b(%rip), %rax
        movq    %rax, a(%rip)
        movq    d(%rip), %rax
        movq    %rax, c(%rip)

Are you aware of any 32 bit processor that does not store/load properly aligned 32 bit values atomically? Or a 64 bit processor that does not store/load properly aligned 64 bit values atomically?

From my experience, an 'n' bit processor will load/store 'n' bits atomically (from RAM) without any special instruction, and usually provides dedicated instructions to load and store 2n bits atomically (e.g., InterlockedCompareExchange128 is only available on 64 bit systems).

This is, of course, not regulated by the C standard, since it is CPU dependent.

Since gcc/clang will optimize the "atomic" away, it seems not to be required. But it also does not hurt to add it for Linux (and MacOS and ... everything except Windows). Windows (the MSVC compiler) does not offer __atomic_load, but documents atomicity for 32/64 bit reads and writes (https://docs.microsoft.com/en-us/windows/win32/sync/interlocked-variable-access).
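
One portable shape for such a load could be the following sketch (illustrative only, not existing CivetWeb API; it assumes that on Windows an interlocked compare-exchange with equal exchange and comparand values is an acceptable atomic read):

    #include <stdint.h>
    #if defined(_WIN32)
    #include <windows.h>
    #endif

    static int64_t
    atomic_load64_sketch(volatile int64_t *p)
    {
    #if defined(_WIN32)
        /* CAS(p, 0, 0): writes 0 only if *p is already 0 (a no-op) and
         * always returns the current value - effectively an atomic read */
        return InterlockedCompareExchange64((volatile LONG64 *)p, 0, 0);
    #else
        return __atomic_load_n(p, __ATOMIC_RELAXED);
    #endif
    }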

The sq_head, sq_tail, and sq_blocked cases are curious; as far as I can see, most of the reads and writes occur within critical sections of a lock. So then, why are they volatile? This prevents compiler optimizations, even though you noted that performance (regardless of the aforementioned specifics) is important. But in the cases where they are not locked, such as line 20670, this is once again UB and in the least bad case results in stale reads.

Again not the "stop_flag" case, but let's check it.

sq_head/tail are part of the original socket queue for pending connections, implemented >10 years ago, and were already present in the first GitHub import: https://github.com/civetweb/civetweb/blob/a69a2da39dcca63ad1a77553d3bc3a05f42e0e66/mongoose.c#L403 They were not volatile at that time - I could search where it came from ...

3583e83a Using volatile keyword in the context struct to prevent unwanted optimization. See http://code.google.com/p/mongoose/issues/detail?id=216

Since Google Code is no longer used, this is probably the archived discussion from 2011: https://code.google.com/archive/p/mongoose/issues/216 .. it seems to have solved something at that time, but the links are now dead. Maybe the compiler (MinGW on WinXP) had some issues ... maybe a bug was fixed at the wrong place ... difficult to find out now. Better not to change a running system here without being absolutely sure.

Regarding "performance": This queue is used once if a new connection is established, while "stop_flag" are used dozens of times when a request is made. I did not count it, but checking stop_flag occurs orders of magnitude more often.

Regarding line 20670: this is again the diagnosis function mg_get_context_info - the same considerations as above apply.

I hope I have sufficiently motivated the preference for using atomics or locks over volatile in multithreaded applications written in C.

I never had any doubt about the usefulness of atomic operations or locks - you will find them in civetweb. I just stated that the use of volatile in the special case of a "stop_flag" is valid.

If not, there are a myriad of sources talking about the uselessness of volatile in these situations. Please read them at your leisure.

Some of your links don't work. Independent of your links, I already read quite some literature on this topic. I already knew https://sites.google.com/site/kjellhedstrom2/stay-away-from-volatile-in-threaded-code and https://blog.regehr.org/archives/28 - I will check the rest.

Unfortunately, most of the sources are like "volatile is evil in general" plus one example where it is used in a case where a lock or atomic is more appropriate (or without any rationale at all, like this page http://isvolatileusefulwiththreads.com/ - which is - sorry - just plain stupid). There are only very few pages with a differentiated workup. In most cases it is implicitly assumed that the code needs a certain write order, and there is no "what if the code does not need a write order" consideration. Or they assume right from the start that the reader wanted something different or additional from "volatile". Or they claim it can only be used in device drivers - a strange conclusion, since the C standard does not mention "device drivers" at all. However, a task asynchronously modifying a shared global is conceptually the same as an interrupt handler (cited from https://barrgroup.com/embedded-systems/how-to/c-volatile-keyword). Sometimes I had the impression the authors did not even understand what "side effect on the execution environment" in the C standard means.

You should think of a volatile variable like a shared memory. Some operations on a shared memory require a lock (then volatile is not required, but also does not change the assembler output, since lock/unlock are barriers anyway); other operations do not require a lock - then volatile is mandatory. See https://stackoverflow.com/questions/3148319/is-volatile-required-for-shared-memory-accessed-via-access-function, https://aticleworld.com/understanding-volatile-qualifier-in-c/, and https://www.sciencedirect.com/topics/computer-science/volatile-keyword. Also, a lot of the literature on lock-free programming, spin locks, ... uses volatile.

xtne6f commented 4 years ago

I have created some commits for reference. ( ref-stop-volatile and ref-stop-atomic, the only difference is #ifdef condition: STOP_FLAG_NEEDS_ATOMICS vs STOP_FLAG_NO_ATOMICS )

Since Google Code is no longer used, this is probably the archived discussion from 2011: https://code.google.com/archive/p/mongoose/issues/216 .. it seems to have solved something at that time, but the links are now dead. Maybe the compiler (MinGW on WinXP) had some issues ... maybe a bug was fixed at the wrong place ... difficult to find out now. Better not to change a running system here without being absolutely sure.

It is just a guess, but pthread_cond_wait() at the time seems to have had a critical bug (it was fixed before v1.1). Those variables were not protected, so various strange things must have happened.

static int pthread_cond_wait(pthread_cond_t *cv, pthread_mutex_t *mutex) {
  HANDLE handles[] = {cv->signal, cv->broadcast};
  ReleaseMutex(*mutex);
  WaitForMultipleObjects(2, handles, FALSE, INFINITE);
  return ReleaseMutex(*mutex) == 0 ? -1 : 0; /** Must acquire the lock, but... **/
}
bel2125 commented 4 years ago

@xtne6f: Thank you for your commits - I will have a look

Still, I would be interested in your opinion on the stop flag topic.

bel2125 commented 4 years ago

Did I get this correctly: The difference between ref-stop-atomic and ref-stop-volatile is the define name (STOP_FLAG_NO_ATOMICS vs. STOP_FLAG_NEEDS_ATOMICS) and the default behavior if no define is set?

xtne6f commented 4 years ago

Yes, the only difference is the default.

Actually, I also think volatile is enough for stop_flag. As mentioned above, C11 introduced a strict memory model, and it now provides strictly defined atomic functions. But, unfortunately, it is only C11 and above. We still have to use some compiler/platform-dependent atomics. It is simply troublesome. After all, it is merely a stop flag.
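
For comparison, the C11 version would be short; a sketch (the function names are invented for illustration):

    #include <stdatomic.h>

    static atomic_int stop_flag; /* would replace "volatile int stop_flag" */

    static void request_stop(void) /* writer side */
    {
        atomic_store_explicit(&stop_flag, 1, memory_order_release);
    }

    static int should_stop(void) /* reader side, checked in the work loops */
    {
        return atomic_load_explicit(&stop_flag, memory_order_acquire);
    }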

On the other hand, I think what @Oipo mentions is correct. His comments are very interesting. Moreover, modern compilers (even MSVC) are becoming more obedient to ISO-standard.

In conclusion, I think volatile is enough by default for now. Though, in several years, I might change my mind.

bel2125 commented 4 years ago

Regarding the three changes not related to stop_flag:

Let's merge https://github.com/civetweb/civetweb/commit/821241f5e4742576eb4de58dc3304c81678fd1d3 (Drop difftime from DEBUG_TRACE_FUNC() --> completely drop nsnow), since it does not add any information anyway - the current timestamp is still printed, and the difference to the previous timestamp (thus, nsnow) is not required.

Also https://github.com/civetweb/civetweb/commit/dda09461a8d24ff57e590339fe9a43951886cbb3 ("More strict error check of mg_init_library()") can be merged. It's a general improvement, not directly related to this issue.

In https://github.com/civetweb/civetweb/commit/48d038ebdf2589d01ca83de4428f83fd50dc0eef (Protect server stats by mutex) you used a mutex for counting used memory, instead of the existing atomics. You think the current "atomic" is not enough? One could add the "atomic_load_n" to satisfy some code analysis tools - as shown above, currently the "atomic_load_n" is an ordinary "mov" anyway (the compiled file will not change). Do you think something is still missing then?

xtne6f commented 4 years ago

You think the current "atomic" is not enough?

I think it is not enough, but it depends on what accuracy you require.

Firstly, if you want to protect two or more variables, I recommend you simply use a mutex. It is POSIX compliant. In addition, inter-thread (not interprocess) mutexes are normally fast enough.

You wrote:

From my experience, an 'n' bit processor will load/store 'n' bits atomically (from RAM) without any special instruction, and usually provides dedicated instructions to load and store 2n bits atomically (e.g., InterlockedCompareExchange128 is only available on 64 bit systems).

I don't know the CPU architectures in detail, but x86 (Pentium or later) provides the cmpxchg8b instruction, so, limited to x86, it would be true. Similarly, limited to x86 or x64, the "__atomic_load_n" would be an ordinary "mov". It is exactly architecture dependent. However, if you choose atomics, I recommend you declare totalMemUsed and maxMemUsed as uintptr_t. I expect there is a relatively high possibility that atomic functions which exceed the pointer size might not be provided, or might not be internally "lock-free".

Moreover, even if each variable is atomic, as you have commented in the source, the following calculation would still be very difficult (again, it depends on the required accuracy):

mstat->totalMemUsed += (int64_t)size;
if (mstat->totalMemUsed > mstat->maxMemUsed) {
    mstat->maxMemUsed = mstat->totalMemUsed;
}

If there were mg_atomic_add(uintptr_t *, uintptr_t) (and mg_atomic_sub(uintptr_t *, uintptr_t)), here is a candidate:

uintptr_t tmem = mg_atomic_add(&mstat->totalMemUsed, size);
uintptr_t mmem = mg_atomic_add(&mstat->maxMemUsed, 0); /** semantically same as atomic_load **/
if (tmem > mmem) {
     mg_atomic_add(&mstat->maxMemUsed, tmem - mmem);
}

This is not always accurate, though. In my opinion, such a technique makes things more complicated.
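
For reference, the textbook way to maintain an atomic maximum is a compare-and-swap loop; a sketch using the gcc/clang builtins (atomic_update_max is an invented name, not CivetWeb API):

    #include <stdint.h>

    static void
    atomic_update_max(volatile uintptr_t *max, uintptr_t value)
    {
        uintptr_t cur = __atomic_load_n(max, __ATOMIC_RELAXED);
        /* a failed compare-exchange refreshes "cur" with the current *max;
         * the loop ends when the CAS succeeds or the stored maximum is
         * already >= value */
        while (cur < value
               && !__atomic_compare_exchange_n(max, &cur, value, 0,
                                               __ATOMIC_RELAXED,
                                               __ATOMIC_RELAXED)) {
        }
    }

As noted, even this is more complicated than simply taking a mutex around the whole update.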

Oipo commented 4 years ago

Sorry for not replying sooner, I'm really rather busy at the moment. Also sorry for replying rather tersely in this comment:

... If the cache is flushed some milliseconds later, the system will stop some milliseconds later. ...

On some CPU architectures the cache is not flushed, so the system will never stop. Arguably, most CPU architectures these days use MESI variants, making this a niche issue. I believe the MicroBlaze architecture would be the easiest to demonstrate this on, but I don't have the time currently.

To what undefined behavior exactly are you referring?

5.1.2.4.25: "The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior"

I'm not sure if there is some point in the code where these variables are read without atomic inc/dec (except for debugging purposes)

The debug statistics and p_reload_lock, proc->references contain this issue. If the debug statistics are removed, that only leaves those two - though I have not checked the commits made by xtne6f.

A 32 bit CPU will read and write a 32 bit value atomically, a 64 bit CPU will read and write a 64 bit value atomically (provided it is properly aligned and in RAM).

This is correct. I am talking about the case of the various int64 variables being used on a 32 bit CPU though. In that case, memory/word tearing can occur.

As mentioned above, C11 introduced a strict memory model, and it now provides strictly defined atomic functions. But, unfortunately, it is only C11 and above.

I believe all compilers that implement C11 or later also made the memory model apply retroactively. This can be seen by, for example, gcc supporting atomics in C99 mode. So compiling in C99 mode with a compiler that implements C11 would probably still hold. The question is, how many compilers have not yet implemented C11? It's been 9 years now; as an embedded software developer myself, I do not know of any.

bel2125 commented 4 years ago

I think it is not enough, but it depends on what accuracy you require

My requirement for the memory statistics was not high accuracy, but low impact on the timing. The values are only collected to be seen by a human - so if they are off by 100 milliseconds (some alloc calls), it is not a problem.

bel2125 commented 4 years ago

This is correct. I am talking about the case of the various int64 variables being used on a 32 bit CPU though. In that case, memory/word tearing can occur.

I think these cases don't exist, because on a 32 bit CPU only the lower 32 bits of the 64 bit variable are used - the upper bits are always zero. The variables count memory sizes, and 32 bit CPUs will not handle more than 2 GB (4 GB) of RAM in one process.

The question is, how many compilers have not yet implemented C11?

There are still some "WinCE" cases in the server. WinCE supports Visual Studio 2008 at most, so there is not even C99 support. I dropped actively testing for WinCE some years ago (but left the code in), and sometimes we get issue reports for something that's broken there. I have not accepted pull requests that rely on C++11 up to now (although the last request that relied on this was ~3 years ago). Some C compilers still require variables to be declared at the top of a block (even Visual Studio 2010 and 2012, as far as I remember). Some embedded SDKs miss some API functions typically available on Linux or Windows, so there are some #defines to avoid/replace these functions.

Oipo commented 4 years ago

WinCE

You got me there, I don't develop for WinCE nor Windows environments.

I think these cases don't exist

You really miss the forest for the trees then. Tell me, is addr on this line a 64 bit variable on a 32 bit CPU (or on a 32 bit OS!)?

What happens there is that the compiler is forced to generate two 32-bit stores: one for the high bits and one for the low bits. This means that regardless of instruction-level atomicity, word/memory tearing can occur. The only two ways of dealing with this are 1) use word-size variables, such as int16 on 16-bit, int32 on 32-bit and int64 on 64-bit, or 2) (the better approach) use mutexes.

bel2125 commented 4 years ago

I merged (cherry-picked) https://github.com/civetweb/civetweb/commit/dda09461a8d24ff57e590339fe9a43951886cbb3 (More strict error check of mg_init_library()) and https://github.com/civetweb/civetweb/commit/821241f5e4742576eb4de58dc3304c81678fd1d3 (Drop difftime from DEBUG_TRACE_FUNC()) as mentioned before.

bel2125 commented 4 years ago

WinCE

WinCE is a platform that will be dropped sooner or later.

You really miss the forest for the trees then. Tell me, is addr on this line a 64 bit variable on a 32 bit CPU (or on a 32 bit OS!)? ...

Let's take the example from above again:

uint64_t a, b, c, d;

void load()
{
  a = b;
  c = __atomic_load_n(&d, __ATOMIC_ACQUIRE);
}

for 64 bits:

        movq    b(%rip), %rax
        movq    %rax, a(%rip)
        movq    d(%rip), %rax
        movq    %rax, c(%rip)

for 32 bits:

        pushl   %ebx
        movl    b, %eax   // load low dword of b
        movl    b+4, %ecx // load high dword of b
        xorl    %edx, %edx
        xorl    %ebx, %ebx
        movl    %ecx, a+4 // store high dword into a
        movl    %eax, a   // store low dword into a
        xorl    %eax, %eax
        xorl    %ecx, %ecx
        lock            cmpxchg8b       d  // load qword d
        movl    %edx, c+4 // store high dword into c
        movl    %eax, c   // store low dword into c
        popl    %ebx

On a 32 bit CPU, the input values (b and d) are always < 2^32, since they store a memory size and there are no more than 2^32 bytes per process (usually 2^31 in user space) in a 32 bit process (unless we use an exotic memory model, which is not supported anyway and not relevant in this case). So the "load high dword" / "store high dword" operations always copy the value 0 - so there is no word tearing.

use word-size variables such as int16 on 16-bit, int32 on 32-bit and int64 on 64-bit

That's what I meant before with "so we only need size_t". However, Windows does not offer "InterlockedIncrementSize_t" but only "InterlockedIncrement64" - in contrast to the __atomic keywords of GCC/clang.

xtne6f commented 4 years ago

The values are only collected to be seen by a human

Unexpectedly, such values are often used for some kind of automation by users. For example, some users might write a script like "if totalMemUsed or maxMemUsed exceed some level, kill other processes for saving memory...". As @Oipo says, if memory/word tearing happens, the value might jump up suddenly.

Oipo commented 4 years ago

On a 32 bit CPU, the input values (b and d) are always < 2^32, since they store a memory size

If this is the case then yeah, using size_t (or preferably uintptr_t) would make this a non-issue.

However, Windows does not offer "InterlockedIncrementSize_t" but only "InterlockedIncrement64" - in contrast to the __atomic keywords of GCC/clang.

But Windows does offer a non-64 InterlockedIncrement. So using compile-time defines, the right call can be selected anyway?

Though I agree that it's more of a hassle than what GCC/clang offer.

bel2125 commented 4 years ago

But Windows does offer a non-64 InterlockedIncrement. So using compile-time defines, the right call can be selected anyway?

You are right, but what would the code look like?

#if sizeof(size_t) == 4
InterlockedIncrement
#elif sizeof(size_t) == 8
InterlockedIncrement64
#else
#error "Whatever architecture ???"
#endif

This is an "invalid integer constant expression" for the Visual Studio preprocessor.

Probably a different define like WIN64 would do, together with some assert that sizeof(size_t) is really 8 - or using uint32_t or uint64_t instead of size_t, depending on this WIN64 define.

I just did not do this, since it seemed unnecessary ("don't solve any problem you don't have") - using Int64 the way I did works. However, as I can see from the discussion, the boundary conditions that make it work need quite some explanation. So probably I should still do it, and add a few comments on how this was meant to be used.
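
A minimal sketch of what that could look like, assuming the standard _WIN64 predefined macro (the wrapper name mg_atomic_inc_sz is hypothetical, not the actual CivetWeb API):

#include <windows.h>
#include <assert.h>
#include <stddef.h>

/* Hypothetical wrapper: select the Interlocked variant matching size_t. */
static size_t
mg_atomic_inc_sz(volatile size_t *addr)
{
#if defined(_WIN64)
    assert(sizeof(size_t) == 8); /* size_t is 8 bytes on 64 bit Windows */
    return (size_t)InterlockedIncrement64((volatile LONG64 *)addr);
#else
    assert(sizeof(size_t) == 4); /* size_t is 4 bytes on 32 bit Windows */
    return (size_t)InterlockedIncrement((volatile LONG *)addr);
#endif
}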

bel2125 commented 4 years ago

And using if (sizeof(size_t) == 4) {...} causes warnings like "this condition is always true/false", "this code block is never reached", ...

Oipo commented 4 years ago

I would leave it up to the developer to select the architecture size. They know what they're compiling for.

#if ARCH_SIZE == 16
InterlockedIncrement16
#elif ARCH_SIZE == 32
InterlockedIncrement
#elif ARCH_SIZE == 64
InterlockedIncrement64
#else
#error "No architecture defined!"
#endif

It is even possible to do some checking for some platforms, e.g.

#ifdef __x86_64__
    #if SIZE_MAX == 0xFFFFFFFF
       // This is x32
    #else
        // This is x86_64 but not x32
    #endif
#else
    #ifdef __X86__
    // etc
    #endif
#endif

There's a list of defines available here: https://sourceforge.net/p/predef/wiki/Architectures/

I understand that it's a lot of hassle, but it is not impossible.

bel2125 commented 4 years ago

I understand that it's a lot of hassle, but it is not impossible.

Well, I wanted to avoid the hassle, since the code works without it as well and memory tearing cannot occur - although it requires some explanation of the boundary conditions it relies on.

Using SIZE_MAX might indeed be an option. It is C99, but in November 2016 we decided "stdint.h" is no longer optional (a076201c4d51075e735f2383d531f4f9e797e017), so SIZE_MAX can be assumed to exist.
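
Unlike sizeof, SIZE_MAX is a plain integer constant, so it works in preprocessor conditionals. A sketch of that selection (the macro name is hypothetical):

#include <stdint.h>
#include <windows.h>

#if SIZE_MAX == 0xFFFFFFFFu
/* 32 bit size_t */
#define MG_ATOMIC_INC(p) InterlockedIncrement((volatile LONG *)(p))
#elif SIZE_MAX == 0xFFFFFFFFFFFFFFFFu
/* 64 bit size_t */
#define MG_ATOMIC_INC(p) InterlockedIncrement64((volatile LONG64 *)(p))
#else
#error "Unsupported size_t width"
#endif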

I think the more important question is, what guarantee we want to give on memory statistics. In my opinion, any measurement should have as little influence on the system as possible. With the current

if (mstat->totalMemUsed > mstat->maxMemUsed) {
    mstat->maxMemUsed = mstat->totalMemUsed;
}

there is no guarantee that totalMemUsed <= maxMemUsed - there is an (extremely small) chance that totalMemUsed > maxMemUsed until the next alloc operation corrects this temporary discrepancy. The difference might be some hundred bytes, since CivetWeb typically does not allocate megabytes once the server is started. This happens when the usage statistics are read between (atomically) updating the bytes currently used and updating the maximum - or when one thread allocates some bytes, then a second thread allocates some bytes increasing the maximum, and then the first thread sets a lower maximum.

I considered this acceptable, since memory usage is always just a snapshot of the moment - reading some microseconds earlier would have returned a different value anyway. To avoid this, we could use a mutex (https://github.com/civetweb/civetweb/commit/48d038ebdf2589d01ca83de4428f83fd50dc0eef from @xtne6f). I will have a look at if/how this could be done using atomics as well.
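
For reference, one way the maximum could be maintained lock-free is a compare-and-swap loop - a sketch using the GCC/clang builtins (the helper name atomic_max is hypothetical, not what is in the code):

#include <stddef.h>

/* Hypothetical helper: raise *max to at least 'value' without a mutex.
 * The CAS loop retries if another thread changed *max in between, so the
 * stored maximum can never drop below a value any thread has observed. */
static void
atomic_max(volatile ptrdiff_t *max, ptrdiff_t value)
{
    ptrdiff_t cur = __atomic_load_n(max, __ATOMIC_RELAXED);
    while (cur < value
           && !__atomic_compare_exchange_n(max, &cur, value, 1,
                                           __ATOMIC_RELAXED,
                                           __ATOMIC_RELAXED)) {
        /* on failure, 'cur' holds the refreshed value; re-check cur < value */
    }
}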

... that would be the final point discussed in this thread, except for the stop_flag problem this issue was all about when it was created.

xtne6f commented 4 years ago

Are you going to resolve this issue only for the HTTP2 branch? To be honest, I'm not so interested in the USE_SERVER_STATS feature, since I'm not using it for now. However, HTTP2 should be an experimental/feature branch, so I am worried that the issue will remain unresolved for a long time. I would appreciate your keeping the branches independent.

bel2125 commented 4 years ago

Are you going to resolve this issue only for the HTTP2 branch?

"This issue" somehow became a collection of multiple topics and a discussion of some related and partially also unrelated topics.

What is currently in the HTTP2 branch: Replacing 64-bit atomics by "size_t" (ptrdiff_t) atomics for memory statistics (USE_SERVER_STATS). Since used memory is always < 2^31 Bytes in a 32 bit process, there was no problem in the current implementation (https://github.com/civetweb/civetweb/issues/861#issuecomment-639327840), still using "ptrdiff_t" for memory sizes is more clear/natural/intuitive. This is what is currently changed in the HTTP2 branch - the HTTP2 branch is highly experimental and in a test. I found out that there are some odd differences in the availability of atomic functions in the Win32 and Win64 SDK (more than just missing 128 bit functions), there is even an InterlockedIncrementSizeT but there are differences for InterlockedAdd, ... and one must use a higher SDK version number define to use it. ... a lot of details to try and test for Microsoft Windows. That's why I did not want to put it directly to the production branch. We could have created yet another branch, but this doesn't make testing any easier. I could still move all required/related commits to new branch for review. There is also your mutex solution (https://github.com/civetweb/civetweb/commit/48d038ebdf2589d01ca83de4428f83fd50dc0eef) - if atomics don't work as expected, or it is too complicated with the different SDK versions and different compilers or better for any other reason, we can still use this. That's what I meant with the last few lines in my last comment.

Anyway: my changes, too, still deal with everything except the stop_flag.

To be honest, I'm not so interested in the USE_SERVER_STATS feature, since I'm not using it for now.

Then I think the commits in the HTTP2 branch will not change anything for you. I now count some things, like the number of library initialization calls, in "ptrdiff_t" instead of "int", just to have one data type for the atomic functions (gcc has handy compiler builtins that work with multiple data types, but for MSVC I must use a different function for each data type). But this modification will not change the behavior.

However, HTTP2 should be an experimental/feature branch, so I am worried that the issue will remain unresolved for a long time. I would appreciate your keeping the branches independent.

I can cherry-pick the "atomic" commits for USE_SERVER_STATS earlier, or create a second branch for review first.

bel2125 commented 4 years ago

There is a new branch "atomic-server-stats" - these are the changes, please have a look: https://github.com/civetweb/civetweb/compare/atomic-server-stats?expand=1

bel2125 commented 3 years ago

For quite some time now there has been a define "STOP_FLAG_NEEDS_LOCK" to use atomic operations for the stop flag on those architectures that really need it. However, we never closed this issue.

streetsofboston commented 12 months ago

When STOP_FLAG_NEEDS_LOCK is defined, this line is executed: https://github.com/civetweb/civetweb/blob/master/src/civetweb.c#L2341C1-L2342C1

static void
STOP_FLAG_ASSIGN(stop_flag_t *f, stop_flag_t v)
{
    stop_flag_t sf = 0;
    do {
        sf = mg_atomic_compare_and_swap(f, *f, v);
    } while (sf != v);
}

The 2nd argument of the line sf = mg_atomic_compare_and_swap(f, *f, v); causes the race condition: the "*f" reads from the address f without synchronization.

For Windows 32/64, that is fine (reading *f for a 32/64 bit value is atomic), but for Linux it is not. The read of *f must either be guarded by a global lock (if NO_ATOMICS) or fetched synchronously via __sync_add_and_fetch(f, 0) (if !NO_ATOMICS). You could extract all this 'synchronous fetching of the value' into a separate method, e.g. ptrdiff_t mg_atomic_get(volatile ptrdiff_t *addr) { ... }, and call that method:

static void
STOP_FLAG_ASSIGN(stop_flag_t *f, stop_flag_t v)
{
    stop_flag_t sf = 0;
    do {
        sf = mg_atomic_compare_and_swap(f, mg_atomic_get(f), v);
    } while (sf != v);
}
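
A sketch of what such an mg_atomic_get could look like (the helper and the lock names mg_global_lock/mg_global_unlock are hypothetical, standing in for whatever the NO_ATOMICS fallbacks actually use):

static ptrdiff_t
mg_atomic_get(volatile ptrdiff_t *addr)
{
#if !defined(NO_ATOMICS)
    /* Adding 0 atomically returns the current value, fully synchronized. */
    return __sync_add_and_fetch(addr, 0);
#else
    ptrdiff_t v;
    mg_global_lock();   /* hypothetical global lock of the NO_ATOMICS path */
    v = *addr;
    mg_global_unlock();
    return v;
#endif
}
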
timwoj commented 3 months ago

I can confirm that we ran into this on 1.16 (included with prometheus-cpp) today, so it should probably be reopened.

bel2125 commented 3 months ago

What precisely means "we ran into this"?

What is the exact observation? What civetweb version? What operating system? What CPU architecture? What build flags were set?

timwoj commented 3 months ago

The same as in the original comment on this issue (error log from https://cirrus-ci.com/task/4959117092061184):

  WARNING: ThreadSanitizer: data race (pid=224576)
    Write of size 4 at 0x7ffff10c8028 by main thread:
      #0 mg_stop /zeek/build/auxil/prometheus-cpp/pull/../../../../auxil/prometheus-cpp/3rdparty/civetweb/src/civetweb.c:20363:2 (zeek+0x117c73f) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #1 CivetServer::close() /zeek/build/auxil/prometheus-cpp/pull/../../../../auxil/prometheus-cpp/3rdparty/civetweb/src/CivetServer.cpp:545:3 (zeek+0x116bac5) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #2 CivetServer::~CivetServer() /zeek/build/auxil/prometheus-cpp/pull/../../../../auxil/prometheus-cpp/3rdparty/civetweb/src/CivetServer.cpp:476:2 (zeek+0x116bac5)
      #3 CivetServer::~CivetServer() /zeek/build/auxil/prometheus-cpp/pull/../../../../auxil/prometheus-cpp/3rdparty/civetweb/src/CivetServer.cpp:475:1 (zeek+0x116bb89) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #4 std::default_delete<CivetServer>::operator()(CivetServer*) const /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/unique_ptr.h:99:2 (zeek+0x11693ef) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #5 std::unique_ptr<CivetServer, std::default_delete<CivetServer>>::~unique_ptr() /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/unique_ptr.h:404:4 (zeek+0x11693ef)
      #6 prometheus::Exposer::~Exposer() /zeek/build/auxil/prometheus-cpp/pull/../../../../auxil/prometheus-cpp/pull/src/exposer.cc:26:19 (zeek+0x11693ef)
      #7 std::default_delete<prometheus::Exposer>::operator()(prometheus::Exposer*) const /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/unique_ptr.h:99:2 (zeek+0x82649e) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #8 std::unique_ptr<prometheus::Exposer, std::default_delete<prometheus::Exposer>>::~unique_ptr() /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/unique_ptr.h:404:4 (zeek+0x82649e)
      #9 zeek::telemetry::Manager::~Manager() /zeek/build/src/telemetry/../../../src/telemetry/Manager.cc:32:22 (zeek+0x82649e)
      #10 zeek::detail::terminate_zeek() /zeek/build/src/../../src/zeek-setup.cc:402:5 (zeek+0xb86965) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #11 zeek::run_state::detail::zeek_terminate_loop(char const*) /zeek/build/src/../../src/zeek-setup.cc:1092:5 (zeek+0xb86a78) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #12 zeek::run_state::detail::run_loop() /zeek/build/src/../../src/RunState.cc:324:13 (zeek+0xd555ac) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #13 main /zeek/src/main.cc:93:9 (zeek+0x445a7b) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)

    Previous read of size 4 at 0x7ffff10c8028 by thread T3 (mutexes: write M0):
      #0 consume_socket /zeek/build/auxil/prometheus-cpp/pull/../../../../auxil/prometheus-cpp/3rdparty/civetweb/src/civetweb.c:19633:13 (zeek+0x117f6c3) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #1 worker_thread_run /zeek/build/auxil/prometheus-cpp/pull/../../../../auxil/prometheus-cpp/3rdparty/civetweb/src/civetweb.c:19775:9 (zeek+0x117f6c3)
      #2 worker_thread /zeek/build/auxil/prometheus-cpp/pull/../../../../auxil/prometheus-cpp/3rdparty/civetweb/src/civetweb.c:19948:2 (zeek+0x117f6c3)

    As if synchronized via sleep:
      #0 usleep <null> (zeek+0x3c278e) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #1 zeek::threading::MsgThread::OnWaitForStop() /zeek/build/src/../../src/threading/MsgThread.cc:249:13 (zeek+0xe33fa6) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #2 zeek::threading::BasicThread::WaitForStop() /zeek/build/src/../../src/threading/BasicThread.cc:115:5 (zeek+0xe2fb56) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #3 zeek::threading::Manager::Terminate() /zeek/build/src/../../src/threading/Manager.cc:53:15 (zeek+0xe31801) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #4 zeek::detail::terminate_zeek() /zeek/build/src/../../src/zeek-setup.cc:377:17 (zeek+0xb86720) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #5 zeek::run_state::detail::zeek_terminate_loop(char const*) /zeek/build/src/../../src/zeek-setup.cc:1092:5 (zeek+0xb86a78) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #6 zeek::run_state::detail::run_loop() /zeek/build/src/../../src/RunState.cc:324:13 (zeek+0xd555ac) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #7 main /zeek/src/main.cc:93:9 (zeek+0x445a7b) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)

    Location is heap block of size 165288 at 0x7ffff10c8000 allocated by main thread:
      #0 calloc <null> (zeek+0x3c3d2a) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #1 mg_calloc /zeek/build/auxil/prometheus-cpp/pull/../../../../auxil/prometheus-cpp/3rdparty/civetweb/src/civetweb.c:1482:9 (zeek+0x117cad4) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #2 mg_start2 /zeek/build/auxil/prometheus-cpp/pull/../../../../auxil/prometheus-cpp/3rdparty/civetweb/src/civetweb.c:20505:29 (zeek+0x117cad4)
      #3 CivetServer::CivetServer(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>> const&, CivetCallbacks const*, void const*) /zeek/build/auxil/prometheus-cpp/pull/../../../../auxil/prometheus-cpp/3rdparty/civetweb/src/CivetServer.cpp:463:12 (zeek+0x116b95b) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #4 std::unique_ptr<CivetServer, std::default_delete<CivetServer>> prometheus::detail::make_unique<CivetServer, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>, CivetCallbacks const*&>(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>&&, CivetCallbacks const*&) /zeek/build/auxil/prometheus-cpp/pull/../../../../auxil/prometheus-cpp/core/include/prometheus/detail/future_std.h:12:33 (zeek+0x11690da) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #5 prometheus::Exposer::Exposer(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>, CivetCallbacks const*) /zeek/build/auxil/prometheus-cpp/pull/../../../../auxil/prometheus-cpp/pull/src/exposer.cc:23:15 (zeek+0x11690da)
      #6 prometheus::Exposer::Exposer(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, unsigned long, CivetCallbacks const*) /zeek/build/auxil/prometheus-cpp/pull/../../../../auxil/prometheus-cpp/pull/src/exposer.cc:16:7 (zeek+0x11690da)
      #7 std::__detail::_MakeUniq<prometheus::Exposer>::__single_object std::make_unique<prometheus::Exposer, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>&, int, CivetCallbacks*&>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>&, int&&, CivetCallbacks*&) /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/unique_ptr.h:1070:34 (zeek+0x8272a1) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #8 zeek::telemetry::Manager::InitPostScript() /zeek/build/src/telemetry/../../../src/telemetry/Manager.cc:76:38 (zeek+0x8272a1)
      #9 zeek::detail::setup(int, char**, zeek::Options*) /zeek/build/src/../../src/zeek-setup.cc:802:24 (zeek+0xb82c79) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #10 main /zeek/src/main.cc:52:25 (zeek+0x445965) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)

    Mutex M0 (0x7ffff10c8030) created at:
      #0 pthread_mutex_init <null> (zeek+0x3c6ca3) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #1 mg_start2 /zeek/build/auxil/prometheus-cpp/pull/../../../../auxil/prometheus-cpp/3rdparty/civetweb/src/civetweb.c:20535:13 (zeek+0x117cb72) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #2 CivetServer::CivetServer(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>> const&, CivetCallbacks const*, void const*) /zeek/build/auxil/prometheus-cpp/pull/../../../../auxil/prometheus-cpp/3rdparty/civetweb/src/CivetServer.cpp:463:12 (zeek+0x116b95b) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #3 std::unique_ptr<CivetServer, std::default_delete<CivetServer>> prometheus::detail::make_unique<CivetServer, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>, CivetCallbacks const*&>(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>&&, CivetCallbacks const*&) /zeek/build/auxil/prometheus-cpp/pull/../../../../auxil/prometheus-cpp/core/include/prometheus/detail/future_std.h:12:33 (zeek+0x11690da) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #4 prometheus::Exposer::Exposer(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>, CivetCallbacks const*) /zeek/build/auxil/prometheus-cpp/pull/../../../../auxil/prometheus-cpp/pull/src/exposer.cc:23:15 (zeek+0x11690da)
      #5 prometheus::Exposer::Exposer(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, unsigned long, CivetCallbacks const*) /zeek/build/auxil/prometheus-cpp/pull/../../../../auxil/prometheus-cpp/pull/src/exposer.cc:16:7 (zeek+0x11690da)
      #6 std::__detail::_MakeUniq<prometheus::Exposer>::__single_object std::make_unique<prometheus::Exposer, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>&, int, CivetCallbacks*&>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>&, int&&, CivetCallbacks*&) /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/unique_ptr.h:1070:34 (zeek+0x8272a1) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #7 zeek::telemetry::Manager::InitPostScript() /zeek/build/src/telemetry/../../../src/telemetry/Manager.cc:76:38 (zeek+0x8272a1)
      #8 zeek::detail::setup(int, char**, zeek::Options*) /zeek/build/src/../../src/zeek-setup.cc:802:24 (zeek+0xb82c79) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #9 main /zeek/src/main.cc:52:25 (zeek+0x445965) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)

    Thread T3 'civetweb-worker' (tid=224838, running) created by main thread at:
      #0 pthread_create <null> (zeek+0x3c547f) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #1 mg_start_thread_with_id /zeek/build/auxil/prometheus-cpp/pull/../../../../auxil/prometheus-cpp/3rdparty/civetweb/src/civetweb.c:5746:11 (zeek+0x117f10e) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #2 mg_start2 /zeek/build/auxil/prometheus-cpp/pull/../../../../auxil/prometheus-cpp/3rdparty/civetweb/src/civetweb.c:21148:7 (zeek+0x117d62e) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #3 CivetServer::CivetServer(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>> const&, CivetCallbacks const*, void const*) /zeek/build/auxil/prometheus-cpp/pull/../../../../auxil/prometheus-cpp/3rdparty/civetweb/src/CivetServer.cpp:463:12 (zeek+0x116b95b) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #4 std::unique_ptr<CivetServer, std::default_delete<CivetServer>> prometheus::detail::make_unique<CivetServer, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>, CivetCallbacks const*&>(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>&&, CivetCallbacks const*&) /zeek/build/auxil/prometheus-cpp/pull/../../../../auxil/prometheus-cpp/core/include/prometheus/detail/future_std.h:12:33 (zeek+0x11690da) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #5 prometheus::Exposer::Exposer(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>>>, CivetCallbacks const*) /zeek/build/auxil/prometheus-cpp/pull/../../../../auxil/prometheus-cpp/pull/src/exposer.cc:23:15 (zeek+0x11690da)
      #6 prometheus::Exposer::Exposer(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, unsigned long, CivetCallbacks const*) /zeek/build/auxil/prometheus-cpp/pull/../../../../auxil/prometheus-cpp/pull/src/exposer.cc:16:7 (zeek+0x11690da)
      #7 std::__detail::_MakeUniq<prometheus::Exposer>::__single_object std::make_unique<prometheus::Exposer, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>&, int, CivetCallbacks*&>(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>>&, int&&, CivetCallbacks*&) /usr/bin/../lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/bits/unique_ptr.h:1070:34 (zeek+0x8272a1) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #8 zeek::telemetry::Manager::InitPostScript() /zeek/build/src/telemetry/../../../src/telemetry/Manager.cc:76:38 (zeek+0x8272a1)
      #9 zeek::detail::setup(int, char**, zeek::Options*) /zeek/build/src/../../src/zeek-setup.cc:802:24 (zeek+0xb82c79) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)
      #10 main /zeek/src/main.cc:52:25 (zeek+0x445965) (BuildId: ef4b18ee7db613a1803a9fbef66c06ebaf67d215)

Civetweb version: 1.16 (from prometheus-cpp 1.2.4)
OS: Ubuntu 22
CPU Arch: x86-64
Compiler: Clang 18.1.3
CXXFLAGS: -Wall -Wno-unused -funsigned-char -g -DDEBUG -DBRO_DEBUG -Wno-register -Werror=vla -funsigned-char -fsanitize=thread -fno-omit-frame-pointer -fno-optimize-sibling-calls -O1 -g (although we strip the -DDEBUG flag before running the prometheus-cpp build to avoid civetweb debugging output)

bel2125 commented 3 months ago

Are you observing a real misbehavior of the program? Or only a warning message from a tool (AddressSanitizer, ThreadSanitizer, ...)?

timwoj commented 3 months ago

Are you observing a real misbehavior of the program? Or only a warning message from a tool (AddressSanitizer, ThreadSanitizer, ...)?

I've only seen it via ThreadSanitizer, but that was what the original post on this issue was about anyway.