SRI-CSL / yices2

The Yices SMT Solver
https://yices.csl.sri.com/
GNU General Public License v3.0
368 stars 46 forks source link

yices_garbage_collect: thread safety #385

Closed ptr1120 closed 2 years ago

ptr1120 commented 2 years ago

Hello,

I am experiencing a segmentation fault when concurrently calling yices_garbage_collect.

static void worker_thread(int i) {
    context_t *ctx = yices_new_context(NULL);
    type_t bool_type = yices_bool_type();
    term_t term = yices_new_uninterpreted_term(bool_type);
    yices_incref_type(bool_type);
    yices_incref_term(term);

    yices_decref_term(term);
    yices_decref_type(bool_type);
    yices_garbage_collect(NULL, 0, NULL, 0, false);
    yices_free_context(ctx);
}

static void garbage_collect_test(void) {
    unsigned num_threads = 50;
    std::vector<std::thread> threads(num_threads);
    for (unsigned i = 0; i < num_threads; ++i) {
        threads[i] = std::thread([&, i]() { worker_thread(i); });
    }
    for (auto &th: threads) {
        th.join();
    }
}

I build yices2 with --enable-thread-safety from the current master branch. If I remove the call to either yices_garbage_collect or yices_free_context there seems to be no issue. Is the garbage collection not supported in a multi-threaded scenario/ am I doing something wrong?

Thanks Peter

ianamason commented 2 years ago

I would stay away from GC-ing in the multi-threaded world. In theory it should be OK, but I am pretty sure I didn't test it when I was doing the thread safe port. I should also warn you that I did little testing on Windows. So you are treading fresh soil.

ianamason commented 2 years ago

If I get some spare time I will look into the GC-ing ...

ptr1120 commented 2 years ago

Thank you, would be great if you find some time in looking into this. Since I am using Linux, Windows would not be a Problem.

ptr1120 commented 2 years ago

Maybe it helps narrow down the issue when building with sanitize mode and running the example AddressSanitizer reports:

context/internalization_table.c:846:9: runtime error: load of null pointer of type 'int32_t' AddressSanitizer:DEADLYSIGNAL

==173958==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7fa6744c6957 bp 0x7fa663be8c10 sp 0x7fa663be8ba0 T23) ==173958==The signal is caused by a READ memory access. ==173958==Hint: address points to the zero page.

0 0x7fa6744c6956 in intern_tbl_gc_mark context/internalization_table.c:846

1 0x7fa67446dbda in context_gc_mark context/context.c:6669

2 0x7fa6743e9785 in context_list_gc_mark api/yices_api.c:12309

3 0x7fa6743e9785 in _o_yices_garbage_collect api/yices_api.c:12378

4 0x7fa6743ea1d3 in yices_garbage_collect api/yices_api.c:12364

5 0x55ed9a34cc67 in worker_thread /Program.cpp:125

ianamason commented 2 years ago

It hasn't fallen off my list. I will valgrind it in the next few days. Had lots of choirs to do because of the new release.

ptr1120 commented 2 years ago

Would be great if you can valgrind this. :grinning:

ianamason commented 2 years ago

Yep after the break.

ianamason commented 2 years ago

OK so I cannot duplicate this with pure C. Does this work for you?

ptr1120 commented 2 years ago

Thank you. I am stuck on build problems:

test_yices_gc_mt.c:6:10: fatal error: yices.h: No such file or directory
6 | #include "yices.h"
  |          ^~~~~~~~~

I used the current git version of yices2. The build is done by the following docker file:

FROM gcc:11-bullseye as build

RUN apt update \
    && apt install -y python3 git cmake python-is-python3 dos2unix libgmp-dev gperf python3-pip lcov

COPY . src/
WORKDIR src

RUN find . -type f -exec dos2unix {} \;

RUN autoconf \
    && ./configure --enable-thread-safety  \
    && MODE=debug make -j$(nproc) \
    && cd examples \
    && gcc  -I../src/ test_yices_gc_mt.c -o test_yices_gc_mt -lyices -pthread ../build/x86_64-pc-linux-gnu-debug/obj/mt/threads.o

Do you have any idea?

ptr1120 commented 2 years ago

Ok, finally I can reproduce it with this code. I created a separate project consuming yices2.6.4 and added the given code. In order to reproduce I took 150 as the number of threads, I think it is important to use much more threads than you have on your machine in order to force them to run at a different time.

This is my output:

Testing Yices 2.6.4 (x86_64-pc-linux-gnu, sanitize)
150 threads
Logging thread 0 to /tmp/thread_main_0.txt
Testing Yices 2.6.4 (x86_64-pc-linux-gnu, sanitize)
Logging thread 1 to /tmp/thread_main_1.txt
Testing Yices 2.6.4 (x86_64-pc-linux-gnu, sanitize)
Logging thread 2 to /tmp/thread_main_2.txt
Testing Yices 2.6.4 (x86_64-pc-linux-gnu, sanitize)
Logging thread 3 to /tmp/thread_main_3.txt
Logging thread 4 to /tmp/thread_main_4.txt
Testing Yices 2.6.4 (x86_64-pc-linux-gnu, sanitize)
Logging thread 5 to /tmp/thread_main_5.txt
Testing Yices 2.6.4 (x86_64-pc-linux-gnu, sanitize)
Logging thread 6 to /tmp/thread_main_6.txt
Testing Yices 2.6.4 (x86_64-pc-linux-gnu, sanitize)
Logging thread 7 to /tmp/thread_main_7.txt
Testing Yices 2.6.4 (x86_64-pc-linux-gnu, sanitize)
context/internalization_table.c:846:9: runtime error: load of null pointer of type 'int32_t'
AddressSanitizer:DEADLYSIGNAL
=================================================================
==19250==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f90f3546957 bp 0x7f90eccfcd10 sp 0x7f90eccfcca0 T3)
==19250==The signal is caused by a READ memory access.
==19250==Hint: address points to the zero page.
    #0 0x7f90f3546956 in intern_tbl_gc_mark context/internalization_table.c:846
    #1 0x7f90f34edbda in context_gc_mark context/context.c:6669
    #2 0x7f90f3469785 in context_list_gc_mark api/yices_api.c:12309
    #3 0x7f90f3469785 in _o_yices_garbage_collect api/yices_api.c:12378
    #4 0x7f90f346a1d3 in yices_garbage_collect api/yices_api.c:12364
    #5 0x55922002e053 in thread_main(void*) /some-path/Program.cpp:157
    #6 0x7f90f200e608 in start_thread /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:477
    #7 0x7f90f1f35292 in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x122292)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV context/internalization_table.c:846 in intern_tbl_gc_mark
Thread T3 created by T0 here:
    #0 0x7f90f557f805 in pthread_create (/usr/lib/gcc/x86_64-linux-gnu/9/libasan.so+0x3a805)
    #1 0x55922002f665 in launch_threads(int, void*, unsigned long, char const*, void* (*)(void*), bool) /some-path/threads_posix.cpp:57
    #2 0x55922002e099 in garbage_collect_test_c /some-path/Program.cpp:166
    #3 0x55922002e0e0 in main /some-path/Program.cpp:174
    #4 0x7f90f1e3a0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)

==19250==ABORTING

Process finished with exit code 1

Btw1: the issue does not occur on every execution of your given test code: Sometimes I have to try it 10-20 times until the error occurs. Interestingwise using my c++ code with the same number of threads it is easier to reproduce nearly on every run.

Btw2: During reproduction, another different issue type occurred:

Testing Yices 2.6.4 (x86_64-pc-linux-gnu, debug)
150 threads
Logging thread 0 to /tmp/thread_main_0.txt
Testing Yices 2.6.4 (x86_64-pc-linux-gnu, debug)
...
Logging thread 59 to /tmp/thread_main_59.txt

solvers/egraph/egraph_utils.c:52:9: runtime error: load of null pointer of type 'type_t'
AddressSanitizer:DEADLYSIGNAL
=================================================================
==21683==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7fded262479d bp 0x7fde82f6bd10 sp 0x7fde82f6bc70 T148)
==21683==The signal is caused by a READ memory access.
==21683==Hint: address points to the zero page.
    #0 0x7fded262479c in eterm_table_gc_mark solvers/egraph/egraph_utils.c:52
    #1 0x7fded262479c in egraph_gc_mark solvers/egraph/egraph_utils.c:87
    #2 0x7fded207eb4f in context_gc_mark context/context.c:6663
    #3 0x7fded1ffa785 in context_list_gc_mark api/yices_api.c:12309
    #4 0x7fded1ffa785 in _o_yices_garbage_collect api/yices_api.c:12378
    #5 0x7fded1ffb1d3 in yices_garbage_collect api/yices_api.c:12364
    #6 0x561400bd8053 in thread_main(void*) /some-path/Program.cpp:157
    #7 0x7fded0b9f608 in start_thread /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:477
    #8 0x7fded0ac6292 in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x122292)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV solvers/egraph/egraph_utils.c:52 in eterm_table_gc_mark
Thread T148 created by T0 here:
    #0 0x7fded4110805 in pthread_create (/usr/lib/gcc/x86_64-linux-gnu/9/libasan.so+0x3a805)
    #1 0x561400bd9665 in launch_threads(int, void*, unsigned long, char const*, void* (*)(void*), bool) /some-path/threads_posix.cpp:57
    #2 0x561400bd8099 in garbage_collect_test_c /some-path/Program.cpp:166
    #3 0x561400bd80e0 in main /some-path/Program.cpp:174
    #4 0x7fded09cb0b2 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x270b2)

==21683==ABORTING
ianamason commented 2 years ago

OK will look into this more closely on Monday.

ianamason commented 2 years ago

OK so I can reproduce this crash, but it doesn't happen with valgrind, and helgrind detects no data races. gdb gives me:

Thread 124 "test_yices_gc_m" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffb17b2700 (LWP 430510)]
0x00007ffff7c44d76 in intern_tbl_gc_mark (tbl=0x7ffff000f430) at context/internalization_table.c:846
846     tau = tbl->type.map[i];
(gdb) bt
#0  0x00007ffff7c44d76 in intern_tbl_gc_mark (tbl=0x7ffff000f430) at context/internalization_table.c:846
#1  0x00007ffff7c1f8b9 in context_gc_mark (ctx=0x7ffff000f210) at context/context.c:6669
#2  0x00007ffff7c10785 in context_list_gc_mark () at api/yices_api.c:12309
#3  0x00007ffff7c105e4 in _o_yices_garbage_collect (t=0x0, nt=0, tau=0x0, ntau=0, keep_named=0) at api/yices_api.c:12378
#4  0x00007ffff7c105a8 in yices_garbage_collect (t=0x0, nt=0, tau=0x0, ntau=0, keep_named=0) at api/yices_api.c:12364
#5  0x000055555555536c in thread_main ()
#6  0x00007ffff7bbe609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#7  0x00007ffff7ae5293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(gdb)

From this stack we can see that we have the lock, so indeed there should be no races. Maybe @BrunoDutertre would like to zoom with me and see if we can debug this?

BrunoDutertre commented 2 years ago

I'm open later today. 4pm?

Bruno

On Mon, Dec 13, 2021 at 9:32 AM Ian A Mason @.***> wrote:

OK so I can reproduce this crash, but it doesn't happen with valgrind, and helgrind detects no data races. gdb gives me:

Thread 124 "test_yices_gc_m" received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7fffb17b2700 (LWP 430510)] 0x00007ffff7c44d76 in intern_tbl_gc_mark (tbl=0x7ffff000f430) at context/internalization_table.c:846 846 tau = tbl->type.map[i]; (gdb) bt

0 0x00007ffff7c44d76 in intern_tbl_gc_mark (tbl=0x7ffff000f430) at context/internalization_table.c:846

1 0x00007ffff7c1f8b9 in context_gc_mark (ctx=0x7ffff000f210) at context/context.c:6669

2 0x00007ffff7c10785 in context_list_gc_mark () at api/yices_api.c:12309

3 0x00007ffff7c105e4 in _o_yices_garbage_collect (t=0x0, nt=0, tau=0x0, ntau=0, keep_named=0) at api/yices_api.c:12378

4 0x00007ffff7c105a8 in yices_garbage_collect (t=0x0, nt=0, tau=0x0, ntau=0, keep_named=0) at api/yices_api.c:12364

5 0x000055555555536c in thread_main ()

6 0x00007ffff7bbe609 in start_thread (arg=) at pthread_create.c:477

7 0x00007ffff7ae5293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

(gdb)

From this stack we can see that we have the lock, so indeed there should be no races. Maybe @BrunoDutertre https://github.com/BrunoDutertre would like to zoom with me and see if we can debug this?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/SRI-CSL/yices2/issues/385#issuecomment-992708622, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACA5BQXSTNXOFXC5OBEDODDUQYU3XANCNFSM5FR5XSHA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

ianamason commented 2 years ago

Fixed in 0c837516e390b0a436ef22473ac5bcf0a18be53a maybe @ptr1120

ianamason commented 2 years ago

Actually I think I have to do the same fix for models. This commit is 09f162107cc9140172e6c890d8ccc633126c3720.

ptr1120 commented 2 years ago

Looks very promising, thank very much. I will do some excessive testing at the weekend and tell you my results.

ianamason commented 2 years ago

By the way. After we close this issue. Remind me to tell you about another version of thread safe yices, where each thread has it's own state, and so GC-ing could be replaced by thread exiting. Note that this means all the terms and types used need to be created on a per thread basis. Its a separate thread, and I will only merge into master if there is interest.

ptr1120 commented 2 years ago

Hello,
I did several multi-threaded tests and was not able to reproduce any multi-threading issues! Thank you very much!

@ianamason: The per-thread-state version of yices sounds very interesting to me. I am not completely sure whether it works in my scenario, therefore something about my scenario here: I plan to use yices in a web application for device configuration (configuration problem). So I have a device model which has several characteristics and several restrictions among all its variations. A user can configure the device in an interactive session, thereby a lot of assumptions and assertions are added to the current model, and checks are executed by yices in order to check whether the model still holds. Each yices solver instance is associated to 1 user (so no access from different threads to one solver is ensured) and is (re-)used as long as the interactive configuration session of the user takes(some minutes - hours). What do you think matches this scenario with the per-thread-state version or do you see problems? If so, I would be glad to test this version.

ianamason commented 2 years ago

So you are running a server, and spawn a new thread to handle a client request. Each client has it's own context etc. Right? So this seems perfect for the "per-thread-state", the only speed bump is that for each thread you would have to assert the "standard model" (theory right?) prior to the client extending it.

ptr1120 commented 2 years ago

Hello, @ianamason thanks for your answer. Yes, you are right each user has its own solver context. I would really like to make some extensive tests with the "per-thread-state" version of Yices. I think I will need to reinitialize Yices for every request of the user, but I measured that the encoding of my models does not take too much time (e.g. 10-20 ms for a model with 400 variables and 600 constraints). Some questions as preparation for my tests:

  1. Do I need to call yices_global_init() or is it enough to initialize with yices_per_thread_init()?
  2. Must Yices additionally be built with --enable-thread-safety when using the "per-thread-state" version or can it be built lock-free?
  3. How isolated is the "per-thread-state" version? Does it mean that every thread has its own Term table? I ask because I suddenly experienced another multi-threading/garbage-collection issue that looks like a race condition with the Term ids. I will try to provide a reproducible test for it later.
  4. Can you maybe rebase/merge the "per-thread-state" branch with the master, I think the master has some fixes that I will require.

Thank you Peter

ianamason commented 2 years ago
  1. The example here shows you how to do it. Your main thread should call yices_init and yices_exit as usual, but each new worker thread needs to do yices_per_thread_init and yices_per_thread_exit

  2. You need to configure with

    ./configure --enable-thread-safety  CFLAGS=-DPER_THREAD_STATE

    that does all the stuff under the hood (i.e. makes all the locking no-ops etc)

  3. Yes every thread has its own term and type tables etc. You shouldn't need to GC, if there are errors when you do, then the GC needs some TLC. But calling yices_per_thread_exit is all you should really need to do.

  4. There is a 2.6.2-per-thread-state branch that might help. I will try and bring it into the main branch soon.

ianamason commented 2 years ago
  1. (answered properly this time) I merged master into per-thread-state so you should be good with that. I'll get to making per-thread-state master "soon".
ptr1120 commented 2 years ago

Thanks for your answer sounds very good. I will start testing at weekend, it will take some time to rearrange some things on my side and do the tests. I will update you as soon as I have results.

ptr1120 commented 2 years ago

Hi @ianamason ,

did my tests so far. Results are amazing!

So, all in all, great work!

Btw: maybe one little issue: when calling yices_num_terms from the main thread during a multithreaded execution I run into a segmentation fault.

ianamason commented 2 years ago

That sounds like good news. In the per-thread-state mode, calling just about any API routine from the main thread should cause a segmentation fault. I need to add this to the README (have done so now). Only the worker threads should call API routines.