Clemapfel / jluna

Julia Wrapper for C++ with Focus on Safety, Elegance, and Ease of Use
https://clemens-cords.com/jluna
MIT License

Multithreading crashes: KeyError in get_reference(key::UInt64) / free_reference(key::UInt64) && No method matching create_reference(::UInt64) #32

Closed paulerikf closed 1 year ago

paulerikf commented 1 year ago

Hey there clem, I'm running into a few different crashes whenever I try multithreading. Here's a minimal example that tends to crash in 5 seconds or less. Am I misunderstanding the docs and doing something unsafe here?

ctest --verbose output is all fine (except for the resize_array test, #25).

Environment: Ubuntu 20.04, Julia 1.7.1, clang-14

Minimal Example:

#include <jluna.hpp>

using namespace jluna;

int main() {
    initialize(4);

    auto lambda = [](){
        while(true) {
            Main.safe_eval("@info \"lambda:\" Threads.threadid()");
            Main.safe_eval("sleep(1)");
        }
    };

    Task<void> t1 = ThreadPool::create<void()>(lambda);
    t1.schedule();

    while(true) {
        Main.safe_eval("@info \"main:\" Threads.threadid()");
        Main.safe_eval("sleep(1)");
    }

    return 0;
}

Most common crash output

Worth noting: the crash sometimes happens in free_reference(key::UInt64) rather than get_reference(key::UInt64).

terminate called after throwing an instance of 'jluna::JuliaException'
  what():  [JULIA][EXCEPTION] KeyError: key 0x00007f88f5adc750 not found
Stacktrace:
 [1] getindex
   @ ./dict.jl:481 [inlined]
 [2] get_reference(key::UInt64)
   @ Main.jluna.memory_handler ./none:594
 [3] safe_call(f::Function, args::UInt64)
   @ Main.jluna ./none:18
 [4] (::Main.jluna.cppcall.var"#3#4"{UInt64})()
   @ Main.jluna.cppcall ./none:813

signal (6): Aborted
in expression starting at none:1
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7f89591a9a30)
unknown function (ip: 0x7f89591b55db)
_ZSt9terminatev at /lib/x86_64-linux-gnu/libstdc++.so.6 (unknown line)
__cxa_throw at /lib/x86_64-linux-gnu/libstdc++.so.6 (unknown line)
safe_call<_jl_value_t *> at /home/frivold/kef_env/warm_dep_ws/install/jluna/include/jluna/.src/safe_utilities.inl:44
ProxyValue at /home/frivold/kef_env/warm_dep_ws/build/jluna/../../src/jluna/.src/proxy.cpp:31
Proxy at /home/frivold/kef_env/warm_dep_ws/build/jluna/../../src/jluna/.src/proxy.cpp:106
safe_eval at /home/frivold/kef_env/warm_dep_ws/build/jluna/../../src/jluna/.src/module.cpp:67
operator() at /home/frivold/kef_env/kef_ws/src/jluna_wrapper/src/jluna_test.cpp:11
_M_invoke at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/std_function.h:300
operator() at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/std_function.h:688
operator() at /home/frivold/kef_env/warm_dep_ws/install/jluna/include/jluna/.src/multi_threading.inl:327
_M_invoke at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/std_function.h:285
#3 at ./none:813
unknown function (ip: 0x7f88f8d7453f)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:877
Allocations: 1620960 (Pool: 1620032; Big: 928); GC: 1

Less common crash output

terminate called after throwing an instance of 'jluna::JuliaException'
  what():  [JULIA][EXCEPTION] MethodError: no method matching create_reference(::UInt64)
Closest candidates are:
  create_reference(!Matched::Ptr{Nothing}) at none:546
  create_reference(!Matched::Nothing) at none:567
Stacktrace:
 [1] safe_call(f::Function, args::UInt64)
   @ Main.jluna ./none:18

signal (6): Aborted
in expression starting at none:0
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x7f16feea8a30)
unknown function (ip: 0x7f16feeb45db)
_ZSt9terminatev at /lib/x86_64-linux-gnu/libstdc++.so.6 (unknown line)
__clang_call_terminate at /home/frivold/kef_env/warm_dep_ws/install/jluna/lib/libjluna.so.0.9.1 (unknown line)
~ProxyValue at /home/frivold/kef_env/warm_dep_ws/build/jluna/../../src/jluna/.src/proxy.cpp:62
_M_dispose at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/shared_ptr_base.h:377
_M_release at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/shared_ptr_base.h:155 [inlined]
~__shared_count at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/shared_ptr_base.h:730 [inlined]
~__shared_ptr at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/shared_ptr_base.h:1169 [inlined]
reset at /usr/bin/../lib/gcc/x86_64-linux-gnu/9/../../../../include/c++/9/bits/shared_ptr_base.h:1287 [inlined]
~Proxy at /home/frivold/kef_env/warm_dep_ws/build/jluna/../../src/jluna/.src/proxy.cpp:111
main at /home/frivold/kef_env/kef_ws/src/jluna_wrapper/src/jluna_test.cpp:31
__libc_start_main at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
_start at /home/frivold/kef_env/kef_ws/src/jluna_wrapper/cmake-build-debug/jluna_test (unknown line)
Allocations: 2722 (Pool: 2712; Big: 10); GC: 0

Clemapfel commented 1 year ago

Module::safe_eval is not thread-safe with regard to the module instance. What's happening is that the internal proxy memory management system gets out of sync: each safe_eval call creates a temporary proxy for its result, which modifies that system, and here two threads do so concurrently.
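
One way to picture how a reference table can get out of sync is a check-then-act race: one caller checks that a key is tracked, a second caller frees the last reference to it, and the first caller then acts on its stale check. The sketch below replays that interleaving step by step on a single thread, so the outcome is deterministic. `RefTable` and all of its methods are hypothetical stand-ins made up for illustration; this is not jluna's memory handler, and the real failure path may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <unordered_map>

// Hypothetical stand-in for a reference-counting table (not jluna's code).
class RefTable {
public:
    // Start tracking a key with a single reference.
    void insert(std::uint64_t key) { counts_[key] = 1; }

    // "Check" step: is the key currently tracked?
    bool contains(std::uint64_t key) const { return counts_.count(key) != 0; }

    // "Act" step: add a reference to a key believed to exist.
    // Throws std::out_of_range if the key is missing.
    void increment(std::uint64_t key) { counts_.at(key) += 1; }

    // Drop one reference and erase the entry once the count reaches zero.
    void release(std::uint64_t key) {
        std::size_t& count = counts_.at(key);
        if (--count == 0) counts_.erase(key);
    }

private:
    std::unordered_map<std::uint64_t, std::size_t> counts_;
};

// Replays one bad interleaving of two callers, A and B, on a single thread.
// Returns true if caller A's second step fails because the key is gone.
bool replay_bad_interleaving() {
    RefTable table;
    const std::uint64_t key = 0x1000;      // arbitrary illustrative key
    table.insert(key);                     // one live reference

    const bool seen = table.contains(key); // caller A: check
    assert(seen);
    table.release(key);                    // caller B: frees the last reference

    try {
        if (seen) table.increment(key);    // caller A: acts on the stale check
    } catch (const std::out_of_range&) {
        return true;                       // key not found
    }
    return false;
}
```

Calling `replay_bad_interleaving()` returns `true`, since the increment lands on an entry that was erased in between, loosely analogous to the KeyError in the report. Holding one lock across both the check and the act closes that window.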

Note that in the docs, only functions marked with [thread safe] are thread-safe, cf. https://clemens-cords.com/jluna/list.html#module. Some module functions are, some are not.

This is intended behavior, but I'll at least update the docs to make it clearer, so I will leave this issue open.

By adding a lock for Main, it runs without issue:

    initialize(4);

    auto mutex = jluna::Mutex();

    auto lambda = [&]() {
        while(true) {
            mutex.lock();
            Main.safe_eval("@info \"lambda:\" Threads.threadid()");
            Main.safe_eval("sleep(1)");
            mutex.unlock();
        }
    };

    Task<void> t1 = ThreadPool::create<void()>(lambda);
    t1.schedule();

    while(true) {
        mutex.lock();
        Main.safe_eval("@info \"main:\" Threads.threadid()");
        Main.safe_eval("sleep(1)");
        mutex.unlock();
    }

    return 0;
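
A side note on the manual lock()/unlock() pairs above: if the guarded call throws, the unlock() line is skipped and the lock stays held. The sketch below shows the usual RAII alternative with `std::lock_guard`, which only needs a type that provides lock() and unlock(). `TrackingLock`, `eval_or_throw`, and `guarded_eval` are hypothetical stand-ins so the example compiles without jluna; whether `jluna::Mutex` can be passed to `std::lock_guard` is an assumption based on the two member functions shown in this thread.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <string>

// Hypothetical stand-in lock that only records whether it is held.
// It provides no real mutual exclusion; it exists to observe the guard.
struct TrackingLock {
    bool held = false;
    void lock() { held = true; }
    void unlock() { held = false; }
};

// Hypothetical stand-in for an evaluation call that may throw.
void eval_or_throw(const std::string& code) {
    if (code.empty()) throw std::runtime_error("nothing to evaluate");
}

// Evaluates `code` while holding `lock`.
// Returns false if the evaluation threw, true otherwise.
bool guarded_eval(TrackingLock& lock, const std::string& code) {
    assert(!lock.held); // this sketch does not support re-entrant locking
    try {
        // The guard calls lock() here and unlock() when it goes out of
        // scope, including when an exception unwinds through it.
        std::lock_guard<TrackingLock> guard(lock);
        eval_or_throw(code);
        return true;
    } catch (const std::runtime_error&) {
        return false;
    }
}
```

After `guarded_eval(lock, "")` the evaluation has thrown, yet `lock.held` is false again, which is the property the manual pairing cannot guarantee.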

Alternatively, you could use unsafe::eval, which isn't exactly thread-safe, but since you are not modifying any objects, including C++-side objects, it runs through:

    auto lambda = [&]() {
        while(true) {
            "@info \"lambda:\" Threads.threadid()"_eval;
            "sleep(1)"_eval;
        }
    };

    Task<void> t1 = ThreadPool::create<void()>(lambda);
    t1.schedule();

    while(true) {
        "@info \"main:\" Threads.threadid()"_eval;
        "sleep(1)"_eval;
    }

paulerikf commented 1 year ago

Ahh, ok. That makes sense.

Thanks for the explanation.

Clemapfel commented 1 year ago

Docs updated by https://github.com/Clemapfel/jluna/commit/ae4767e8cbe1f7bcbf18f844427e9d78f556897c; I should've closed this once that commit was merged.