Clemapfel / jluna

Julia Wrapper for C++ with Focus on Safety, Elegance, and Ease of Use
https://clemens-cords.com/jluna
MIT License

Using multi-threaded Jluna interface ends up with an error sometimes #69

Open kouchy opened 3 weeks ago

kouchy commented 3 weeks ago

Hi @Clemapfel,

I am trying to use Jluna to support tasks written in Julia in a streaming library I'm working on (StreamPU).

I have the following minimal code:

#include <jluna.hpp>

int main(int argc, char** argv)
{
    jluna::initialize(3);

    const size_t n_tasks = 12;
    std::vector<jluna::Task<void>> tasks;
    std::function<void(const size_t)> func_exec = [](const size_t tid)
    {
        jluna::Base["println"]("lambda called with ", tid);
    };

    for (size_t tid = 0; tid < n_tasks; tid++)
    {
        tasks.push_back(jluna::ThreadPool::create(func_exec, tid));
        tasks.back().schedule();
    }
    for (size_t tid = 0; tid < n_tasks; tid++)
        tasks[tid].join();

    return 0;
}

Most of the time this code will print something like:

[JULIA][LOG] initialization successful (3 thread(s)).
lambda called with 5
lambda called with 6
lambda called with 8
lambda called with 0
lambda called with 7
lambda called with 1
lambda called with 2
lambda called with 10
lambda called with 9
lambda called with 4
lambda called with 3
lambda called with 11

But sometimes it fails with the following error:

[JULIA][LOG] initialization successful (3 thread(s)).
terminate called after throwing an instance of 'jluna::JuliaException'
  what():  [JULIA][EXCEPTION] KeyError: key 0x0000773229fd79a0 not found
Stacktrace:
 [1] getindex
   @ ./dict.jl:498 [inlined]
 [2] get_reference(key::UInt64)
   @ Main.jluna.memory_handler ./none:597
 [3] safe_call(f::Function, args::UInt64)
   @ Main.jluna ./none:17
 [4] (::Main.jluna.cppcall.var"#3#4"{UInt64})()
   @ Main.jluna.cppcall ./none:828

[217830] signal (6.-6): Aborted
in expression starting at none:0
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
gsignal at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x77323dca5ffd)
unknown function (ip: 0x77323dcbae9b)
_ZSt9terminatev at /lib/x86_64-linux-gnu/libstdc++.so.6 (unknown line)
__cxa_throw at /lib/x86_64-linux-gnu/libstdc++.so.6 (unknown line)
safe_call<_jl_value_t*> at /nfs/users/cassagnea-nfs/softwares/jluna/include/jluna/.src/safe_utilities.inl:56
_ZN5jluna6detail13get_referenceEm at /nfs/users/cassagnea-nfs/softwares/jluna/lib/libjluna.so.1.0.0 (unknown line)
_ZN5jluna5Proxy10ProxyValueC2EP11_jl_value_tRSt10shared_ptrIS1_ES3_ at /nfs/users/cassagnea-nfs/softwares/jluna/lib/libjluna.so.1.0.0 (unknown line)
_ZN5jluna5ProxyC2EP11_jl_value_tRSt10shared_ptrINS0_10ProxyValueEES2_ at /nfs/users/cassagnea-nfs/softwares/jluna/lib/libjluna.so.1.0.0 (unknown line)
operator[]<char> at /nfs/users/cassagnea-nfs/softwares/jluna/include/jluna/.src/proxy.inl:78 [inlined]
operator() at /nfs/users/cassagnea-nfs/workspace/devel/streampu_julia/tests/julia/simple_chain_julia.cpp:75 [inlined]
__invoke_impl<void, main(int, char**)::<lambda(size_t)>&, long unsigned int> at /usr/include/c++/13/bits/invoke.h:61 [inlined]
__invoke_r<void, main(int, char**)::<lambda(size_t)>&, long unsigned int> at /usr/include/c++/13/bits/invoke.h:111 [inlined]
_M_invoke at /usr/include/c++/13/bits/std_function.h:290
operator() at /usr/include/c++/13/bits/std_function.h:591 [inlined]
operator() at /nfs/users/cassagnea-nfs/softwares/jluna/include/jluna/.src/multi_threading.inl:345 [inlined]
__invoke_impl<_jl_value_t*, jluna::ThreadPool::create<long unsigned int>(const std::function<void(long unsigned int)>&, long unsigned int)::<lambda()>&> at /usr/include/c++/13/bits/invoke.h:61 [inlined]
__invoke_r<_jl_value_t*, jluna::ThreadPool::create<long unsigned int>(const std::function<void(long unsigned int)>&, long unsigned int)::<lambda()>&> at /usr/include/c++/13/bits/invoke.h:114 [inlined]
_M_invoke at /usr/include/c++/13/bits/std_function.h:290
_ZNKSt8functionIFP11_jl_value_tvEEclEv at /nfs/users/cassagnea-nfs/softwares/jluna/libjluna.so (unknown line)
jluna_invoke_from_task at /nfs/users/cassagnea-nfs/softwares/jluna/libjluna.so (unknown line)
#3 at ./none:828
unknown function (ip: 0x77323d468737)
_jl_invoke at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:2895 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/gf.c:3077
jl_apply at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
start_task at /cache/build/builder-amdci4-0/julialang/julia-release-1-dot-10/src/task.c:1238
Allocations: 2908 (Pool: 2899; Big: 9); GC: 0
Aborted (core dumped)

I am using a Zen 4 CPU with 8 cores on Ubuntu 24.04. I compiled Jluna from this repository (7b08c4f1fc1392964b1c1e00a313cfda774608f4) and my Julia version is 1.10.4.

Do you have any idea why this error happens? Should I do things differently? It is worth mentioning that I'm not very experienced in Julia programming.

Many thanks in advance for any help.

kouchy commented 3 weeks ago

I noticed that if you increase the number of created tasks from 12 to 200 or more, the error occurs significantly more often.
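
The KeyError in the trace above is raised from get_reference while it indexes a dictionary with a UInt64 key, and it becomes more frequent with more tasks, which is the pattern one would expect from a data race on a shared table. That is only a guess; nothing in this thread confirms it. The sketch below is not jluna code: RefTable and stress are hypothetical names, and the table is a plain std::unordered_map standing in for such a dictionary. It shows the usual remedy for that class of bug, which is to take one mutex around every create, lookup, and free.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <stdexcept>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a reference table keyed by UInt64 (not jluna's code).
class RefTable
{
public:
    // Register a value and return the key it was stored under.
    std::uint64_t create(int value)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        const std::uint64_t key = next_key_++;
        refs_.emplace(key, value);
        return key;
    }

    // Look up a key; throws std::out_of_range if it is absent,
    // which plays the role of the KeyError here.
    int get(std::uint64_t key)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return refs_.at(key);
    }

    // Drop a key.
    void free(std::uint64_t key)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        refs_.erase(key);
    }

    std::size_t size()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return refs_.size();
    }

private:
    std::mutex mutex_;
    std::unordered_map<std::uint64_t, int> refs_;
    std::uint64_t next_key_ = 0;
};

// Each of n_threads workers creates, reads back, and frees n_iters references.
// Returns how many lookups threw or returned an unexpected value.
int stress(RefTable& table, int n_threads, int n_iters)
{
    std::vector<int> failures(n_threads, 0); // one slot per worker, no sharing
    std::vector<std::thread> workers;
    for (int t = 0; t < n_threads; ++t)
    {
        workers.emplace_back([&table, &failures, t, n_iters]() {
            for (int i = 0; i < n_iters; ++i)
            {
                const int value = t * n_iters + i;
                const std::uint64_t key = table.create(value);
                try
                {
                    if (table.get(key) != value)
                        ++failures[t];
                }
                catch (const std::out_of_range&)
                {
                    ++failures[t];
                }
                table.free(key);
            }
        });
    }
    for (auto& w : workers)
        w.join();

    int total = 0;
    for (int f : failures)
        total += f;
    return total;
}
```

Calling stress(table, 4, 2000) on a fresh RefTable should report no failed lookups and leave the table empty. Removing the lock_guard lines turns the same loop into undefined behaviour, which is the kind of intermittent failure described above.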

k12Sergey commented 2 weeks ago

I have a similar problem on Windows 10 with the MinGW compiler and the latest Julia version, using the following code (it never runs to completion):

double juliaTest()
{
    using namespace jluna;

    std::function<Int64(Int64)> labdaCall = [](Int64 value){
        auto f = jluna::Main["f"];
        auto res = f(value);
        return res;
    };

    Int64 val = 5;

    auto task1 = ThreadPool::create(labdaCall, val);
    auto task2 = ThreadPool::create(labdaCall, val);
    auto task3 = ThreadPool::create(labdaCall, val);

    // start task
    task1.schedule();
    task2.schedule();
    task3.schedule();

    // wait for tasks to finish
    task1.join();
    task2.join();
    task3.join();

    int res = task1.result().get().value() + task2.result().get().value() + task3.result().get().value();

    return res;
}
int main()
{
    jluna::initialize(8);
    /// declare function
    jluna::Main.safe_eval(
        R"(function f(x)
            return x*x
           end )");

    for(int i = 0; i != 100; ++i) {
        auto res = juliaTest();
        std::cout << res << std::endl;
    }

    return 0;
}

Code output

[JULIA][LOG] initialization successful (8 thread(s)).
75
75
75
75
75

Error message

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: UNKNOWN at 0x7ffb40e7cf19 -- RaiseException at C:\WINDOWS\System32\KERNELBASE.dll (unknown line)
in expression starting at none:1
RaiseException at C:\WINDOWS\System32\KERNELBASE.dll (unknown line)
_Unwind_RaiseException at /workspace/srcdir/gcc-13.2.0/libgcc\unwind-seh.c:334
__cxa_throw at /workspace/srcdir/gcc-13.2.0/libstdc++-v3/libsupc++\eh_throw.cc:93
safe_call<_jl_value_t*, _jl_value_t*> at C:/Program Files (x86)/jluna/include/jluna/.src\safe_utilities.inl:56
safe_call<long long int&> at C:/Program Files (x86)/jluna/include/jluna/.src\proxy.inl:93
operator()<long long int&> at C:/Program Files (x86)/jluna/include/jluna/.src\proxy.inl:115
operator() at D:/Work/julia_tests/jluna_multithread\main.cpp:14
__invoke_impl<jluna::Proxy, juliaTest()::<lambda(jluna::Int64)>&, long long int> at D:/Qt/Tools/mingw1120_64/lib/gcc/x86_64-w64-mingw32/11.2.0/include/c++/bits\invoke.h:61
__invoke_r<long long int, juliaTest()::<lambda(jluna::Int64)>&, long long int> at D:/Qt/Tools/mingw1120_64/lib/gcc/x86_64-w64-mingw32/11.2.0/include/c++/bits\invoke.h:116
_M_invoke at D:/Qt/Tools/mingw1120_64/lib/gcc/x86_64-w64-mingw32/11.2.0/include/c++/bits\std_function.h:291
operator() at D:/Qt/Tools/mingw1120_64/lib/gcc/x86_64-w64-mingw32/11.2.0/include/c++/bits\std_function.h:560
operator() at C:/Program Files (x86)/jluna/include/jluna/.src\multi_threading.inl:367
__invoke_impl<_jl_value_t*, jluna::ThreadPool::create<long long int, long long int>(const std::function<long long int(long long int)>&, long long int)::<lambda()>&> at D:/Qt/Tools/mingw1120_64/lib/gcc/x86_64-w64-mingw32/11.2.0/include/c++/bits\invoke.h:61
__invoke_r<_jl_value_t*, jluna::ThreadPool::create<long long int, long long int>(const std::function<long long int(long long int)>&, long long int)::<lambda()>&> at D:/Qt/Tools/mingw1120_64/lib/gcc/x86_64-w64-mingw32/11.2.0/include/c++/bits\invoke.h:114
_M_invoke at D:/Qt/Tools/mingw1120_64/lib/gcc/x86_64-w64-mingw32/11.2.0/include/c++/bits\std_function.h:291
#3 at .\none:828
unknown function (ip: 000002ab38f89f4b)
jl_apply at C:/workdir/src\julia.h:1982 [inlined]
start_task at C:/workdir/src\task.c:1238
Allocations: 2909 (Pool: 2900; Big: 9); GC: 0

or this one, with a longer stack trace:

[28716] signal (22): SIGABRT
in expression starting at none:0
crt_sig_handler at C:/workdir/src\signals-win.c:95
raise at C:\WINDOWS\System32\msvcrt.dll (unknown line)
abort at C:\WINDOWS\System32\msvcrt.dll (unknown line)
__verbose_terminate_handler at /workspace/srcdir/gcc-13.2.0/libstdc++-v3/libsupc++\vterminate.cc:95
__terminate at /workspace/srcdir/gcc-13.2.0/libstdc++-v3/libsupc++\eh_terminate.cc:48
__cxa_call_terminate at /workspace/srcdir/gcc-13.2.0/libstdc++-v3/libsupc++\eh_call.cc:54
__gxx_personality_imp at /workspace/srcdir/gcc-13.2.0/libstdc++-v3/libsupc++\eh_personality.cc:688
_GCC_specific_handler at /workspace/srcdir/gcc-13.2.0/libgcc\unwind-seh.c:300
__gxx_personality_seh0 at /workspace/srcdir/gcc-13.2.0/libstdc++-v3/libsupc++\eh_personality.cc:810
_chkstk at C:\WINDOWS\SYSTEM32\ntdll.dll (unknown line)
RtlRaiseException at C:\WINDOWS\SYSTEM32\ntdll.dll (unknown line)
RtlRaiseException at C:\WINDOWS\SYSTEM32\ntdll.dll (unknown line)
RaiseException at C:\WINDOWS\System32\KERNELBASE.dll (unknown line)
_Unwind_RaiseException at /workspace/srcdir/gcc-13.2.0/libgcc\unwind-seh.c:334
__cxa_throw at /workspace/srcdir/gcc-13.2.0/libstdc++-v3/libsupc++\eh_throw.cc:93
_ZN5jluna9safe_callIJP11_jl_value_tEEES2_S2_DpT_ at C:\Program Files (x86)\jluna\bin\libjluna.dll (unknown line)
_ZNSt15_Sp_counted_ptrIPN5jluna5Proxy10ProxyValueELN9__gnu_cxx12_Lock_policyE2EE10_M_disposeEv at C:\Program Files (x86)\jluna\bin\libjluna.dll (unknown line)
_ZN5jluna5ProxyD1Ev at C:\Program Files (x86)\jluna\bin\libjluna.dll (unknown line)
operator() at D:/Work/julia_tests/jluna_multithread\main.cpp:16
__invoke_impl<long long int, juliaTest()::<lambda(jluna::Int64)>&, long long int> at D:/Qt/Tools/mingw1120_64/lib/gcc/x86_64-w64-mingw32/11.2.0/include/c++/bits\invoke.h:61
__invoke_r<long long int, juliaTest()::<lambda(jluna::Int64)>&, long long int> at D:/Qt/Tools/mingw1120_64/lib/gcc/x86_64-w64-mingw32/11.2.0/include/c++/bits\invoke.h:114
_M_invoke at D:/Qt/Tools/mingw1120_64/lib/gcc/x86_64-w64-mingw32/11.2.0/include/c++/bits\std_function.h:291
operator() at D:/Qt/Tools/mingw1120_64/lib/gcc/x86_64-w64-mingw32/11.2.0/include/c++/bits\std_function.h:560
operator() at C:/Program Files (x86)/jluna/include/jluna/.src\multi_threading.inl:367
__invoke_impl<_jl_value_t*, jluna::ThreadPool::create<long long int, long long int>(const std::function<long long int(long long int)>&, long long int)::<lambda()>&> at D:/Qt/Tools/mingw1120_64/lib/gcc/x86_64-w64-mingw32/11.2.0/include/c++/bits\invoke.h:61
__invoke_r<_jl_value_t*, jluna::ThreadPool::create<long long int, long long int>(const std::function<long long int(long long int)>&, long long int)::<lambda()>&> at D:/Qt/Tools/mingw1120_64/lib/gcc/x86_64-w64-mingw32/11.2.0/include/c++/bits\invoke.h:114
_M_invoke at D:/Qt/Tools/mingw1120_64/lib/gcc/x86_64-w64-mingw32/11.2.0/include/c++/bits\std_function.h:291
#3 at .\none:828
unknown function (ip: 0000023e836c9dab)
jl_apply at C:/workdir/src\julia.h:1982 [inlined]
start_task at C:/workdir/src\task.c:1238
Allocations: 2909 (Pool: 2900; Big: 9); GC: 0
terminate called after throwing an instance of 'jluna::JuliaException'
  what():  [JULIA][EXCEPTION] KeyError: key 0x0000000000000102 not found
Stacktrace:
 [1] getindex
   @ .\dict.jl:498 [inlined]
 [2] free_reference(key::UInt64)
   @ Main.jluna.memory_handler .\none:615
 [3] safe_call(f::Function, args::UInt64)
   @ Main.jluna .\none:17
 [4] (::Main.jluna.cppcall.var"#3#4"{UInt64})()
   @ Main.jluna.cppcall .\none:828

k12Sergey commented 2 weeks ago

Note that running a single Task in my code works correctly.

However, the following juliaTest() with a single task

double juliaTest()
{
    using namespace jluna;

    std::function<Int64(Int64)> labdaCall =
    [](Int64 value){
        auto res = jluna::Main["f"](value);
        return value;
    };

    Int64 val = 5;

    auto task1 = ThreadPool::create(labdaCall, val);

    // start task
    task1.schedule();

    task1.join();

    return 1;
}

int main()
{
    jluna::initialize(5);
    /// declare function
    jluna::Main.safe_eval(
        R"(f(x) = x*x)");

    for(int i = 0; i != 10000; ++i) {
        std::cout << i << std::endl;
        std::jthread t1(&juliaTest);
    }

    return 0;
}

crashes with the following error when called from a std::jthread:

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x7ffa5a78373d -- ijl_excstack_state at C:/workdir/src\rtutils.c:307
in expression starting at none:0
Allocations: 2909 (Pool: 2900; Big: 9); GC: 0
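
One possible explanation, stated here as an assumption rather than a diagnosis, is that the access violation comes from calling into the Julia runtime from a std::jthread that Julia did not create. If that is the cause, a common pattern is to keep a single long-lived thread that owns the runtime and have every other thread hand it jobs through a queue. The sketch below uses only the standard library; OwnerThread and submit are hypothetical names, and the job is ordinary C++ in place of a Julia call.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical helper: one thread runs every submitted job, in order.
class OwnerThread
{
public:
    OwnerThread() : worker_([this] { run(); }) {}

    ~OwnerThread()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_one();
        worker_.join(); // remaining jobs are drained before the thread exits
    }

    // Queue a job and return a future that yields its result.
    std::future<long> submit(std::function<long()> job)
    {
        std::packaged_task<long()> task(std::move(job));
        std::future<long> result = task.get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push(std::move(task));
        }
        cv_.notify_one();
        return result;
    }

private:
    void run()
    {
        for (;;)
        {
            std::packaged_task<long()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return done_ || !jobs_.empty(); });
                if (jobs_.empty())
                    return; // shutdown requested and queue is empty
                task = std::move(jobs_.front());
                jobs_.pop();
            }
            task(); // runs outside the lock, always on the owner thread
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::packaged_task<long()>> jobs_;
    bool done_ = false;
    std::thread worker_; // declared last so the members above exist before it starts
};
```

A caller on any thread would write owner.submit([] { return 5L * 5L; }).get() and receive 25 once the owner thread has run the job. In a jluna program the initialization and every Julia call would presumably have to happen inside such jobs; whether that avoids the crash above has not been tested here.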