Confusion about how sleeping main thread affects task execution

paulerikf commented 1 year ago

The multithreading docs show examples where sleep is called on the main C++ thread, but issues seem to arise in more complicated scenarios (e.g. if the task code makes any calls to Julia functions).

I've been trying to use jluna in a situation where I'm not fully in charge of the c++ thread. I use it to set up and schedule julia-side tasks, and then it sleeps and occasionally runs callbacks at intervals outside of my direct control.

In attempting to get this working I've become rather confused about how sleeping the main thread affects task execution.

Here's an example that hopefully illustrates what I'm seeing:

initialize(2);

auto lambda = [&]() {
    int i = 0;
    while(true) {
        // somehow print i
        i++;
    }
};

Task<void> t1 = ThreadPool::create<void()>(lambda);
t1.schedule();

while(true) {
    // somehow print "main_loop"
    // somehow sleep for 1s
}

Cases where sleep is done with std::this_thread::sleep_for(1000ms):

task and main thread use std::cout
- task runs nonstop
task uses Julia println, main thread uses std::cout
- task prints 0 then gets stuck forever
task and main thread use Julia println
- task prints one number each time the main thread prints "main_loop"

Cases where sleep is done using Julia sleep function:

task runs nonstop no matter how print is done.

Extended examples:

Task and main thread use std::cout:

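#include <jluna.hpp>
#include <chrono>
#include <iostream>
#include <thread>

using namespace jluna;
using namespace std::chrono_literals; // enables the 1000ms literal

int main() {
    initialize(2);

    auto lambda = [&]() {
        int i = 0;
        while(true) {
            std::cout << i << std::endl;
            i++;
        }
    };

    Task<void> t1 = ThreadPool::create<void()>(lambda);
    t1.schedule();

    while(true) {
        std::cout << "main_loop" << std::endl;
        std::this_thread::sleep_for(1000ms);
    }
}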

Task uses jl println, main thread uses std::cout

    initialize(2);
    auto println_jl = Main.safe_eval("return println");

    auto lambda = [&]() {
        int i = 0;
        while(true) {
            println_jl.safe_call<void>(i);
            i++;
        }
    };

    Task<void> t1 = ThreadPool::create<void()>(lambda);
    t1.schedule();

    while(true) {
        std::cout << "main_loop\n" << std::endl;
        std::this_thread::sleep_for(1000ms);
    }

[JULIA][LOG] initialization successful (4 thread(s)).
main_loop

0main_loop

main_loop

main_loop

main_loop

...

Both task and main thread use jl println

    initialize(2);
    auto println_jl = Main.safe_eval("return println");

    auto lambda = [&]() {
        int i = 0;
        while(true) {
            println_jl.safe_call<void>(i);
            i++;
        }
    };

    Task<void> t1 = ThreadPool::create<void()>(lambda);
    t1.schedule();

    while(true) {
        println_jl.safe_call<void>("main_loop");
        std::this_thread::sleep_for(1000ms);
    }

[JULIA][LOG] initialization successful (4 thread(s)).
main_loop
0
1
main_loop
2
main_loop
3
main_loop
4
main_loop
5
...

paulerikf commented 1 year ago

I hope that made any sense at all... Please let me know if I should clarify something!

Clemapfel commented 1 year ago

Hey, sorry it took me a week to get to this; I was busy with day-job stuff.

First of all, Base.println is synchronized because Base.stdout is locked by default. I don't think the same is true for std::cout, and in any case the two languages do not share a lock, so their print output will only interrupt each other if you mix both (or use std::cout on both sides). For now I'll stick to Julia's stdout in both the worker and main; which print is used has no impact on task scheduling.
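On the C++ side, concurrent writes to std::cout are free of data races but are not kept together line by line, so whole lines from several C++ threads require a lock you take yourself. A minimal sketch, assuming a plain std::mutex is acceptable; `cpp_output_mutex`, `locked_println`, `line_counts` and `two_printers` are hypothetical helpers, not part of jluna. This lock is separate from the one guarding Julia's Base.stdout, so mixing it with Julia's println can still interleave.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>

// Hypothetical helper (not part of jluna): one process-wide mutex that every
// C++-side print goes through, so each line is written as a whole.
std::mutex& cpp_output_mutex() {
    static std::mutex m;
    return m;
}

void locked_println(std::ostream& out, const std::string& text) {
    std::lock_guard<std::mutex> guard(cpp_output_mutex());
    out << text << '\n';
}

// Counts how often each distinct line occurs in `text`.
std::map<std::string, int> line_counts(const std::string& text) {
    std::map<std::string, int> counts;
    std::istringstream in(text);
    for (std::string line; std::getline(in, line);)
        ++counts[line];
    return counts;
}

// Main and one worker thread each print `n` lines into the same stream.
std::string two_printers(int n) {
    std::ostringstream out;
    std::thread worker([&] {
        for (int i = 0; i < n; ++i) locked_println(out, "worker");
    });
    for (int i = 0; i < n; ++i) locked_println(out, "main");
    worker.join();  // `out` must outlive the worker's writes
    return out.str();
}
```

In the issue's setup you would call `locked_println(std::cout, "main_loop")` instead of writing to std::cout directly; every line then arrives intact, although the order of "main" and "worker" lines still varies from run to run.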

I created this gist to test the behavior. Essentially, we define sleep in both languages:

// julia-side sleep
static auto sleep_jl = []() {
    static auto* sleep_fn = unsafe::get_function(Main, "sleep"_sym); // look up Julia's sleep once
    unsafe::call(sleep_fn, box(0.2)); // sleep for 0.2 s
};

// cpp-side sleep
static auto sleep_cpp = [](){
    std::this_thread::sleep_for(std::chrono::milliseconds(300)); // 300 ms, deliberately different from the 0.2 s Julia-side sleep
};

Then we run your task setup, except that the worker and main each print a single word, so the one "main_loop" doesn't get lost in a sea of ints. In the gist, main (master) prints "main" and the worker (task) prints "worker":

There are four cases: the worker can use cpp- or jl-side sleep, and the master can use cpp- or jl-side sleep. Each case only runs for 5 iterations:

Worker: cpp | Master: cpp
main
worker
main
worker
main
worker
main
worker
main
worker

Worker: cpp | Master: jl
main
worker
main
worker
main
worker
main
main
worker
worker

Worker: jl | Master: jl
main
worker
main
main
main
main
worker
worker
worker
worker

Worker: jl | Master: cpp
main
worker
main
main
main
main
worker
worker
worker
worker

I think this reproduces the behavior you were trying to describe: if the worker uses jl-side sleep, it only runs one iteration, then main takes over, and the worker only continues once main has finished all of its iterations.
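For contrast, here is the "Worker: cpp | Master: cpp" baseline with no Julia scheduler involved at all: two plain std::thread threads, each printing one word per iteration and sleeping with std::this_thread::sleep_for. This is a hedged sketch, not the gist itself; the function names and the shortened 20 ms / 30 ms sleeps are assumptions, and it does not model jluna's Task or ThreadPool. Because the OS schedules both threads independently, neither sleep can hold up the other, and both loops always complete all their iterations.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>

// Plain C++ analogue of the "Worker: cpp | Master: cpp" case (no Julia involved).
std::string cpp_cpp_case(int iterations) {
    std::ostringstream out;
    std::mutex out_mutex;  // keeps each word on its own line

    auto print = [&](const char* word) {
        std::lock_guard<std::mutex> guard(out_mutex);
        out << word << '\n';
    };

    std::thread worker([&] {
        for (int i = 0; i < iterations; ++i) {
            print("worker");
            std::this_thread::sleep_for(std::chrono::milliseconds(20));
        }
    });

    for (int i = 0; i < iterations; ++i) {
        print("main");
        std::this_thread::sleep_for(std::chrono::milliseconds(30));
    }
    worker.join();
    return out.str();
}

// Counts how often each distinct line occurs in `text`.
std::map<std::string, int> word_counts(const std::string& text) {
    std::map<std::string, int> counts;
    std::istringstream in(text);
    for (std::string line; std::getline(in, line);)
        ++counts[line];
    return counts;
}
```

The exact interleaving differs between runs, which is why only the per-word counts are stable; the jl-side stalls in the gist are what break this otherwise independent progress.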

I will investigate further and get back to you. I don't think this is a bug, just a quirk of how Julia stalls C++.

Clemapfel commented 1 year ago

With more testing, I have been able to get the cases where the worker would stall to exhibit the expected concurrent behavior instead. I think this comes down to a coin toss over exactly when the first Julia-side sleep is invoked: maybe if main is already sleeping, another sleep will deadlock the worker because some internal state never gets updated.

I will elevate this to a low-priority bug. The fact that this behavior seems random, yet happens more often than not, makes me dread debugging it, but I will look into it more.

For now, using std::this_thread::sleep_for appears to be safe both Julia- and C++-side. I still do not believe that using println vs std::cout has any effect on task scheduling.

Clemapfel commented 1 year ago

Regarding your original question:

In attempting to get this working I've become rather confused about how sleeping the main thread affects task execution.

As of now, my opinion is that if main is sleeping and a task issued by main (the worker) invokes a Julia-side sleep, issues may arise. It seems that in that case, the worker is unable to continue until main yields.

It is possible that some of your functions trigger a sleep implicitly, for example when they interact with a lock while writing to a data structure.
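The lock scenario has a plain C++ analogue: if main holds a lock while it sleeps, any worker that needs that lock makes no progress until main releases it. The sketch below uses only std::mutex and std::thread; it is an analogy for the stall described above, not a model of Julia's scheduler or of jluna's ThreadPool, whose internals are not shown in this thread. `StallDemoResult` and `stall_demo` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

struct StallDemoResult {
    int progress_while_main_slept;     // worker progress observed during main's sleep
    int progress_after_main_released;  // worker progress after main let go of the lock
};

StallDemoResult stall_demo() {
    std::mutex shared;  // stands in for a lock a task might touch implicitly
    int progress = 0;   // only modified while holding `shared`

    std::unique_lock<std::mutex> main_holds(shared);  // main takes the lock first
    std::thread worker([&] {
        std::lock_guard<std::mutex> guard(shared);  // blocks until main releases
        ++progress;
    });

    // main "sleeps" while still holding the lock
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    int during = progress;  // safe to read: main still holds `shared`

    main_holds.unlock();  // main "yields"
    worker.join();
    int after = progress;
    return {during, after};
}
```

However long main sleeps, `progress_while_main_slept` stays 0 and the worker finishes only after the unlock, which is the same "worker cannot continue until main yields" shape described above.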

I've been trying to use jluna in a situation where I'm not fully in charge of the c++ thread. I use it to set up and schedule julia-side tasks, and then it sleeps and occasionally runs callbacks at intervals outside of my direct control.

It may be best to adapt your architecture so that all jluna does is trigger task creation entirely Julia-side, keeping the C++ side as small as possible. Because you cannot control the C++ master thread, it's best to err on the side of caution, which means staying Julia-side.

Clemapfel commented 1 year ago

Addressed by https://github.com/Clemapfel/jluna/commit/9b80f2f928d5096540bafa0829501a3c1c835c2d

Clemapfel / jluna, issue #33