[BUG] Task is not shared between threads, but instead badly duplicated.

mcourteaux commented 3 months ago

Describe the bug

Forwarding the task function at L513: https://github.com/bshoshany/thread-pool/blob/097aa718f25d44315cadb80b407144ad455ee4f9/include/BS_thread_pool.hpp#L505-L519

and at L536: https://github.com/bshoshany/thread-pool/blob/097aa718f25d44315cadb80b407144ad455ee4f9/include/BS_thread_pool.hpp#L531-L540

Causes the captured objects to be forwarded (moved) out and are therefore unusable in later runs of the task.

Minimal non-working example

#include "BS_thread_pool.hpp"
#include <memory>

int main()
{
  BS::thread_pool p(2);

  std::shared_ptr<std::string> sptr = std::make_shared<std::string>();
  std::printf("Sequence:\n");
  p.detach_sequence(0, 8, [sptr](int index) {
    std::printf("  ptr %d: %p\n", index, sptr.get());
  });
  p.wait();

  std::printf("Loop:\n");
  p.detach_loop(0, 8, [sptr](int index) {
    std::printf("  ptr %d: %p\n", index, sptr.get());
  });
  p.wait();

  return 0;
}

Outputs:

Sequence:
  ptr 0: 0x61f2fd71d460
  ptr 1: (nil)
  ptr 2: (nil)
  ptr 4: (nil)
  ptr 5: (nil)
  ptr 6: (nil)
  ptr 7: (nil)
  ptr 3: (nil)
Loop:
  ptr 0: 0x61f2fd71d460
  ptr 1: 0x61f2fd71d460
  ptr 2: 0x61f2fd71d460
  ptr 3: 0x61f2fd71d460
  ptr 4: (nil)
  ptr 5: (nil)
  ptr 6: (nil)
  ptr 7: (nil)

bshoshany commented 3 months ago

This is not a bug. std::forward does not move the object (that's done with std::move), it just forwards the object whether it is an lvalue or rvalue. If you change sptr to &sptr in your code so it's captured by reference, then it will work as expected.

mcourteaux commented 3 months ago

That's not the fix.

#include "BS_thread_pool.hpp"
#include <memory>

int main()
{
  BS::thread_pool p(2);

  std::printf("Sequence:\n");
  {
    std::shared_ptr<std::string> sptr = std::make_shared<std::string>();
    p.detach_sequence(0, 8, [sptr](int index) {
      std::printf("  ptr %d: %p\n", index, sptr.get());
    });
  }
  p.wait();

  std::printf("Loop:\n");
  {
    std::shared_ptr<std::string> sptr = std::make_shared<std::string>();
    p.detach_loop(0, 8, [sptr](int index) {
      std::printf("  ptr %d: %p\n", index, sptr.get());
    });
  }
  p.wait();

  return 0;
}

Now you can't take it by &reference. If you disagree this is a bug, it should at least be documented. It's very non-straightforward that your lambda gets executed without the captured objects there. That's bananas IMHO.

bshoshany commented 3 months ago

I tried it and it still works just fine with &sptr. The shared pointer doesn't expire when the block ends, because that's the whole point of a shared pointer.

In any case, if you think this should be done differently, please let me know what you propose (just make sure it passes all the tests first). Thanks for opening this issue!

mcourteaux commented 3 months ago

The shared pointer doesn't expire when the block ends, because that's the whole point of a shared pointer.

It does. You seem to not fully understand how C++ works. I'm sorry to say this, as I know this is a bold statement to make from my side, but I fear it's true, or you were just confused for a minute. A shared_ptr is just an object that lives on the stack like any other object, but it uses a control-block which does live on the heap. The control-block doesn't expire, but the object does. In fact, because the object does expire, it decrements the ref-count for heap-lived object, and in this case, that was the only existing std::shared_ptr to the control-block, so the control-block gets deleted as well.

In any case, if you think this should be done differently, please let me know what you propose (just make sure it passes all the tests first). Thanks for opening this issue!

Well, I see two possibilities, but I'm not fully sure which one would be the best.

Do not copy the lambda object into the task-queue multiple times, but instead come up with a mechanism to share it, to better reflect what the API suggests.
Actually copy the lambda over multiple times (e.g. N-1 times for N instances and the last one can be forwarded if you want that optimization). As such all objects captured in the lambda closure will get copied, and thus will be available in all copies of the passed-in task.

I think my preference would go to approach 1, but I haven't fully considered the impact of sharing it. The good thing is that lambda-captured variables are by default marked as const, which should prevent people from doing bad things with it. Also, option 1 gets my preference because it better reflects what you did as a user: you made one lambda and passed it in, expecting it would get executed multiple times from potentially multiple threads. You as the user should be aware of the fact that this task will get executed concurrently, so it's your task to safely handle this concurrency either way.

mcourteaux commented 3 months ago

I tried it and it still works just fine with &sptr. The shared pointer doesn't expire when the block ends, because that's the whole point of a shared pointer.

To explicitly demonstrate what I'm saying, try the one below, and turn the capture into a &sptr capture-by-reference.

#include "BS_thread_pool.hpp"
#include <memory>

struct Demonstrator {
  std::string name;
  Demonstrator(std::string name) : name(name) {
    std::printf("Make demonstrator: %s\n", name.c_str());
  }

  ~Demonstrator() {
    std::printf("Destroy demonstrator: %s\n", name.c_str());
  }
};

int main()
{
  BS::thread_pool p(2);

  std::printf("Sequence:\n");
  {
    std::shared_ptr<Demonstrator> sptr = std::make_shared<Demonstrator>("seq");
    p.detach_sequence(0, 8, [sptr](int index) {
      std::printf("  ptr %d: %p\n", index, sptr.get());
    });
  }
  p.wait();

  std::printf("Loop:\n");
  {
    std::shared_ptr<Demonstrator> sptr = std::make_shared<Demonstrator>("loop");
    p.detach_loop(0, 8, [sptr](int index) {
      std::printf("  ptr %d: %p\n", index, sptr.get());
    });
  }
  p.wait();

  return 0;
}

mcourteaux commented 3 months ago

Same bug as #125.

bshoshany commented 3 months ago

My apologies, you're right, I was being imprecise about how shared pointers work. I'm about to administer an exam in 2 hours, so I'm a bit distracted :sweat_smile:

On my computer, capturing the shared pointer by reference does seem to solve the problem even in your last example, but to be honest, I'm not sure why - because it does clearly indicate that the object is destructed before the tasks run. I'm curious as to why that happens - perhaps because the object actually does remain in memory as long as it isn't explicitly overwritten by something else. But I agree that it's not a viable solution here.

Since you linked to #125, note that my solution there works here as well: just define the lambda as a proper object in the same scope as the shared pointer, rather than as a temporary object. That is, replace

p.detach_sequence(0, 8,
    [sptr](int index)
    {
        std::printf("  ptr %d: %p\n", index, sptr.get());
    });

with

auto task = [sptr](int index)
{
    std::printf("  ptr %d: %p\n", index, sptr.get());
};
p.detach_sequence(0, 8, task);

If that is not a viable solution for you, please let me know and I'll think about the suggestions you proposed above.

mcourteaux commented 3 months ago

perhaps because the object actually does remain in memory as long as it isn't explicitly overwritten by something else.

That's exactly why. This setup is still illegal C++ which falls under the "undefined behavior (UB)" umbrella. Since you're curious, I can modify the example to show a scenario under which this UB "fails" more explicitly. Consider this sequence, where you explicitly try to reuse stack space for another shared_ptr afterwards:

#include "BS_thread_pool.hpp"
#include <memory>

struct Demonstrator {
  std::string name;
  Demonstrator(std::string name) : name(name) {
    std::printf("Make demonstrator: %s\n", name.c_str());
  }

  ~Demonstrator() {
    std::printf("Destroy demonstrator: %s\n", name.c_str());
  }
};

int main()
{
  BS::thread_pool p(2);

  std::printf("Sequence:\n");
  {
    std::shared_ptr<Demonstrator> sptr = std::make_shared<Demonstrator>("seq");
    std::printf("shared_ptr points to: %p\n", sptr.get());
    p.detach_sequence(0, 8, [&sptr](int index) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
      std::printf("  ptr %d: %p\n", index, sptr.get());
    });
  }

  /* Let's try to get the compiler to reuse the same stack space. */
  {
    std::shared_ptr<int> i = std::make_shared<int>(0);
    std::printf("new shared_ptr: %p\n", i.get());
  }

  p.wait();

  std::printf("Loop:\n");
  {
    std::shared_ptr<Demonstrator> sptr = std::make_shared<Demonstrator>("loop");
    std::printf("shared_ptr points to: %p\n", sptr.get());
    p.detach_loop(0, 8, [&sptr](int index) {
      std::printf("  ptr %d: %p\n", index, sptr.get());
    });
  }
  p.wait();

  return 0;
}

Compiling this with gcc -O0 yields this output:

❯ g++ -O0 bug.cpp -o bug

❯ ./bug
Sequence:
Make demonstrator: seq
shared_ptr points to: 0x631eaa235870
Destroy demonstrator: seq
new shared_ptr: 0x631eaa2358b0
  ptr 0: 0x631eaa2358b0
  ptr 1: 0x631eaa2358b0
  ptr 2: 0x631eaa2358b0
  ptr 3: 0x631eaa2358b0
  ptr 4: 0x631eaa2358b0
  ptr 5: 0x631eaa2358b0
  ptr 6: 0x631eaa2358b0
  ptr 7: 0x631eaa2358b0
Loop:
Make demonstrator: loop
shared_ptr points to: 0x631eaa235870
Destroy demonstrator: loop
  ptr 4: 0x631eaa235870
  ptr 5: 0x631eaa235870
  ptr 6: 0x631eaa235870
  ptr 0: 0x631eaa235870
  ptr 1: 0x631eaa235870
  ptr 2: 0x631eaa235870
  ptr 3: 0x631eaa235870
  ptr 7: 0x631eaa235870

You can see how the address now is overwritten due to the other shared_ptr reusing the same space on the stack. Interestingly, I wasn't able to immediately reproduce this under -O1 and -O2.

I'll think a bit later on what might be the best way to go forward with this, in approach 1 I mentioned earlier. I might try a few things, and report back later.

Either way; thanks for being patient and respectful here. I see you're a physics professor, which is really cool! It's very understandable these technicalities are not entirely clear. Please reopen this issue meanwhile.

bshoshany commented 3 months ago

Thanks for the new example, this does seem to confirm my guess. Now that I'm finally done with the test (plus grading, etc...) I've had some time to think about this more thoroughly.

Here is my own test program:

#include "BS_thread_pool.hpp"
#include "BS_thread_pool_utils.hpp"
#include <chrono>
#include <ios>
#include <memory>
#include <thread>

BS::synced_stream sync_out;
bool object_exists = false;

class test
{
public:
    test()
    {
        object_exists = true;
        sync_out.println("I was constructed!");
    };

    ~test()
    {
        object_exists = false;
        sync_out.println("I was destructed!");
    };
};

int main()
{
    BS::thread_pool pool(1);
    sync_out.print(std::boolalpha);
    {
        std::shared_ptr<test> ptr = std::make_shared<test>();
        pool.detach_sequence(0, 3,
            [ptr](const int idx)
            {
                std::this_thread::sleep_for(std::chrono::milliseconds(100));
                sync_out.println("Task ", idx, " executed, object exists: ", object_exists, ", pointer points to: ", ptr);
            });
    }
    pool.wait();
}

Here I am using detach_sequence() since it is the simplest way to enqueue multiple tasks at once, but any solutions will directly translate to _loop, _blocks, and submit_ variants. The output is indeed not what I would have expected:

I was constructed!
Task 0 executed, object exists: true, pointer points to: 0x6b9ec0
I was destructed!
Task 1 executed, object exists: false, pointer points to: 0
Task 2 executed, object exists: false, pointer points to: 0

One way to fix it, as I mentioned above, is to create the lambda as a non-temporary object, that is:

auto task = [ptr](const int idx)
{
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    sync_out.println("Task ", idx, " executed, object exists: ", object_exists, ", pointer points to: ", ptr);
};
pool.detach_sequence(0, 3, task);

Now the output is as expected:

I was constructed!
Task 0 executed, object exists: true, pointer points to: 0x73b620
Task 1 executed, object exists: true, pointer points to: 0x73b620
Task 2 executed, object exists: true, pointer points to: 0x73b620
I was destructed!

Another way to fix it is to declare the std::shared_ptr as const, which results in the same expected output.

As you mentioned in your first post, the std::forward that is being applied to the lambda seems to be the culprit. If I understand correctly, if the lambda is an rvalue, then the captured shared pointer is treated as an rvalue too, so it is moved when the lambda is forwarded, which causes the reference count to go to zero and the object to be destructed. This is avoided by either defining the lambda as an lvalue, or declaring the shared pointer as const, since in both cases it will not be moved.

In detach_sequence(), if I change sequence = std::forward<F>(sequence) to just sequence in the lambda capture, the output is as expected, and this fixes your bug. Now I'm just trying to remember what the reason was to have forwarding there in the first place. All my tests seem to work just fine if the forwarding is removed, so it seems like the best solution is to simply remove the forwarding.

I will do this in the next release, but it doesn't seem to be urgent since there are two easy fixes on the user side (an lvalue lambda or a const shared pointer), so I will probably first finish implementing a few additional new features (which I am currently in the middle of working on) and release this bug fix along with the new features. However, if you believe this is urgent, please let me know and I will be happy to release a patch with just this bug fix.

Meanwhile, I am keeping this issue closed since I believe the bug is fixed by removing the forwarding. Thanks again for opening this issue, this was extremely helpful!

mcourteaux commented 3 months ago

Another way to fix it is to declare the std::shared_ptr as const, which results in the same expected output.

Making it const indeed seems to make me unable to reproduce the issue. The reason is rather subtle. The std::forward cannot move the std::shared_ptr from the lambda-capture to the new copy of lambda, because the source is const, and thus cannot be destroyed when moving. So the copy constructor gets selected instead. Interesting workaround!

In detach_sequence(), if I change sequence = std::forward<F>(sequence) to just sequence in the lambda capture, the output is as expected, and this fixes your bug.

Indeed, but this might have an impact on performance, as now the captured object will be copied as many times as iterations on the sequence. That's why I was talking about the "first approach" where you'd share the lambda between threads, instead of copying it. An easy implementation would be to std::move the lambda over to a std::shared_ptr<std::function<void>()> and wrap it again into a std::function<void()> which is the task on the pool. A lot of boiler plate, and I'm not really a fan of the fact that now a std::shared_ptr with atomic operations is in between to do an extra refcount, but maybe that's just fine...

bshoshany commented 3 months ago

Thanks for the comments! In my experience, shared pointers can have an impact on performance; in fact, for the next release I'm trying to figure out how to get rid of the shared pointer in submit_task(). So I'm not sure about that solution. Give me a few weeks to run some tests.

bshoshany / thread-pool

[BUG] Task is not shared between threads, but instead badly duplicated. #149