Closed mcourteaux closed 3 months ago
This is not a bug. std::forward does not move the object (that's done with std::move), it just forwards the object whether it is an lvalue or rvalue. If you change sptr to &sptr in your code so it's captured by reference, then it will work as expected.
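For reference, std::forward does end up moving whenever the argument it forwards was passed as an rvalue. The sketch below illustrates this with a shared pointer; `take`, `source_survives_lvalue`, and `source_survives_rvalue` are hypothetical names made up for this illustration and are not part of the library.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical helper: builds a new shared_ptr from a forwarded argument.
// For an lvalue argument T is deduced as a reference and the copy constructor runs;
// for an rvalue argument T is a plain type and the move constructor runs.
template <typename T>
std::shared_ptr<int> take(T&& ptr)
{
    return std::shared_ptr<int>(std::forward<T>(ptr));
}

// Forwarding an lvalue copies: the source keeps its ownership.
bool source_survives_lvalue()
{
    std::shared_ptr<int> src = std::make_shared<int>(42);
    std::shared_ptr<int> dst = take(src);
    assert(dst.use_count() == 2); // src and dst both own the int
    return src != nullptr;
}

// Forwarding an rvalue moves: the source is left empty.
bool source_survives_rvalue()
{
    std::shared_ptr<int> src = std::make_shared<int>(42);
    std::shared_ptr<int> dst = take(std::move(src));
    assert(dst.use_count() == 1); // only dst owns the int now
    return src != nullptr;        // a moved-from shared_ptr is guaranteed to be empty
}
```

Calling `source_survives_lvalue()` returns true and `source_survives_rvalue()` returns false, which is the difference between passing a named lambda and passing a temporary one.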
That's not the fix.
#include "BS_thread_pool.hpp"
#include <cstdio>
#include <memory>
#include <string>

int main()
{
    BS::thread_pool p(2);
    std::printf("Sequence:\n");
    {
        std::shared_ptr<std::string> sptr = std::make_shared<std::string>();
        p.detach_sequence(0, 8, [sptr](int index) {
            std::printf(" ptr %d: %p\n", index, sptr.get());
        });
    }
    p.wait();
    std::printf("Loop:\n");
    {
        std::shared_ptr<std::string> sptr = std::make_shared<std::string>();
        p.detach_loop(0, 8, [sptr](int index) {
            std::printf(" ptr %d: %p\n", index, sptr.get());
        });
    }
    p.wait();
    return 0;
}
Now you can't take it by & reference. If you disagree that this is a bug, it should at least be documented. It's very non-straightforward that your lambda gets executed without the captured objects there. That's bananas IMHO.
I tried it and it still works just fine with &sptr. The shared pointer doesn't expire when the block ends, because that's the whole point of a shared pointer.
In any case, if you think this should be done differently, please let me know what you propose (just make sure it passes all the tests first). Thanks for opening this issue!
The shared pointer doesn't expire when the block ends, because that's the whole point of a shared pointer.
It does. You seem to not fully understand how C++ works. I'm sorry to say this, as I know this is a bold statement to make from my side, but I fear it's true, or you were just confused for a minute. A shared_ptr is just an object that lives on the stack like any other object, but it uses a control-block which does live on the heap. The control-block doesn't expire, but the object does. In fact, because the object does expire, it decrements the ref-count of the heap-allocated object, and in this case, that was the only existing std::shared_ptr to the control-block, so the control-block gets deleted as well.
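The lifetime rule described here can be checked with a std::weak_ptr observer. This is a minimal sketch; the function names are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>

// The only shared_ptr owning the string goes out of scope at the end of the block.
// Returns whether the string is still alive afterwards.
bool object_outlives_sole_owner()
{
    std::weak_ptr<std::string> observer;
    {
        std::shared_ptr<std::string> sptr = std::make_shared<std::string>("seq");
        observer = sptr;
        assert(sptr.use_count() == 1); // a weak_ptr does not count as an owner
    } // sptr is destroyed here, the count drops to 0 and the string is destroyed
    return !observer.expired();
}

// Same block, but a by-value copy (like a [sptr] lambda capture) lives longer.
bool object_outlives_scope_with_copy()
{
    std::weak_ptr<std::string> observer;
    std::shared_ptr<std::string> copy;
    {
        std::shared_ptr<std::string> sptr = std::make_shared<std::string>("seq");
        observer = sptr;
        copy = sptr; // second owner, declared outside the block
    } // sptr is destroyed, but copy still owns the string
    return !observer.expired();
}
```

The first function returns false and the second returns true: the object survives the block only if some by-value copy of the shared pointer survives it. A reference to the stack-allocated shared_ptr does not keep anything alive.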
In any case, if you think this should be done differently, please let me know what you propose (just make sure it passes all the tests first). Thanks for opening this issue!
Well, I see two possibilities, but I'm not fully sure which one would be the best.
I think my preference would go to approach 1, but I haven't fully considered the impact of sharing it. The good thing is that lambda-captured variables are by default marked as const, which should prevent people from doing bad things with it. Also, option 1 gets my preference because it better reflects what you did as a user: you made one lambda and passed it in, expecting it would get executed multiple times from potentially multiple threads. You as the user should be aware of the fact that this task will get executed concurrently, so it's your task to safely handle this concurrency either way.
I tried it and it still works just fine with &sptr. The shared pointer doesn't expire when the block ends, because that's the whole point of a shared pointer.
To explicitly demonstrate what I'm saying, try the one below, and turn the capture into a &sptr capture-by-reference.
#include "BS_thread_pool.hpp"
#include <cstdio>
#include <memory>
#include <string>

struct Demonstrator {
    std::string name;
    Demonstrator(std::string name) : name(name) {
        std::printf("Make demonstrator: %s\n", name.c_str());
    }
    ~Demonstrator() {
        std::printf("Destroy demonstrator: %s\n", name.c_str());
    }
};

int main()
{
    BS::thread_pool p(2);
    std::printf("Sequence:\n");
    {
        std::shared_ptr<Demonstrator> sptr = std::make_shared<Demonstrator>("seq");
        p.detach_sequence(0, 8, [sptr](int index) {
            std::printf(" ptr %d: %p\n", index, sptr.get());
        });
    }
    p.wait();
    std::printf("Loop:\n");
    {
        std::shared_ptr<Demonstrator> sptr = std::make_shared<Demonstrator>("loop");
        p.detach_loop(0, 8, [sptr](int index) {
            std::printf(" ptr %d: %p\n", index, sptr.get());
        });
    }
    p.wait();
    return 0;
}
Same bug as #125.
My apologies, you're right, I was being imprecise about how shared pointers work. I'm about to administer an exam in 2 hours, so I'm a bit distracted :sweat_smile:
On my computer, capturing the shared pointer by reference does seem to solve the problem even in your last example, but to be honest, I'm not sure why - because it does clearly indicate that the object is destructed before the tasks run. I'm curious as to why that happens - perhaps because the object actually does remain in memory as long as it isn't explicitly overwritten by something else. But I agree that it's not a viable solution here.
Since you linked to #125, note that my solution there works here as well: just define the lambda as a proper object in the same scope as the shared pointer, rather than as a temporary object. That is, replace
p.detach_sequence(0, 8,
    [sptr](int index)
    {
        std::printf(" ptr %d: %p\n", index, sptr.get());
    });
with
auto task = [sptr](int index)
{
    std::printf(" ptr %d: %p\n", index, sptr.get());
};
p.detach_sequence(0, 8, task);
If that is not a viable solution for you, please let me know and I'll think about the suggestions you proposed above.
perhaps because the object actually does remain in memory as long as it isn't explicitly overwritten by something else.
That's exactly why. This setup is still illegal C++ which falls under the "undefined behavior (UB)" umbrella. Since you're curious, I can modify the example to show a scenario under which this UB "fails" more explicitly. Consider this sequence, where you explicitly try to reuse stack space for another shared_ptr afterwards:
#include "BS_thread_pool.hpp"
#include <chrono>
#include <cstdio>
#include <memory>
#include <string>
#include <thread>

struct Demonstrator {
    std::string name;
    Demonstrator(std::string name) : name(name) {
        std::printf("Make demonstrator: %s\n", name.c_str());
    }
    ~Demonstrator() {
        std::printf("Destroy demonstrator: %s\n", name.c_str());
    }
};

int main()
{
    BS::thread_pool p(2);
    std::printf("Sequence:\n");
    {
        std::shared_ptr<Demonstrator> sptr = std::make_shared<Demonstrator>("seq");
        std::printf("shared_ptr points to: %p\n", sptr.get());
        /* Deliberately captures by reference: sptr dangles once this block ends (UB). */
        p.detach_sequence(0, 8, [&sptr](int index) {
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
            std::printf(" ptr %d: %p\n", index, sptr.get());
        });
    }
    /* Let's try to get the compiler to reuse the same stack space. */
    {
        std::shared_ptr<int> i = std::make_shared<int>(0);
        std::printf("new shared_ptr: %p\n", i.get());
    }
    p.wait();
    std::printf("Loop:\n");
    {
        std::shared_ptr<Demonstrator> sptr = std::make_shared<Demonstrator>("loop");
        std::printf("shared_ptr points to: %p\n", sptr.get());
        p.detach_loop(0, 8, [&sptr](int index) {
            std::printf(" ptr %d: %p\n", index, sptr.get());
        });
    }
    p.wait();
    return 0;
}
Compiling this with gcc -O0 yields this output:
❯ g++ -O0 bug.cpp -o bug
❯ ./bug
Sequence:
Make demonstrator: seq
shared_ptr points to: 0x631eaa235870
Destroy demonstrator: seq
new shared_ptr: 0x631eaa2358b0
ptr 0: 0x631eaa2358b0
ptr 1: 0x631eaa2358b0
ptr 2: 0x631eaa2358b0
ptr 3: 0x631eaa2358b0
ptr 4: 0x631eaa2358b0
ptr 5: 0x631eaa2358b0
ptr 6: 0x631eaa2358b0
ptr 7: 0x631eaa2358b0
Loop:
Make demonstrator: loop
shared_ptr points to: 0x631eaa235870
Destroy demonstrator: loop
ptr 4: 0x631eaa235870
ptr 5: 0x631eaa235870
ptr 6: 0x631eaa235870
ptr 0: 0x631eaa235870
ptr 1: 0x631eaa235870
ptr 2: 0x631eaa235870
ptr 3: 0x631eaa235870
ptr 7: 0x631eaa235870
You can see how the address now is overwritten due to the other shared_ptr reusing the same space on the stack. Interestingly, I wasn't able to immediately reproduce this under -O1 and -O2.
I'll think a bit more later about the best way to go forward with this, along the lines of approach 1 that I mentioned earlier. I might try a few things and report back.
Either way, thanks for being patient and respectful here. I see you're a physics professor, which is really cool! It's very understandable that these technicalities are not entirely clear. Please reopen this issue meanwhile.
Thanks for the new example, this does seem to confirm my guess. Now that I'm finally done with the test (plus grading, etc...) I've had some time to think about this more thoroughly.
Here is my own test program:
#include "BS_thread_pool.hpp"
#include "BS_thread_pool_utils.hpp"
#include <chrono>
#include <ios>
#include <memory>
#include <thread>

BS::synced_stream sync_out;
bool object_exists = false;

class test
{
public:
    test()
    {
        object_exists = true;
        sync_out.println("I was constructed!");
    }
    ~test()
    {
        object_exists = false;
        sync_out.println("I was destructed!");
    }
};

int main()
{
    BS::thread_pool pool(1);
    sync_out.print(std::boolalpha);
    {
        std::shared_ptr<test> ptr = std::make_shared<test>();
        pool.detach_sequence(0, 3,
            [ptr](const int idx)
            {
                std::this_thread::sleep_for(std::chrono::milliseconds(100));
                sync_out.println("Task ", idx, " executed, object exists: ", object_exists, ", pointer points to: ", ptr);
            });
    }
    pool.wait();
}
Here I am using detach_sequence() since it is the simplest way to enqueue multiple tasks at once, but any solutions will directly translate to the _loop, _blocks, and submit_ variants. The output is indeed not what I would have expected:
I was constructed!
Task 0 executed, object exists: true, pointer points to: 0x6b9ec0
I was destructed!
Task 1 executed, object exists: false, pointer points to: 0
Task 2 executed, object exists: false, pointer points to: 0
One way to fix it, as I mentioned above, is to create the lambda as a non-temporary object, that is:
auto task = [ptr](const int idx)
{
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    sync_out.println("Task ", idx, " executed, object exists: ", object_exists, ", pointer points to: ", ptr);
};
pool.detach_sequence(0, 3, task);
Now the output is as expected:
I was constructed!
Task 0 executed, object exists: true, pointer points to: 0x73b620
Task 1 executed, object exists: true, pointer points to: 0x73b620
Task 2 executed, object exists: true, pointer points to: 0x73b620
I was destructed!
Another way to fix it is to declare the std::shared_ptr as const, which results in the same expected output.
As you mentioned in your first post, the std::forward that is being applied to the lambda seems to be the culprit. If I understand correctly, if the lambda is an rvalue, then the captured shared pointer is treated as an rvalue too, so it is moved when the lambda is forwarded, which causes the reference count to go to zero and the object to be destructed. This is avoided by either defining the lambda as an lvalue, or declaring the shared pointer as const, since in both cases it will not be moved.
In detach_sequence(), if I change sequence = std::forward<F>(sequence) to just sequence in the lambda capture, the output is as expected, and this fixes your bug. Now I'm just trying to remember what the reason was to have forwarding there in the first place. All my tests seem to work just fine if the forwarding is removed, so it seems like the best solution is to simply remove the forwarding.
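The effect of forwarding inside a loop can be reproduced without the thread pool. The following is a single-threaded sketch under the assumption that the sequence function forwards the callable into a per-task capture on every iteration; `task_queue`, `enqueue_forwarding`, and `enqueue_copying` are hypothetical stand-ins written for this illustration, not the library's code.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

using task_queue = std::vector<std::function<void()>>;

// Hypothetical stand-in: forwards the callable into each task's capture.
// With an rvalue callable, the first iteration moves it and later ones see a moved-from object.
template <typename F>
void enqueue_forwarding(task_queue& queue, int first, int last, F&& task)
{
    for (int i = first; i < last; ++i)
        queue.emplace_back([task = std::forward<F>(task), i] { task(i); });
}

// Hypothetical stand-in: copies the callable into each task's capture.
template <typename F>
void enqueue_copying(task_queue& queue, int first, int last, F&& task)
{
    for (int i = first; i < last; ++i)
        queue.emplace_back([task, i] { task(i); });
}

// Counts how many of three tasks still see a non-null captured shared_ptr.
int valid_tasks_with_forwarding()
{
    task_queue queue;
    int valid = 0;
    {
        std::shared_ptr<int> sptr = std::make_shared<int>(7);
        enqueue_forwarding(queue, 0, 3, [sptr, &valid](int) { if (sptr) ++valid; });
    }
    assert(queue.size() == 3);
    for (auto& run : queue) run();
    return valid;
}

int valid_tasks_with_copying()
{
    task_queue queue;
    int valid = 0;
    {
        std::shared_ptr<int> sptr = std::make_shared<int>(7);
        enqueue_copying(queue, 0, 3, [sptr, &valid](int) { if (sptr) ++valid; });
    }
    assert(queue.size() == 3);
    for (auto& run : queue) run();
    return valid;
}
```

In this sketch only the first task created by the forwarding variant holds the pointer, while all three tasks of the copying variant do. The cost of the copying variant is one copy of the callable, including its captures, per task.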
I will do this in the next release, but it doesn't seem to be urgent since there are two easy fixes on the user side (an lvalue lambda or a const shared pointer), so I will probably first finish implementing a few additional new features (which I am currently in the middle of working on) and release this bug fix along with the new features. However, if you believe this is urgent, please let me know and I will be happy to release a patch with just this bug fix.
Meanwhile, I am keeping this issue closed since I believe the bug is fixed by removing the forwarding. Thanks again for opening this issue, this was extremely helpful!
Another way to fix it is to declare the std::shared_ptr as const, which results in the same expected output.
Making it const indeed seems to make me unable to reproduce the issue. The reason is rather subtle. The std::forward cannot move the std::shared_ptr from the lambda-capture to the new copy of the lambda, because the source is const and thus cannot be modified (emptied) by a move. So the copy constructor gets selected instead. Interesting workaround!
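The const effect can be observed on the closure object itself: a by-copy capture of a const variable produces a const member, and moving the closure then falls back to copying that member. A minimal sketch, with hypothetical function names:

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Non-const shared_ptr captured by value: moving the closure moves the capture.
long owners_after_moving_closure()
{
    std::shared_ptr<int> sptr = std::make_shared<int>(1);
    auto task = [sptr] { return sptr.use_count(); };
    auto moved = std::move(task); // the captured member is moved out of task
    assert(task() == 0);          // task's capture is now empty
    return moved();               // owners: sptr and moved's capture
}

// Const shared_ptr captured by value: the closure member is const,
// so "moving" the closure selects the copy constructor for it.
long owners_after_moving_closure_const()
{
    const std::shared_ptr<int> sptr = std::make_shared<int>(1);
    auto task = [sptr] { return sptr.use_count(); };
    auto moved = std::move(task); // the const member cannot be moved from, so it is copied
    assert(task() == 3);          // task still holds its capture
    return moved();               // owners: sptr, task's capture, moved's capture
}
```

The first function returns 2 and the second returns 3, showing that the source closure keeps its pointer in the const case.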
In detach_sequence(), if I change sequence = std::forward<F>(sequence) to just sequence in the lambda capture, the output is as expected, and this fixes your bug.
Indeed, but this might have an impact on performance, as now the captured object will be copied as many times as there are iterations in the sequence. That's why I was talking about the "first approach" where you'd share the lambda between threads, instead of copying it. An easy implementation would be to std::move the lambda over to a std::shared_ptr<std::function<void()>> and wrap it again into a std::function<void()> which is the task on the pool. A lot of boilerplate, and I'm not really a fan of the fact that now a std::shared_ptr with atomic operations is in between to do an extra refcount, but maybe that's just fine...
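A rough single-threaded sketch of that sharing idea follows. It assumes the callable takes the index as an int, so it stores a std::function<void(int)> rather than the std::function<void()> mentioned above; `task_queue` and `enqueue_shared` are hypothetical names, and a real pool would run the tasks on worker threads.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

using task_queue = std::vector<std::function<void()>>;

// Hypothetical sketch: place the callable on the heap once and let every task
// hold a shared_ptr to it, instead of moving or copying the callable per task.
template <typename F>
void enqueue_shared(task_queue& queue, int first, int last, F&& task)
{
    auto shared_task = std::make_shared<std::function<void(int)>>(std::forward<F>(task));
    for (int i = first; i < last; ++i)
        queue.emplace_back([shared_task, i] { (*shared_task)(i); });
}

// Counts how many of three tasks still see a non-null captured shared_ptr.
int valid_tasks_with_sharing()
{
    task_queue queue;
    int valid = 0;
    {
        std::shared_ptr<int> sptr = std::make_shared<int>(7);
        enqueue_shared(queue, 0, 3, [sptr, &valid](int) { if (sptr) ++valid; });
    }
    assert(queue.size() == 3);
    for (auto& run : queue) run();
    return valid;
}
```

Here every task sees the captured pointer, and the per-task cost is a reference-count increment on `shared_task` rather than a copy of the callable's captures. Because all tasks invoke the same callable object, any mutable state inside it would need its own synchronization in a multithreaded pool.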
Thanks for the comments! In my experience, shared pointers can have an impact on performance; in fact, for the next release I'm trying to figure out how to get rid of the shared pointer in submit_task(). So I'm not sure about that solution. Give me a few weeks to run some tests.
Describe the bug
Forwarding the task function at L513: https://github.com/bshoshany/thread-pool/blob/097aa718f25d44315cadb80b407144ad455ee4f9/include/BS_thread_pool.hpp#L505-L519
and at L536: https://github.com/bshoshany/thread-pool/blob/097aa718f25d44315cadb80b407144ad455ee4f9/include/BS_thread_pool.hpp#L531-L540
causes the captured objects to be forwarded (moved) out, so they are unusable in later runs of the task.
Minimal non-working example
Outputs: