Closed alkisg closed 6 years ago
Sorry, I need to check my notifications more often. I'll try to reproduce this today.
So ☺?
Any news?
Is this how you normally handle bug reports?
You can always apply for a full refund!
More constructively, it's worth pointing out that this GitHub project is for the userspace NBD components, and a kernel oops is by definition a kernel problem. The code causing the problem is not within the GitHub project. Josef (who is a volunteer like the rest of us) is, however, the kernel maintainer, but using the kernel mailing list and reporting as per https://www.kernel.org/doc/html/v4.10/admin-guide/reporting-bugs.html is a more efficient way to reach the right people.
Thank you!
I am in no way saying you have to fix the problem immediately. The reason for my prodding is that the initial reaction conveyed the message that the problem had been reported in the correct place and that it would be looked into shortly. That means other efforts, like the one you just hinted at, were not made, so as to avoid duplicating work.
Thanks for your clarification!
Sorry, I fixed these problems and forgot to report back. The panic and such shouldn't happen anymore, and I redid my torture test to verify that it was OK. Let me know if you can still reproduce with a modern kernel.
Since Josef suggests this should have been fixed, I'm going to close this for now. If it does occur again, feel free to reopen.
Hi, the following code exposes a race condition in disconnections:
After the code runs (sometimes 2-3 runs are needed), some of the nbd-client instances are still running in a hung state, preventing nbd-client [re/dis]connections, blocking system shutdown, etc.
This affects us in LTSP where we have a "connect, check if there's a newer version of the image, disconnect" logic, and it sometimes causes issues due to the aforementioned race condition.
Some of the errors displayed in dmesg: