NetworkBlockDevice / nbd

Network Block Device
GNU General Public License v2.0

Issue with sending SIGHUP to reload nbd server config #102

Closed taylorty closed 3 years ago

taylorty commented 5 years ago

I am using NBD version 3.19. I tried to reload the nbd-server config file by sending SIGHUP to the process. The client is able to set up the connection with the server without problems:

sudo /usr/local/sbin/nbd-client xxxx -N test1 /dev/nbd1 -b 4096 -persist
Negotiation: ..size = 10240MB
bs=4096, sz=10737418240 bytes

But when I then tried to drive I/O with fio on that device, it complained that the device has zero size:

/dev/nbd1: zero sized block device?
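One way to check for the zero-size condition before starting fio is to read the size the kernel reports for the device. The sketch below assumes a Linux client where the device is exposed under /sys/block/<name>/size in 512-byte sectors; the helper name `device_size_bytes` and the `sysfs_root` parameter are made up for illustration and are not part of nbd.

```python
from pathlib import Path

SECTOR_BYTES = 512  # sysfs reports block device sizes in 512-byte sectors


def device_size_bytes(name: str, sysfs_root: str = "/sys/block") -> int:
    """Return the size in bytes that the kernel reports for a block device.

    `name` is the bare device name, e.g. "nbd1" for /dev/nbd1.
    `sysfs_root` is a parameter only so the function can be tested
    against a fake directory tree (an assumption of this sketch).
    """
    size_file = Path(sysfs_root) / name / "size"
    sectors = int(size_file.read_text().strip())
    return sectors * SECTOR_BYTES


if __name__ == "__main__":
    size = device_size_bytes("nbd1")
    if size == 0:
        print("nbd1 reports zero size; the connection is probably gone")
    else:
        print(f"nbd1 reports {size} bytes")
```

A result of 0 after a successful negotiation would match the fio complaint above and suggests the connection was torn down before any I/O was issued.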

The nbd client syslog says

kernel: block nbd1: Connection timed out
kernel: block nbd1: shutting down sockets
kernel: print_req_error: 5 callbacks suppressed
kernel: print_req_error: I/O error, dev nbd1, sector 0

The nbd server syslog says

Connection dropped: Connection reset by peer

Is this a known issue? Is there a workaround for reloading the config file without killing the existing connection?

I appreciate any help in advance.

yoe commented 5 years ago

Those log messages actually suggest that it's a middle box which kills the connection: the client says "Connection timed out", the server says "Connection reset by peer".

The current implementation of SIGHUP handling should only add new configurations to the running server, without touching the existing ones. I haven't double-checked, but the logs you show don't suggest the SIGHUP is at fault here.
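To illustrate the additive reload behaviour described above, here is a minimal sketch. It is not nbd-server's actual code (which is written in C); it only assumes an INI-style config with a [generic] section plus one section per export. The function name `add_new_exports` and the `running` dictionary are hypothetical.

```python
import configparser


def add_new_exports(running: dict, config_text: str) -> list:
    """Add exports found in config_text that are not yet in `running`.

    Entries already present in `running` are never modified or removed,
    even if their section changed in the config file. Returns the names
    of the exports that were added.
    """
    # interpolation=None so that '%' in paths is taken literally
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)

    added = []
    for name in parser.sections():
        if name == "generic":   # global settings, not an export
            continue
        if name in running:     # leave running exports untouched
            continue
        running[name] = dict(parser[name])
        added.append(name)
    return added


if __name__ == "__main__":
    running = {"test0": {"exportname": "/srv/test0.img"}}
    new_config = (
        "[generic]\n"
        "[test0]\nexportname = /srv/test0.img\n"
        "[test1]\nexportname = /srv/test1.img\n"
    )
    print(add_new_exports(running, new_config))
```

In a real server a SIGHUP handler would typically only set a flag, and the main loop would perform a merge like this one, so that clients connected to existing exports are not disturbed.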

Can you confirm that it's impossible for you to reproduce this without SIGHUP, and that the disconnect happens immediately after the SIGHUP?

Thanks,

taylorty commented 5 years ago

Thank you so much for the response. I tried killing the nbd-server process entirely (without using SIGHUP), then reloading the config and it worked fine (I was able to issue IO to the newly added nbd-client device).

Let me clarify the issue. The existing configurations still work as expected, and according to the logs the new configurations are added. However, when I issued I/O to the newly connected nbd-client device, the I/O failed: I saw 'Connection timed out' on the nbd-client side and 'Connection dropped: Connection reset by peer' on the server side. So it seems that after SIGHUP the client can detect and set up the new configuration on the nbd server, but the connection drops as soon as I/O is issued.

Please let me know if you need more information.

Thanks again!

yoe commented 5 years ago

Ah! That's a different story; I thought you were connected to the old server, and that that failed to work after adding a new configuration and sending SIGHUP.

I'll have a look at this when next I have some time, thanks.

Vladyyy commented 5 years ago

@yoe I hit the same issue and can confirm it was fixed by this: https://github.com/NetworkBlockDevice/nbd/issues/96

The tarballs on SourceForge are rather old (2019-01-30). It seems there have been a few fixes since then. Any plans to publish the newer releases on SourceForge?

yoe commented 5 years ago

Thanks. Yes, that might be the case.

I was going to say that the tarballs were not that old, but you're right; it's been longer than I thought. I'll look at releasing 3.20 some time soon.

kumasam4 commented 3 years ago

Hi, I have NBD Server 3.20 installed and configured on RHEL 7. When I add new exports to the NBD server config, I need to reload the configuration. Sending SIGHUP to the root PID of the server kills the nbd-server process. I repeated the test several times with the same SIGHUP signal, each time after restarting the nbd-server service and re-adding the exports to the config. Sometimes it works for a few iterations, but later it fails. Currently I have only 5 exported devices in the nbd-server config, and I plan to scale up to more exports as needed.

I would appreciate any guidance or suggestions.

yoe commented 3 years ago

@kumasam4,

> Hi, I have NBD Server 3.20

That means it's not this issue, but another one, as this issue was closed in 3.20.

Please open a new bug report, and make sure it contains any error messages that nbd-server might output upon receiving SIGHUP.

Meanwhile, closing this issue as it should be fixed (even if a similar one may still exist).