NetworkBlockDevice / nbd

nbd-server before commit 2ab3a2d fails to communicate with nbd-client after commit e6b56c1 #66

Closed by alkisg 6 years ago

alkisg commented 6 years ago

Recent nbd-clients fail to connect to a bit older nbd-servers.

Was there a protocol breaking change between those versions, or is this a regression?

yoe commented 6 years ago

Hi Alkis,

On Tue, Jan 09, 2018 at 12:30:43AM -0800, Alkis Georgopoulos wrote:

> Recent nbd-clients fail to connect to a bit older nbd-servers.
>
> • nbd-server: 1:3.13-1 in Ubuntu 16.04
> • nbd-client: 1:3.16.2-1 in Ubuntu 18.04
> • command: nbd-client 10.161.254.11 -N /opt/ltsp/i386 /dev/nbd0
> • server syslog messages:
>
> Jan 9 10:25:06 alkis nbd_server[1681]: Spawned a child process
> Jan 9 10:25:06 alkis nbd_server[30471]: Negotiation failed/5a: magic mismatch
> Jan 9 10:25:06 alkis nbd_server[30471]: Exiting.
> Jan 9 10:25:06 alkis nbd_server[30471]: Modern initial negotiation failed
> Jan 9 10:25:06 alkis nbd_server[1681]: Child exited with 1

Whoops.

> Was there a protocol breaking change between those versions, or is this a regression?

No, there shouldn't have been. That client does use NBD_OPT_GO rather than NBD_OPT_EXPORT_NAME, but that should not have been incompatible.

I'll try to debug this further.

-- Could you people please use IRC like normal people?!?

-- Amaya Rodrigo Sastre, trying to quiet down the buzz in the DebConf 2008 Hacklab

yoe commented 6 years ago

Hi,

On Tue, Jan 09, 2018 at 09:34:30AM +0000, Wouter Verhelst wrote:

> I'll try to debug this further.

I think this was fixed in commit 2ab3a2db94930b25bb685d6102e6d1435ebd73d4, but that was only part of 3.14.

Can you verify?

alkisg commented 6 years ago

I could not find a 3.14 version where I could apply that commit...

I downloaded 1:3.14-4 (Mon, 21 Nov 2016) from https://launchpad.net/ubuntu/zesty/+source/nbd, which didn't include the commit, but I couldn't apply it because it didn't include TLS support at all. I installed the nbd-server_3.14-4_amd64.deb anyway, and verified that it has the same issue that I reported, i.e. magic mismatch with 3.16 nbd-clients.

The next one that I could find, 1:3.15-1 (Tue, 20 Dec 2016) already included the commit. But I couldn't install the nbd-server_3.15-1_amd64.deb package to test it, because:

nbd-server depends on libgnutls30 (>= 3.5.6); however:
  Version of libgnutls30:amd64 on system is 3.4.10-4ubuntu1.4.

yoe commented 6 years ago

If you compile the source package for nbd-server 3.15 on your older Ubuntu, though, does that work?

yoe commented 6 years ago

Alternatively, build a version of 3.14 with the mentioned patch backported (but then you might have to adjust things a bit so they work without the TLS abstraction layer that was added for 3.15).

alkisg commented 6 years ago

I downloaded nbd 3.16.2 from https://packages.ubuntu.com/source/bionic/nbd and I ran debuild -b on 16.04. Most build tests were unsuccessful (maybe it needs to be built as root?) so it didn't produce a .deb. Nevertheless, it did produce an nbd-server binary which worked fine.

I.e. nbd-server 3.16.2 on Ubuntu 16.04 communicated fine with nbd-client 3.16.2 on Ubuntu 18.04.

yoe commented 6 years ago

The problem is that when the server encounters a command during negotiation which it doesn't know about, it will (correctly) return an error message of type NBD_REP_ERR_UNSUP, but will then not clean up properly, resulting in things going completely out of sync, which causes the "incorrect magic" message.
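To make the desynchronisation concrete, here is a minimal Python sketch of a toy option loop. It is not the nbd-server code. It assumes that "not clean up properly" means the payload of the unknown option is left unread on the socket, and it queues both client messages in one byte stream for simplicity. The helper names (`encode_option`, `go_payload`, `serve_options`, `replay`) are made up for illustration; the magic number and option codes are taken from the NBD protocol.

```python
import io
import struct

OPT_MAGIC = 0x49484156454F5054  # "IHAVEOPT", precedes every client option
NBD_OPT_EXPORT_NAME = 1
NBD_OPT_GO = 7


def encode_option(opt, payload=b""):
    """Client option: 64-bit magic, 32-bit option, 32-bit length, payload."""
    return struct.pack(">QII", OPT_MAGIC, opt, len(payload)) + payload


def go_payload(name):
    """NBD_OPT_GO payload: name length, name, number of info requests (0)."""
    raw = name.encode()
    return struct.pack(">I", len(raw)) + raw + struct.pack(">H", 0)


def serve_options(stream, known, drain_unknown):
    """Toy server loop (hypothetical); returns the events it observed."""
    events = []
    while True:
        header = stream.read(16)
        if len(header) < 16:
            return events
        magic, opt, length = struct.unpack(">QII", header)
        if magic != OPT_MAGIC:
            events.append("magic mismatch")
            return events
        if opt in known:
            stream.read(length)  # consume the payload of a known option
            events.append(("ok", opt))
        else:
            if drain_unknown:
                stream.read(length)  # assumed fix: discard the unread payload
            events.append(("unsup", opt))  # stands in for NBD_REP_ERR_UNSUP


def replay(drain_unknown):
    """Send NBD_OPT_GO, then NBD_OPT_EXPORT_NAME, to a server that only knows the latter."""
    name = "/opt/ltsp/i386"
    wire = encode_option(NBD_OPT_GO, go_payload(name))
    wire += encode_option(NBD_OPT_EXPORT_NAME, name.encode())
    return serve_options(io.BytesIO(wire), {NBD_OPT_EXPORT_NAME}, drain_unknown)


print(replay(drain_unknown=False))  # payload left behind is parsed as a header
print(replay(drain_unknown=True))   # stream stays aligned
```

With `drain_unknown=False` the leftover NBD_OPT_GO payload is read as the next option header, so the magic check fails, which matches the symptom in the syslog above.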

More recent versions of nbd-client default to using NBD_OPT_INFO/NBD_OPT_GO to connect to the server, falling back to the older NBD_OPT_EXPORT_NAME if the server indicates (by way of NBD_REP_ERR_UNSUP) that it does not support that message. Combined with the above bug, though, things don't go so well.

I'll see if I can improve the client so that if it sees NBD_REP_ERR_UNSUP before things go south, it will try to reconnect and do the negotiation again without sending the problematic command.
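The reconnect idea could look roughly like the sketch below. This is only an illustration in Python, not nbd-client code: `negotiate_with_retry`, the `connect` factory, and the session's `send_option`/`close` methods are hypothetical names, and `FakeOldSession` is a stand-in for an older server that rejects NBD_OPT_GO.

```python
NBD_OPT_EXPORT_NAME = 1
NBD_OPT_GO = 7
NBD_REP_ACK = 1
NBD_REP_ERR_UNSUP = 0x80000001


def negotiate_with_retry(connect, name):
    """Try NBD_OPT_GO first; on NBD_REP_ERR_UNSUP, reconnect and use EXPORT_NAME.

    `connect` is a hypothetical factory returning a session object with
    send_option(opt, name) and close().
    """
    session = connect()
    reply = session.send_option(NBD_OPT_GO, name)
    if reply == NBD_REP_ACK:
        return session, NBD_OPT_GO
    session.close()
    if reply != NBD_REP_ERR_UNSUP:
        raise ConnectionError(f"NBD_OPT_GO failed with reply {reply:#x}")
    # Start over on a fresh connection so a desynchronised server is not
    # asked to parse anything further on the old one.
    session = connect()
    session.send_option(NBD_OPT_EXPORT_NAME, name)
    return session, NBD_OPT_EXPORT_NAME


class FakeOldSession:
    """Stand-in for an older server: rejects GO, accepts EXPORT_NAME."""

    opened = 0

    def __init__(self):
        FakeOldSession.opened += 1
        self.closed = False

    def send_option(self, opt, name):
        if opt == NBD_OPT_GO:
            return NBD_REP_ERR_UNSUP
        # In the real protocol the server answers EXPORT_NAME with export
        # details rather than an option reply; the sketch returns nothing.
        return None

    def close(self):
        self.closed = True


session, used = negotiate_with_retry(FakeOldSession, "/opt/ltsp/i386")
print(used, FakeOldSession.opened)
```

The point of the second `connect()` call is that the fallback never shares a byte stream with the rejected option, so it does not matter whether the server cleaned up after it.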

yoe commented 6 years ago

I tried to make nbd-client fall back automatically, but unfortunately that turned out to be rather hard; nbd-client is in dire need of some refactoring.

Meanwhile, I've added a -g option to disable the use of NBD_OPT_GO. This should make things work slightly better (I hope).

alkisg commented 6 years ago

Thank you, Wouter. Yes, even a manually provided option makes things a lot better. Much appreciated!

gklimm commented 5 years ago

> I tried to make nbd-client fall back automatically, but unfortunately that turned out to be rather hard; nbd-client is in dire need of some refactoring.
>
> Meanwhile, I've added a -g option to disable the use of NBD_OPT_GO. This should make things work slightly better (I hope).

Where do I put the magic "-g"?