NetworkBlockDevice / nbd

Network Block Device
GNU General Public License v2.0

Ubuntu 17.10 can't seem to get nbd working, getting segfault #77

Closed ptouchman closed 6 years ago

ptouchman commented 6 years ago

I wanted to try nbd to add some swap space to an older machine that only has 512MB.

I have an Ubuntu 17.10 machine on which I've installed nbd-client and nbd-server with "sudo apt install nbd-client nbd-server".

I've made a 4GB file in a ramdisk with tmpfs:

sudo mkdir /mnt/a
sudo mount -t tmpfs -o size=4096M tmpfs /mnt/a
dd if=/dev/zero of=/mnt/a/NBDFILE count=$((1024*1024*1024*4/512)) status=progress
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 10.2248 s, 420 MB/s
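The `count` expression in the dd command converts a size in bytes into a number of 512-byte blocks, which is dd's default block size. A minimal sketch of that arithmetic; the helper name `blocks_for` is made up for illustration:

```shell
# Number of dd blocks needed to cover a size given in GiB.
# $1 = size in GiB, $2 = block size in bytes (dd defaults to 512)
blocks_for() {
    echo $(( $1 * 1024 * 1024 * 1024 / $2 ))
}

blocks_for 4 512    # 8388608 blocks of 512 bytes = 4294967296 bytes
```

With GNU dd, `bs=1M count=4096` describes the same 4 GiB in far fewer, larger writes, which is usually quicker.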

and then I launch an nbd-server

nbd-server -C /dev/null 9000 /mnt/a/NBDFILE

and I try to connect to it using 127.0.0.1

sudo  nbd-client 127.0.0.1 9000 /dev/nbd0
Warning: the oldstyle protocol is no longer supported.
This method now uses the newstyle protocol with a default export
Negotiation: ..Error: Read failed: End of file
Exiting.

and when I look in my dmesg log, there is a segfault:

[451892.201945] nbd-server[18901]: segfault at 0 ip 00007feb584e1cbe sp 00007ffdf2e6d818 error 4 in libc-2.26.so[7feb58441000+1d6000]
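For what it's worth, that dmesg line can be decoded a little further. "segfault at 0" is the faulting address (a null pointer), and subtracting the mapping base in the square brackets from `ip` gives the offset of the faulting instruction inside libc. A rough sketch, assuming the usual x86 page-fault error bits (bit 0 = protection fault, bit 1 = write, bit 2 = user mode); the function names are made up for illustration:

```shell
# Offset of the faulting instruction inside the mapped library.
# $1 = ip, $2 = mapping base (both as 0x... hex values from dmesg)
fault_offset() {
    printf '%x\n' $(( $1 - $2 ))
}

# Describe an x86 page-fault error code as printed after "error" in dmesg.
decode_fault_error() {
    code=$1
    if [ $(( code & 4 )) -ne 0 ]; then mode="user-mode"; else mode="kernel-mode"; fi
    if [ $(( code & 2 )) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $(( code & 1 )) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    echo "$mode $access, $cause"
}

fault_offset 0x7feb584e1cbe 0x7feb58441000   # prints a0cbe
decode_fault_error 4                         # prints: user-mode read, page not present
```

With matching debug symbols installed, that offset could be handed to something like `addr2line -f -e <path to libc-2.26.so> 0xa0cbe` to name the function; the library path depends on the system.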

I also tried two systems running Debian 9.3 32bit and got the same Read failed: End of file error.

Am I doing something wrong?

I tried it with sudo nbd-server and it didn't make a difference, still a segfault:

[452796.954020] nbd-server[19156]: segfault at 0 ip 00007f649bf6fcbe sp 00007ffe2e3652a8 error 4 in libc-2.26.so[7f649becf000+1d6000]
[452913.048412] nbd-server[19177]: segfault at 0 ip 00007f4ea6827cbe sp 00007ffd0d86d888 error 4 in libc-2.26.so[7f4ea6787000+1d6000]

===============================================

sudo apt install nbd-client
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  nbd-client
0 upgraded, 1 newly installed, 0 to remove and 81 not upgraded.
Need to get 34.3 kB of archives.
After this operation, 128 kB of additional disk space will be used.
Get:1 http://us.archive.ubuntu.com/ubuntu artful/universe amd64 nbd-client amd64 1:3.15.2-3 [34.3 kB]
Fetched 34.3 kB in 0s (103 kB/s)
Preconfiguring packages ...
Selecting previously unselected package nbd-client.
(Reading database ... 402621 files and directories currently installed.)
Preparing to unpack .../nbd-client_1%3a3.15.2-3_amd64.deb ...
Unpacking nbd-client (1:3.15.2-3) ...
Processing triggers for ureadahead (0.100.0-20) ...
Processing triggers for systemd (234-2ubuntu12.1) ...
Processing triggers for man-db (2.7.6.1-2) ...
Setting up nbd-client (1:3.15.2-3) ...
update-initramfs: deferring update (trigger activated)
update-rc.d: warning: start and stop actions are no longer supported; falling back to defaults
Processing triggers for ureadahead (0.100.0-20) ...
Processing triggers for systemd (234-2ubuntu12.1) ...
Processing triggers for initramfs-tools (0.125ubuntu12) ...
update-initramfs: Generating /boot/initrd.img-4.13.0-38-generic

sudo apt install nbd-server
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following NEW packages will be installed:
  nbd-server
0 upgraded, 1 newly installed, 0 to remove and 81 not upgraded.
Need to get 52.5 kB of archives.
After this operation, 164 kB of additional disk space will be used.
Get:1 http://us.archive.ubuntu.com/ubuntu artful/main amd64 nbd-server amd64 1:3.15.2-3 [52.5 kB]
Fetched 52.5 kB in 0s (125 kB/s)
Preconfiguring packages ...
Selecting previously unselected package nbd-server.
(Reading database ... 402607 files and directories currently installed.)
Preparing to unpack .../nbd-server_1%3a3.15.2-3_amd64.deb ...
Unpacking nbd-server (1:3.15.2-3) ...
Processing triggers for ureadahead (0.100.0-20) ...
Setting up nbd-server (1:3.15.2-3) ...

Creating config file /etc/nbd-server/config with new version
Adding system user `nbd' (UID 127) ...
Adding new group `nbd' (GID 136) ...
Adding new user `nbd' (UID 127) with group `nbd' ...
Not creating home directory `/etc/nbd-server'.
Processing triggers for systemd (234-2ubuntu12.1) ...
Processing triggers for man-db (2.7.6.1-2) ...
Processing triggers for ureadahead (0.100.0-20) ...

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 17.10
Release: 17.10
Codename: artful

ptouchman commented 6 years ago

Ok, I was finally able to get it working after MUCH much fiddling on ubuntu 17.10.

Apparently, for some reason, nbd-server will only work when running as root. It also seems to ignore the port number specified in /etc/nbd-server/config, and responds only on port 10809.


[generic]
# If you want to run everything as root rather than the nbd user, you
# may either say "root" in the two following lines, or remove them
# altogether. Do not remove the [generic] section, however.
#       user = nbd
#       group = nbd
        includedir = /etc/nbd-server/conf.d

# What follows are export definitions. You may create as much of them as
# you want, but the section header has to be unique.
[nbdfile]
  exportname = /mnt/a/NBDFILE
  port = 9000

# why doesnt this show up on port 9000

And here are the first signs of life. You can see that it's listening on port 10809, and that we have to specify -N nbdfile on the nbd-client command line.

Running netstat -tulpen will show the nbd-server listening on port 10809.


$ nbd-server
$ netstat -tulpen
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       User       Inode      PID/Program name    
tcp6       0      0 :::10809                :::*                    LISTEN      1000       8422052    28808/nbd-server    

$ sudo nbd-client 127.0.0.1 10809 /dev/nbd0 -N nbdfile
Negotiation: ..size = 4096MB
bs=1024, sz=4294967296 bytes

I had been trying to use the user and group "nbd", and it didn't make any difference who owned my NBDFILE: changing the owner and group of the file to nbd didn't help. Only the switch to root made a difference.

$ ls -l /mnt/a
total 1368
-rw-r--r-- 1 nbd nbd 4294967296 Apr 13 07:39 NBDFILE
$ rm /mnt/a/NBDFILE
rm: remove write-protected regular file '/mnt/a/NBDFILE'? y
rm: cannot remove '/mnt/a/NBDFILE': Operation not permitted
$ sudo rm /mnt/a/NBDFILE
$ dd if=/dev/zero of=/mnt/a/NBDFILE count=$((1024*1024*1024*4/512)) status=progress
4230789120 bytes (4.2 GB, 3.9 GiB) copied, 28 s, 151 MB/s     
8388608+0 records in
8388608+0 records out
4294967296 bytes (4.3 GB, 4.0 GiB) copied, 28.8812 s, 149 MB/s

$ ls -l /mnt/a
total 4194304
-rw-r--r-- 1 me me 4294967296 Apr 14 04:27 NBDFILE

and once I got it going, I was able to access it over the network:


$ sudo nbd-client 192.168.0.1 10809 /dev/nbd0  -N nbdfile -s
$ sudo mkswap /dev/nbd0
$ sudo swapon /dev/nbd0 -p 1024

yoe commented 6 years ago

The port statement in the per-export sections is only allowed for backwards compatibility, and is no longer being used (I should probably drop that there). The oldstyle negotiation protocol is no longer supported. If you really want to change the port number (which I don't recommend, but it's certainly possible), then you need to set the port in the [generic] section.
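A sketch of what that might look like, reusing the export from the earlier comment. The helper `write_nbd_config` and the output path are purely illustrative; the point is only that `port` sits under `[generic]` rather than under the export:

```shell
# Write a minimal nbd-server config with the listening port in [generic].
# $1 = output path, $2 = port, $3 = file to export (values are illustrative)
write_nbd_config() {
    cat > "$1" <<EOF
[generic]
    port = $2

[nbdfile]
    exportname = $3
EOF
}

# Example:
# write_nbd_config ./mynbdconfig 9000 /mnt/a/NBDFILE
# nbd-server -C ./mynbdconfig
# sudo nbd-client 127.0.0.1 9000 /dev/nbd0 -N nbdfile
```

The export is still selected by name with `-N`, as in the working command shown earlier in the thread.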

The failure to open the files might be related to the nbd user not having access to parent directories. Can you show what the permissions of /mnt and /mnt/a are? If those are too restrictive, then things will obviously fail.
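One way to check this is to walk each parent directory of the export and test whether the nbd user can traverse it. A sketch, assuming `sudo` is available and the server runs as user `nbd`; the helper names are hypothetical:

```shell
# Print every parent directory of a path, from the top down.
parent_dirs() {
    path=$1
    dirs=""
    while [ "$path" != "/" ] && [ "$path" != "." ]; do
        path=$(dirname "$path")
        dirs="$path
$dirs"
    done
    printf '%s' "$dirs"
}

# Report any directory the given user cannot search, and whether the file is readable.
# $1 = user name, $2 = exported file
check_export_access() {
    parent_dirs "$2" | while read -r dir; do
        sudo -u "$1" test -x "$dir" < /dev/null || echo "$1 cannot traverse $dir"
    done
    sudo -u "$1" test -r "$2" || echo "$1 cannot read $2"
}

# Example: check_export_access nbd /mnt/a/NBDFILE
```

A writable export would also need a `test -w` on the file itself. `namei -l /mnt/a/NBDFILE` from util-linux prints the permissions of every path component in one go.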

ptouchman commented 6 years ago

So I can't do things like:

EXAMPLES
       Some examples of nbd-server usage:

       To export a file /export/nbd/exp-bl-dev on port 2000:

         nbd-server 2000 /export/nbd/exp-bl-dev

===============================

With regard to the permissions:

    ls -l /mnt
    total 4
    drwxrwxrwt 2 root root   60 Apr 19 09:11 a
    $ ls -l /mnt/a
    total 3670016
    -rw-r--r-- 1 me me 3758096384 Apr 19 13:06 NBDFILESWAP

===================================

It would be cool if nbd-server printed more messages, such as the name of the file it's trying to open. "Error: Read failed: End of file" seems kind of cryptic.


$ sudo nbd-client 127.0.0.1 9000 /dev/nbd0 
Warning: the oldstyle protocol is no longer supported.
This method now uses the newstyle protocol with a default export
Negotiation: ..Error: Read failed: End of file
Exiting.

What's really weird is when I try to use NBD with an ext2 file system, I am getting strange results.

cat mynbdconfig

[generic]
# If you want to run everything as root rather than the nbd user, you
# may either say "root" in the two following lines, or remove them
# altogether. Do not remove the [generic] section, however.
#   user = nbd
#   group = nbd
    includedir = /etc/nbd-server/conf.d
allowlist = 1

# What follows are export definitions. You may create as much of them as
# you want, but the section header has to be unique.
[nbdfileswap]
  exportname = /mnt/a/NBDFILESWAP

[nbdfileext2]
  exportname = /mnt/a/NBDFILEEXT2

===============

and I make the files, 4GB for swap, 4GB for an ext2 filesystem.

$ dd if=/dev/zero of=/mnt/a/NBDFILESWAP count=$((1024**3*4/512)) status=progress
$ dd if=/dev/zero of=/mnt/a/NBDFILEEXT2 count=$((1024**3*4/512)) status=progress

I run my nbd-server with

$ nbd-server -C mynbdconfig

and on my other system:


$ sudo nbd-client 192.168.0.1 10809 /dev/nbd0 -N nbdfileswap -s
$ sudo nbd-client 192.168.0.1 10809 /dev/nbd1 -N nbdfileext2

Get the swap file set up

$ sudo mkswap /dev/nbd0
$ sudo swapon  /dev/nbd0 -p 1024

Get the ext2 file system set up and mounted

$ sudo mke2fs /dev/nbd1
$ sudo mount /dev/nbd1 /mnt/c -o loop

With the ext2 file system mounted on /mnt/c, create a directory I can write to

$ sudo mkdir /mnt/c/mydir
$ sudo chown me /mnt/c/mydir
$ sudo chgrp me /mnt/c/mydir

I copy a large directory of files to be compiled to my fresh ext2 file system; in this case I was testing by compiling mame194.

$ sudo cp -r ~/Downloads/mame194 /mnt/c/mydir

$ cd /mnt/c/mydir
$ make clean; make

and what is really strange is that the md5sum of the compiler output is DIFFERENT from compiling it locally.

I've done it three times now and it just doesn't match. I'm totally baffled.

After immediately copying the directory, I run a

diff -rq /mnt/c/mydir/mame194 ~/Downloads/mame194

and it doesn't show any differences...
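To see which build outputs actually differ, rather than only the final binary, the two build trees could be compared file by file after compiling. A sketch using `md5sum` as elsewhere in this thread; `tree_sums` and `changed_files` are illustrative names, and paths containing whitespace are not handled:

```shell
# Print a sorted "checksum  relative-path" listing for every file under a directory.
tree_sums() {
    (cd "$1" && find . -type f -exec md5sum {} + | sort -k 2)
}

# Print the relative paths that are missing from one tree or differ between the two.
changed_files() {
    a=$(mktemp)
    b=$(mktemp)
    tree_sums "$1" > "$a"
    tree_sums "$2" > "$b"
    # A line that appears only once belongs to a file that is missing or differs.
    sort "$a" "$b" | uniq -u | awk '{ print $2 }' | sort -u
    rm -f "$a" "$b"
}

# Example: changed_files /mnt/c/mydir/mame194 ~/Downloads/mame194
```

Run after both builds finish, this narrows the search to the specific object files or binaries whose contents changed.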

I stopped using the nbd swap and used the local hard drive instead, while still using ext2 over nbd, and the compiled executable still comes out different.

I've even tried using ext4 over nbd and the compiled output still isn't identical.

yoe commented 6 years ago

On Thu, Apr 19, 2018 at 02:10:20PM -0700, ptouchman wrote:

> So I can't do things like:
>
> EXAMPLES
>        Some examples of nbd-server usage:
>
>        To export a file /export/nbd/exp-bl-dev on port 2000:
>
>          nbd-server 2000 /export/nbd/exp-bl-dev

Not sure at this point :-)

That part of the documentation might be a bit outdated. I believe I fixed it, in that it should now work with newstyle and a default export, but haven't checked that.

At any rate, I may need to fix that documentation.

[...]

> and what is really strange is that the md5sum of the compiler output is DIFFERENT from compiling it locally.

This is not necessarily related to NBD.

Have you tried building things in separate directories on a non-NBD-backed filesystem, and checksumming the output of that? I'm quite sure that the result of that won't be the same either.

Compilers encode various things into a built binary, such as paths and timestamps; it is very difficult to compile something in a way that produces a byte-for-byte identical binary. See https://reproducible-builds.org/ for more details on this.

-- Could you people please use IRC like normal people?!?

-- Amaya Rodrigo Sastre, trying to quiet down the buzz in the DebConf 2008 Hacklab

ptouchman commented 6 years ago

Hi Wouter,

Thank you for your replies. I appreciate your help.

Thank you for telling me that the compiler will generate different results based upon the pathname. I thought I was going insane. You have saved my sanity!!!

https://reproducible-builds.org/docs/build-path/ tells all about how important it is to have the build path the same.

Being a bit of a noob at compiling, I did not know that.

I had done this over and over again and kept getting different md5sums, but I did save the ext2 image files to have a look at later.

When I used identical paths, the compiler output was identical. I was a bit careless when I was copying the directory over, and typed different command sequences for the "cp -r " command.

So I had a little lua script that would give me some compile stats so that I could check md5sums, etc which was pretty useful.

I'm going to add a little pwd command to my script so that I will see the compile path.

my "normal compile directory" would be "cd /mnt/c/Downloads/mame194_extract/mame/"


try 1 : directory named just mame194_extract (no Downloads/ at the head and it's 12 bytes less)

===============================================================================
EXECUTING: ls -l ./mame
-rwxr-xr-x 1 golden golden 226877704 Apr 18 12:54 ./mame
================================================================================
EXECUTING: md5sum ./mame
edf1bf33710bd1d52c9720e69e66de7c  ./mame
================================================================================
EXECUTING: gcc --version | head -n 1
gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516
================================================================================
EXECUTING: lsb_release -a
Distributor ID: Debian
Description:    Debian GNU/Linux 9.3 (stretch)
Release:        9.3
Codename:       stretch
================================================================================
EXECUTING: uname -a
Linux debian93silver 4.9.0-4-686 #1 SMP Debian 4.9.65-3 (2017-12-03) i686 GNU/Linux
================================================================================
EXECUTING: date
Wed Apr 18 12:55:10 PDT 2018
================================================================================

try 2  path is Downloads/mame194_extract/mame

================================================================================
EXECUTING: ls -l ./mame
-rwxr-xr-x 1 golden golden 226877716 Apr 19 01:36 ./mame
================================================================================
EXECUTING: md5sum ./mame
1525a7a4cc4d2773031ca2627a66581c  ./mame
================================================================================

try 3  path is just Downloads/mame194_extract  (no mame at the end) note that the filesize is 4 characters less

================================================================================
EXECUTING: ls -l ./mame
-rwxr-xr-x 1 golden golden 226877712 Apr 19 08:36 ./mame
================================================================================
EXECUTING: md5sum ./mame
f47ea376cd6fe09af8845e9fdf82e9ee  ./mame
================================================================================

try 4 path is Downloads/mame194_extract/mame  IDENTICAL MD5SUM to try 2!!!

================================================================================
EXECUTING: ls -l ./mame
-rwxr-xr-x 1 golden golden 226877716 Apr 19 12:41 ./mame
================================================================================
EXECUTING: md5sum ./mame
1525a7a4cc4d2773031ca2627a66581c  ./mame
===============================================================================

My little lua script to add some compiling stats and write a log:


-- compiletime.lua
-- This will take the input and print the time in seconds before the line.
-- Once complete it will print info about your system and run mame for a few seconds.
--
-- a nice compile command is something like:
-- make clean; /usr/bin/time make -j2 2>&1 | lua5.3 compiletime.lua | tee COMPILE_LOG.txt | nl

function popenstr(str)
  local mypipe = io.popen(str, "r")
  local mystr = mypipe:read("*all")
  mypipe:close()
  return mystr
end

arch_table={ x86_64 = "mame64", i686 = "mame" }

for archstr,arch_mamestr in pairs(arch_table) do
  if string.find(popenstr("uname -a"),archstr) then 
      mamebinarystr = arch_mamestr 
--    print(archstr,arch_mamestr) 
  end
end

assert(mamebinarystr,"arch not detected")

donestr = "DONE - COMPILE COMPLETE"

function iif(b,t,f) if b then return t else return f end end

l = os.time()
while true do
  j = io.read()
  k = os.time()
  print(k-l, iif(j==nil,donestr,j))
  if j==nil then break end
end

function print_and_execute(str)
  print(string.rep("=",80).."\n".."EXECUTING: "..str)
  os.execute(str)
end

print_and_execute("cat /proc/meminfo | grep \"MemTotal\"")
print_and_execute("lscpu | grep \"Model name:\"")
print_and_execute("lscpu")
print_and_execute("lspci -nn | grep -E -i \"Display|VGA\"")
print_and_execute("lspci -nn")
print_and_execute("pwd")
print_and_execute("ls -l ./"..mamebinarystr.."")
print_and_execute("md5sum ./"..mamebinarystr.."")
print_and_execute("gcc --version | head -n 1")
print_and_execute("lsb_release -a")
print_and_execute("uname -a")
print_and_execute("date")
print_and_execute("lsmod | grep \"video\\|drm\"")

mamebase = "/usr/bin/time ./"..mamebinarystr.." -seconds_to_run 7 -sdlvideofps -rompath ../../mameroms 2>&1"

mamegame = "pacman"

print_and_execute(mamebase.." "..mamegame.." -video opengl -verbose")

ptouchman commented 6 years ago

and sure enough, running strings on the mame binary shows the compile path

$ strings /mnt/c/mame194_extract/mame/mame | grep "/mnt/c"
/mnt/c/mame194_extract/mame/build/projects/sdl/mame/gmake-linux

I also couldn't understand why a diff showed png.pyc as different; it turns out to be for the exact same reason:

$ strings /mnt/c/mame194_extract/mame/scripts/build/png.pyc | grep "/mnt/c"
/mnt/c/mame194_extract/mame/scripts/build/png.pyt
/mnt/c/mame194_extract/mame/scripts/build/png.pyt
/mnt/c/mame194_extract/mame/scripts/build/png.pyt
/mnt/c/mame194_extract/mame/scripts/build/png.pyR
/mnt/c/mame194_extract/mame/scripts/build/png.pyR
/mnt/c/mame194_extract/mame/scripts/build/png.pyt
...
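The same check can be scripted for any build output. A sketch that searches a file for the build directory as a literal byte string, assuming a `grep` that supports `-a` (GNU and BSD grep do); `contains_build_path` is a made-up name:

```shell
# Succeed if the file embeds the given directory name anywhere in its bytes.
# $1 = file to inspect, $2 = directory the build ran in
contains_build_path() {
    grep -a -q -F -- "$2" "$1"
}

# Example:
# contains_build_path ./mame "$PWD" && echo "build path is embedded"
```

GCC's `-fdebug-prefix-map=OLD=NEW` option can rewrite such paths in debug information, though whether that covers every embedded path in this particular build is untested here.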

yoe commented 6 years ago

Okay, since we've now shown that this isn't a bug in nbd-server, closing this issue.

If you do find actual bugs, do not hesitate to reopen this issue though.