debian-pi / raspbian-ua-netinst

Raspbian (minimal) unattended netinstaller

Malformed resolv.conf breaks networking (multiple dns servers) #534

Open ajones11235 opened 3 years ago

ajones11235 commented 3 years ago

First, thank you for the great utility; I have found it immensely useful. Second, apologies: I ran into this bug ages ago when building a load of Raspberry Pi 1Bs with version 1.1.2, found a workaround without really understanding the problem, and didn't bother to report it. I am now re-building them with version 1.1.7 and this time I can't usefully use the workaround.

If there are multiple DNS servers specified, either hard coded or by dhcp, all the entries for nameserver appear on the same line in /etc/resolv.conf.
Last time I was using hard coded ipv4 addressing, so reducing the number of dns servers in installer-config.txt to one made it work. This time I am using hard coded ipv4 and dhcp ipv6 addressing, which results in having the ipv4 nameserver address plus some ipv6 addresses from dhcp.

On starting up the system, the messages saying it couldn't connect to a time server appeared and rapidly scrolled off the screen.

Later it reached and failed at this point:

**P: Retrieving Release E: Couldn't download Release!

Oh noes! something went wrong!

Output of '/busybox free -k' . . (too much typing) . You have 10 seconds to hit ENTER to get a shell...**

Hitting enter a couple of times opened BusyBox v1.30.1 (Raspbian 1:1.30.1-4) built-in shell (ash).

From there, pinging any device by name failed.

However, pinging by ip address (ipv4 or ipv6) worked.

While in busybox I edited resolv.conf to add some newlines, and ping by name worked just fine (both ipv4 and ipv6). Also while in busybox I eventually discovered the file /etc/init.d/rcS.

In that file there are some lines of the form

printf '%s' 'some value' >> resolv.conf

I think they should be

printf '%s\n' 'some value' >> resolv.conf
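A minimal sketch of the difference between the two forms, written against scratch files rather than the real /etc/resolv.conf. The addresses are documentation placeholders and the exact strings in rcS are not reproduced here; this only illustrates what a missing `\n` does when several entries are appended.

```shell
#!/bin/sh
# Sketch only: writes to temporary files, not to /etc/resolv.conf.
# The addresses are placeholders, not values taken from the installer.
broken="$(mktemp)"
fixed="$(mktemp)"

# Without a trailing newline every entry lands on the same line.
printf '%s' 'nameserver 192.0.2.1' >> "$broken"
printf '%s' 'nameserver 192.0.2.2' >> "$broken"

# With '\n' in the format string each entry gets its own line.
printf '%s\n' 'nameserver 192.0.2.1' >> "$fixed"
printf '%s\n' 'nameserver 192.0.2.2' >> "$fixed"

echo "broken:"; cat "$broken"; echo
echo "fixed:";  cat "$fixed"
```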

(Sorry if this is a bit vague. It's all from memory, as it all disappeared when I shut down the box.) I added some \ns where it seemed appropriate but was unable to restart the network, so I don't know if that solved the problem.

I couldn't work out where in your distribution that file is stored so I have reached a standstill.

If you could tell me how to patch that file I would be very happy to test it for you. Thanks again for taking the trouble to make and maintain such a useful utility

diederikdehaas commented 3 years ago

You are entirely correct, that is a bug. :+1:

You can either have:

nameserver dns-server-1 dns-server-2 dns-server-3

or

nameserver dns-server-1
nameserver dns-server-2
nameserver dns-server-3

but this code generates

nameserver dns-server-1 nameserver dns-server-2 nameserver dns-server-3

which is incorrect.
I'll fix that shortly
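One way to spot the third, broken form in an existing file is to count how often the keyword appears on each line. The sketch below does that with awk; `check_resolv_conf` is a hypothetical helper name, not something from this project, and the sample addresses are placeholders.

```shell
#!/bin/sh
# Sketch only: reports lines of a resolv.conf-style file that contain the
# "nameserver" keyword more than once. Returns non-zero if any are found.
# check_resolv_conf is a hypothetical helper, not part of raspbian-ua-netinst.
check_resolv_conf() {
    awk 'BEGIN { bad = 0 }
    {
        n = 0
        for (i = 1; i <= NF; i++) if ($i == "nameserver") n++
        if (n > 1) {
            printf "line %d: %d nameserver keywords\n", NR, n
            bad = 1
        }
    }
    END { exit bad }' "$1"
}

# Example run against a scratch file with placeholder addresses.
sample="$(mktemp)"
printf '%s\n' 'nameserver 192.0.2.1 nameserver 192.0.2.2' > "$sample"
check_resolv_conf "$sample" || echo "malformed"
```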

diederikdehaas commented 3 years ago

I've now pushed 2 commits to the v1.1.x branch which should fix this issue. If you can build a new installer with those and verify it is indeed now fixed (and report back), that would be great.

Please do (also) test a mix of, for example, hardcoded ipv4 and dhcp for ipv6.

ajones11235 commented 3 years ago

Wow, what an amazing response. I wish I could get back to you as quickly, but I think I will have a bit of a learning curve (I joined GitHub just a few hours ago...).

I run Fedora here. Should I assume it has to be built in a debian VM?

If you could point me to any useful tutorials / lists of required tools to help me through the process I would be grateful

diederikdehaas commented 3 years ago

> I am now re-building them with version 1.1.7

From that I assumed you were already (re-)building the installer yourself. Sorry.

I can (and probably will) provide a fresh build on my personal clone, but it shouldn't be too hard. Have you used git (itself) before?

There's a good chance it'll run on Fedora as well, but I do use Debian. There's https://github.com/debian-pi/raspbian-ua-netinst/blob/master/BUILD.md which describes what you need and what to do to build an installer. I expect that most programs are already installed on most/all Fedora systems.
One program that is missing from that BUILD.md document is ar, and that may not be installed by default on Fedora. If you run ar --help from a terminal window, you should find out quickly enough. tar isn't listed either, but that should be available on any Linux system.
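A quick way to check for several tools at once is sketched below. `missing_tools` is a hypothetical helper, and the list passed to it is only the two programs mentioned above plus git, not the full set from BUILD.md.

```shell
#!/bin/sh
# Sketch only: prints the name of every listed command that is not on PATH.
# missing_tools is a hypothetical helper; extend the list from BUILD.md.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing="$(missing_tools git ar tar)"
if [ -n "$missing" ]; then
    echo "Missing tools:"
    echo "$missing"
else
    echo "All listed tools found."
fi
```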

The procedure is as follows:

git clone https://github.com/debian-pi/raspbian-ua-netinst/
cd raspbian-ua-netinst
git checkout v1.1.x
./clean.sh
./update.sh
./build.sh
sudo ./buildroot.sh

diederikdehaas commented 3 years ago

I published it here: https://github.com/diederikdehaas/raspbian-ua-netinst/releases/tag/v1.1.8-alpha (I am/was too lazy to create checksums and the other things I normally do for a release ;-))

ajones11235 commented 3 years ago

Just to reassure that I am working on it.
My scripted install is still running (takes 2 - 3 hours on my slow internet and downloads a ton of applications and tools) but the fact that it has got this far shows your fix has remedied my immediate problem. I will do some minimal installs with different combinations of dns settings when it's done.

I hope to have test results for the resolv.conf fix by the end of the day.

FWIW at first it didn't build on my Fedora box, failing as shown below:

. . .
Extracting libudev1_241-7~deb10u8+rpi1_armhf.deb...
Extracting libunistring2_0.9.10-1_armhf.deb...
Extracting libuuid1_2.33.1-0.1_armhf.deb...
Extracting linux-image-4.9.0-6-rpi2_4.9.82-1+deb9u3+rpi2_armhf.deb...
Extracting linux-image-4.9.0-6-rpi_4.9.82-1+deb9u3+rpi2_armhf.deb...
Extracting lsb-base_10.2019051400+rpi1_all.deb...
Extracting ndisc6_1.0.4-1_armhf.deb...
Extracting netbase_5.6_all.deb...
Extracting ntpdate_4.2.8p12+dfsg-4_armhf.deb...
Extracting raspberrypi-bootloader-nokernel_1.20180328-1~nokernel1_armhf.deb...
Extracting raspbian-archive-keyring_20120528.2_all.deb...
Extracting rng-tools_2-unofficial-mt.14-1_armhf.deb...
Extracting tar_1.30+dfsg-6_armhf.deb...
Extracting util-linux_2.33.1-0.1_armhf.deb...
Extracting wpasupplicant_2.7+git20190128+0c1e29f-6+deb10u3_armhf.deb...
Extracting zlib1g_1.2.11.dfsg-1_armhf.deb...
cp: cannot stat 'tmp/etc/rmt': No such file or directory

$ ls -l tmp/etc/rmt
lrwxrwxrwx. 1 jonesap jonesap 13 Apr 23 2019 tmp/etc/rmt -> /usr/sbin/rmt

$ ls -l /usr/sbin/rmt
ls: cannot access '/usr/sbin/rmt': No such file or directory

I changed line 389 of build.sh to: cp -d tmp/etc/rmt rootfs/etc/ and the build completed with no errors signalled.
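The behaviour can be reproduced in a scratch directory, as sketched below. The paths are made up for the demonstration and the sketch assumes a cp that supports -d (GNU coreutils and BusyBox do): plain cp follows the symlink and fails when the target is missing, while cp -d copies the link itself.

```shell
#!/bin/sh
# Sketch only: reproduces the failure in a temporary directory.
# Assumes a cp that understands -d; the paths are hypothetical.
work="$(mktemp -d)"
mkdir "$work/src" "$work/dst"

# A symlink whose target does not exist, like tmp/etc/rmt on the Fedora box.
ln -s /nonexistent/rmt "$work/src/rmt"

# Plain cp follows the link, finds nothing behind it, and fails.
if cp "$work/src/rmt" "$work/dst/" 2>/dev/null; then
    plain_result=copied
else
    plain_result=failed
fi

# cp -d copies the link itself, so the missing target does not matter.
if cp -d "$work/src/rmt" "$work/dst/"; then
    nodereference_result=copied
else
    nodereference_result=failed
fi

echo "plain cp: $plain_result"
echo "cp -d:    $nodereference_result"
```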

diederikdehaas commented 3 years ago

> Just to reassure that I am working on it.

There is no rush, take your time. When you respond I get an email notification so I know there's potentially something to do.

> My scripted install is still running (takes 2 - 3 hours on my slow internet and downloads a ton of applications and tools)

I run the (build and) install on a regular basis and have installed apt-cacher-ng in my LAN which caches a whole bunch of (deb) packages. It could possibly work on Fedora too; otherwise there's probably some (caching) proxy which could help. I have this in my installer-config.txt:

mirror=http://<acng-server>:3142/mirrordirector.raspbian.org/raspbian/

> but the fact that it has got this far shows your fix has remedied my immediate problem.

Nice :-)

> cp: cannot stat 'tmp/etc/rmt': No such file or directory
>
> $ ls -l tmp/etc/rmt
> lrwxrwxrwx. 1 jonesap jonesap 13 Apr 23 2019 tmp/etc/rmt -> /usr/sbin/rmt
>
> $ ls -l /usr/sbin/rmt
> ls: cannot access '/usr/sbin/rmt': No such file or directory
>
> I changed line 389 of build.sh to: cp -d tmp/etc/rmt rootfs/etc/

Interesting :+1: I've never seen this issue on my Debian box, but AFAICT you just spotted a bug in the Debian tar package! Until now, I didn't realize that /etc/rmt is a symlink. Normally we handle that by creating links ourselves. But as you noticed, the symlink points to a non-existing file :-O So I'm just going to remove it from our project and file a Debian bug against the tar package.

diederikdehaas commented 3 years ago

> I've never seen this issue on my Debian box, but AFAICT you just spotted a bug in the Debian tar package!

Oops, error on my part. There is no bug in the Debian package; I checked the wrong path. This project (still) doesn't need it; apparently it's used for compatibility with other Unixes.

ajones11235 commented 3 years ago

OK, so this isn't as easy as I hoped.

I have run some tests with minimal builds. While it works for my use case and (I guess) the most common use case, there are still some cases where it fails completely. The results below come from several runs grouped together as 'Tests', where all runs in a test had identical installer-config files. The busybox resolv.conf was obtained by waiting for today's date to be displayed on the monitor, then disconnecting the network cable so the installer fell back to busybox. I'm very tired and going to bed. I hope all this will be useful.

| ipv4 | ipv6 | Result | Test |
|---|---|---|---|
| 1 fixed dns | auto | OK, but see Test 1 below | Test 1 |
| 2 fixed dns | auto | Fail | Test 2 |
| 2 fixed dns | disable | Fail | Test 3 |
| dhcp (1 dns) | disable | OK | Test 4 |
| dhcp (1 dns) | dhcp | Fail | Test 5 |

Test 1 (ipv4: 1 fixed dns. ipv6: auto)

Works OK, but the busybox resolv.conf has redundant lines.

installer-config.txt

ip4_addr=192.168.0.40
ip4_gateway=192.168.0.1
ip4_nameservers=192.168.0.1
ip4_prefixlength=24
ip6_addr=auto

/etc/resolv.conf (busybox)

nameserver 192.168.0.1
nameserver 1111:2222:3333:4444:5555:6666:7777:8888
nameserver fe80::aaaa:bbbb:cccc:dddd
nameserver 1111:2222:3333:4444:5555:6666:7777:8888
nameserver fe80::aaaa:bbbb:cccc:dddd
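The repeated entries are harmless here, but they could be dropped with a small order-preserving filter. This is only a sketch of the idea, not what the installer does; `dedupe_lines` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: prints a file with repeated lines removed, keeping the first
# occurrence of each line and the original order.
# dedupe_lines is a hypothetical helper, not part of raspbian-ua-netinst.
dedupe_lines() {
    awk '!seen[$0]++' "$1"
}

# Example run against a scratch copy of the listing above.
sample="$(mktemp)"
printf '%s\n' \
    'nameserver 192.168.0.1' \
    'nameserver 1111:2222:3333:4444:5555:6666:7777:8888' \
    'nameserver fe80::aaaa:bbbb:cccc:dddd' \
    'nameserver 1111:2222:3333:4444:5555:6666:7777:8888' \
    'nameserver fe80::aaaa:bbbb:cccc:dddd' > "$sample"
dedupe_lines "$sample"
```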

/etc/resolv.conf (raspbian)

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 192.168.0.1
nameserver 1111:2222:3333:4444:5555:6666:7777:8888
nameserver fe80::aaaa:bbbb:cccc:dddd%eth0
search home

Test 2 (ipv4: 2 fixed dns. ipv6: auto)

Installation proceeded to completion, but the network did not come up on reboot. (The entire installation took place using ipv6.)

installer-config.txt

ip4_addr=192.168.0.40
ip4_gateway=192.168.0.1
ip4_nameservers=192.168.0.1 8.8.4.4
ip4_prefixlength=24
ip6_addr=auto

/etc/resolv.conf (busybox)

nameserver 1111:2222:3333:4444:5555:6666:7777:8888
nameserver fe80::aaaa:bbbb:cccc:dddd
nameserver 1111:2222:3333:4444:5555:6666:7777:8888
nameserver fe80::aaaa:bbbb:cccc:dddd

/etc/resolv.conf (raspbian)

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN

/var/log/syslog

.
.
.
Sep  1 15:41:13 basilisk systemd[1]: Started udev Coldplug all Devices.
Sep  1 15:41:13 basilisk systemd[1]: Starting Helper to synchronize boot up for ifupdown...
Sep  1 15:41:13 basilisk sh[178]: ifquery: /etc/network/interfaces:12: option with empty value
Sep  1 15:41:13 basilisk sh[178]: ifquery: couldn't read interfaces file "/etc/network/interfaces"
Sep  1 15:41:13 basilisk systemd[1]: Started Helper to synchronize boot up for ifupdown.
Sep  1 15:41:13 basilisk systemd[1]: Started Create System Users.
.
.
.

/etc/network/interfaces

# interfaces(5) file used by ifup(8) and ifdown(8)
# Include files from /etc/network/interfaces.d:
source-directory /etc/network/interfaces.d
auto lo
iface lo inet loopback

auto eth0
allow-hotplug eth0
iface eth0 inet static
    address 192.168.0.40/24
    gateway 192.168.0.1
    dns-nameservers
iface eth0 inet6 auto
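The ifquery error in the syslog excerpt points at line 12, the dns-nameservers option with no value. Why the list ended up empty is not established in this thread; the sketch below only shows a guard that avoids writing the option when there is nothing to list. `emit_dns_option` is a hypothetical helper, not the installer's actual code.

```shell
#!/bin/sh
# Sketch only: prints a dns-nameservers option when the list is non-empty,
# and prints nothing otherwise, so the interfaces file stays parseable.
# emit_dns_option is a hypothetical helper, not the installer's actual code.
emit_dns_option() {
    # $1: space-separated list of name servers (may be empty)
    if [ -n "$1" ]; then
        printf '    dns-nameservers %s\n' "$1"
    fi
}

emit_dns_option '192.168.0.1 8.8.4.4'
emit_dns_option ''
```

A guard like this would hide the symptom rather than explain the lost name servers, so it would complement a real fix, not replace it.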

Test 3 (ipv4: 2 fixed dns. ipv6: disable)

Failed

installer-config.txt

ip4_addr=192.168.0.40
ip4_gateway=192.168.0.1
ip4_nameservers=192.168.0.1 8.8.4.4
ip4_prefixlength=24
ip6_addr=disable

/etc/resolv.conf (busybox)

zero bytes

Test 4 (ipv4: dhcp. ipv6: disable)

Worked OK

installer-config.txt

ip4_addr=dhcp
ip6_addr=disable

/etc/resolv.conf (busybox)

search home
nameserver 192.168.0.1

/etc/resolv.conf (raspbian)

search home
nameserver 192.168.0.1

Test 5 (ipv4: dhcp, ipv6: dhcp)

Failed very quickly

installer-config.txt

ip4_addr=dhcp
ip6_addr=dhcp

No busybox resolv.conf

raspbian-ua-netinst-19700101T000023.log

.
.
=================================================
=== Start executing installer-config.txt. ===
=== Finished executing installer-config.txt. ===
=================================================

Network configuration:
  ip4_addr = dhcp
  ip6_addr = dhcp
  ip6_prefixlength = 0
  ip6_gateway = auto
  ip6_nameservers = auto
  online_config =

Loading drivers.
Finished loading drivers.
Waiting for eth0... OK
Configuring eth0 for IPv4 using DHCP... 192.168.0.137/24
Waiting for IPv6 link-local address on eth0... OK
Configuring eth0 for IPv6 using static dhcp... Error: any valid prefix is expected rather than "dhcp/0".

Oh noes, something went wrong!
.
.
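The log suggests that the value dhcp was treated as a literal address ("dhcp/0"). A sketch of an early sanity check on the setting follows. It assumes that auto and disable, the two keywords used in the tests above, are the only keywords, and that anything else must at least look like an IPv6 address; whether the installer accepts other values is not established here. `classify_ip6_addr` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch only: classifies an ip6_addr setting before it is used.
# Assumption: "auto" and "disable" are the only keywords; any other value
# must contain a colon to be considered a static IPv6 address.
# classify_ip6_addr is a hypothetical helper, not the installer's code.
classify_ip6_addr() {
    case "$1" in
        auto|disable) printf '%s\n' "$1" ;;
        *:*)          printf 'static\n' ;;
        *)            printf 'invalid\n' ;;
    esac
}

for value in auto disable fe80::1 dhcp; do
    printf '%s -> %s\n' "$value" "$(classify_ip6_addr "$value")"
done
```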

diederikdehaas commented 3 years ago

I haven't forgotten about this issue. But I've been busy with other things and concluded that I really need some hands on experience with IPv6, which is one of the reasons I'm now learning how to 'hack' my router. So it looks like this issue will be open for quite a while.