ddavness / power-mailinabox

A Mail-in-a-Box with extra capabilities and more customizability. Not just for power users!
Creative Commons Zero v1.0 Universal

Install failing on Buster and Ubuntu - related to nsd? #22

Closed: gregwbrooks closed this issue 3 years ago.

gregwbrooks commented 3 years ago

power-mailinabox stopped working (it seemed to be a DNS issue) after a recent system software update. Reinstalling fails on both Debian Buster and Ubuntu 20.04; the text of the Ubuntu error is below. After this error, anything requiring online access (wget, apt update, etc.) fails, even after a reboot.

My guess: Something changed with a bind/nsd update.


FAILED: apt-get -y -o Dpkg::Options::=--force-confdef -o Dpkg::Options::=--force-confnew install ldnsutils openssh-client

Reading package lists...
Building dependency tree...
Reading state information...
openssh-client is already the newest version (1:8.2p1-4ubuntu0.2).
The following NEW packages will be installed:
  ldnsutils libldns2
0 upgraded, 2 newly installed, 0 to remove and 5 not upgraded.
Need to get 278 kB of archives.
After this operation, 1,142 kB of additional disk space will be used.
Err:1 http://mirror.enzu.com/ubuntu focal/universe amd64 libldns2 amd64 1.7.0-4.1ubuntu1
  Temporary failure resolving 'mirror.enzu.com'
Err:2 http://mirror.enzu.com/ubuntu focal/universe amd64 ldnsutils amd64 1.7.0-4.1ubuntu1
  Temporary failure resolving 'mirror.enzu.com'
E: Failed to fetch http://mirror.enzu.com/ubuntu/pool/universe/l/ldns/libldns2_1.7.0-4.1ubuntu1_amd64.deb  Temporary failure resolving 'mirror.enzu.com'
E: Failed to fetch http://mirror.enzu.com/ubuntu/pool/universe/l/ldns/ldnsutils_1.7.0-4.1ubuntu1_amd64.deb  Temporary failure resolving 'mirror.enzu.com'
E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?

ddavness commented 3 years ago

Hi! Looks like an issue with bind9. What do you see when running these commands?

sudo service bind9 status
sudo journalctl -xe | grep bind9

Also, what are the contents of /etc/default/bind9 (relevant for Debian) and /etc/default/named (relevant for Ubuntu)?

gregwbrooks commented 3 years ago

sudo service bind9 status (results anonymized to EXAMPLE.COM)

sudo: unable to resolve host mailhub.newwtg.com: Temporary failure in name resolution
● named.service - BIND Domain Name Server
     Loaded: loaded (/lib/systemd/system/named.service; enabled; vendor preset: enabled)
     Active: active (running) since Sat 2021-08-07 15:51:50 PDT; 1h 23min ago
       Docs: man:named(8)
   Main PID: 721 (named)
      Tasks: 14 (limit: 2278)
     Memory: 42.2M
     CGroup: /system.slice/named.service
             └─721 /usr/sbin/named -f -u bind -4

Aug 07 17:15:02 mailhub.newwtg.com named[721]: validating 2.ubuntu.pool.ntp.org/A: bad cache hit (o>
Aug 07 17:15:02 mailhub.newwtg.com named[721]: broken trust chain resolving '2.ubuntu.pool.ntp.org/>
Aug 07 17:15:02 mailhub.newwtg.com named[721]: connection refused resolving '.org.DOMAIN.com/A/IN'>
Aug 07 17:15:02 mailhub.newwtg.com named[721]: connection refused resolving '2.ubuntu.pool.ntp.org.>
Aug 07 17:15:02 mailhub.newwtg.com named[721]: connection refused resolving '2.ubuntu.pool.ntp.org.>
Aug 07 17:15:07 mailhub.newwtg.com named[721]: connection refused resolving 'mailhub.EXAMPLE.com/A/I>
Aug 07 17:15:07 mailhub.newwtg.com named[721]: connection refused resolving 'mailhub.EXAMPLE.com/AAA>
Aug 07 17:15:07 mailhub.newwtg.com named[721]: connection refused resolving '.com.EXAMPLE.com/A/IN'>
Aug 07 17:15:07 mailhub.newwtg.com named[721]: connection refused resolving 'mailhub.EXAMPLE.com.new>
Aug 07 17:15:07 mailhub.newwtg.com named[721]: connection refused resolving 'mailhub.EXAMPLE.com.new

sudo journalctl -xe | grep bind9

sudo: unable to resolve host mailhub.EXAMPLE.com: Temporary failure in name resolution
Aug 07 17:14:07 mailhub.EXAMPLE.com sudo[3578]: greg : TTY=pts/0 ; PWD=/home/greg ; USER=root ; COMMAND=/usr/sbin/service bind9 status
Aug 07 17:15:07 mailhub.EXAMPLE.com sudo[3616]: greg : TTY=pts/0 ; PWD=/home/greg ; USER=root ; COMMAND=/usr/sbin/service bind9 status

Contents of /etc/default/named:

#
# run resolvconf?
RESOLVCONF=no

# startup options for the server
OPTIONS="-u bind"
OPTIONS="-u bind -4"

ddavness commented 3 years ago

Alright, it's nsd then. bind9 looks fine, but nsd failing shouldn't have caused this 🤔 I messed with the nsd configuration in the last updates, so it's possible that broke your NSD install.

Could you please give me the output of the following:

sudo service nsd status
journalctl -xe | grep nsd
ip a

and the contents of /etc/nsd/nsd.conf?

The ip a output and nsd.conf may expose public IP addresses. Feel free to redact them, but if you do, replace each address with a unique placeholder (for example, replace 1.1.1.1 with "Public IP 1") so the different addresses can still be told apart.

gregwbrooks commented 3 years ago

sudo service nsd status

Unit nsd.service could not be found.

journalctl -xe | grep nsd

Aug 07 17:46:59 mailhub.EXAMPLE.com sudo[4311]: greg : TTY=pts/0 ; PWD=/home/greg ; USER=root ; COMMAND=/usr/sbin/service nsd status
Aug 07 17:47:14 mailhub.EXAMPLE.com sudo[4319]: greg : TTY=pts/0 ; PWD=/home/greg ; USER=root ; COMMAND=/usr/sbin/service nsd start
Aug 07 17:47:44 mailhub.EXAMPLE.com sudo[4339]: greg : TTY=pts/0 ; PWD=/home/greg ; USER=root ; COMMAND=/usr/bin/apt install nsd
Aug 07 17:48:29 mailhub.EXAMPLE.com sudo[4392]: greg : TTY=pts/0 ; PWD=/home/greg ; USER=root ; COMMAND=/usr/sbin/service nsd status

ip a

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 8a:ce:50:84:76:64 brd ff:ff:ff:ff:ff:ff
    inet (SUBNET MASK) brd (BROADCAST IP) scope global ens18
       valid_lft forever preferred_lft forever
    inet6 2607:ff28:c005:2b:88ce:50ff:fe84:7664/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 2591493sec preferred_lft 604293sec
    inet6 fe80::88ce:50ff:fe84:7664/64 scope link
       valid_lft forever preferred_lft forever

Contents of /etc/nsd/nsd.conf: there is no nsd directory and no nsd.conf. I think we're onto something. :)

ddavness commented 3 years ago

Yeah, for sure. Is this a brand-new install by any chance? (What are the contents of /etc/resolv.conf?)

gregwbrooks commented 3 years ago

Yep -- the cascade:

Makes me wonder if an update to NSD is what's broken.

ddavness commented 3 years ago

I don't think so; otherwise I would have noticed by now :\

ddavness commented 3 years ago

I'm not sure how your network is set up, but ideally /etc/resolv.conf should contain:

nameserver 127.0.0.1

which points at the local bind9 resolver.
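To confirm which resolver a machine is actually using, a small shell sketch can list the nameserver entries in /etc/resolv.conf and report whether 127.0.0.1 is among them. The function name check_resolv is hypothetical (it is not part of power-mailinabox), and it only inspects the file; it does not test whether bind9 actually answers queries.

```shell
#!/usr/bin/env bash
# Sketch: list the nameserver entries in a resolv.conf-style file and report
# whether the local resolver (127.0.0.1, i.e. bind9 on this setup) is one of them.
# check_resolv is a hypothetical helper, not part of power-mailinabox.
check_resolv() {
  local file="${1:-/etc/resolv.conf}"
  local servers
  # Only uncommented "nameserver <addr>" lines count; in "# nameserver ..." $1 is "#".
  servers=$(awk '$1 == "nameserver" { print $2 }' "$file")
  if [[ -z "$servers" ]]; then
    echo "no nameserver entries in $file"
    return 1
  fi
  # $servers is unquoted on purpose: word splitting joins the entries with spaces.
  echo "nameservers in $file:" $servers
  if grep -qxF '127.0.0.1' <<< "$servers"; then
    echo "local resolver (bind9) is configured"
  else
    echo "127.0.0.1 is not listed; lookups bypass the local bind9"
  fi
}
```

Usage, after sourcing the snippet: run check_resolv to inspect /etc/resolv.conf, or pass another path as the first argument.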

If pointing it at 127.0.0.1 doesn't work, as a temporary workaround you can use any public DNS resolver you like, for example Cloudflare's 1.1.1.1 or Google's 8.8.8.8:

# /etc/resolv.conf - sets the resolver to Cloudflare's public nameserver
nameserver 1.1.1.1

gregwbrooks commented 3 years ago

No luck: setting resolv.conf to 1.1.1.1 and running the install script gets me to the same failure I initially reported.

If this seems more like a singular case of user error, let me know -- I don't want to waste your time having you play tech support for a one-off case.

ddavness commented 3 years ago

ACK. Keep me posted if you find anything interesting.

dephillipsmi commented 3 years ago

I had the same problem. I'm installing on a virtual machine on a home network behind NAT. I looked at the nsd.conf file: it listed both the machine's local IP address and the public IP address. I commented out the public IP address and ran the setup/start.sh script again, and the install then continued successfully. It is now at the point of installing SpamAssassin, so hopefully it will finish without any problems.
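The workaround above comes down to making sure nsd.conf only lists addresses the VM actually owns. A minimal shell sketch, assuming (this is not confirmed in the thread) that nsd fails because it cannot bind to a public IP that lives on the NAT router rather than on the VM. The function name find_nonlocal_nsd_ips is hypothetical; it prints every uncommented ip-address entry that does not appear in a supplied list of local addresses.

```shell
#!/usr/bin/env bash
# Sketch: print ip-address entries from an nsd.conf-style file that are missing
# from a list of locally assigned addresses (one address per line).
# Assumption: nsd cannot bind to an address not assigned to this machine,
# e.g. a public IP that only exists on the NAT router.
# find_nonlocal_nsd_ips is a hypothetical helper, not part of power-mailinabox.
# Returns 1 if any non-local address was found, 0 otherwise.
find_nonlocal_nsd_ips() {
  local conf="$1" local_list="$2" ips addr found=0
  # Read the local list once so a process substitution (<(...)) also works.
  ips=$(cat "$local_list")
  # Uncommented "ip-address: <addr>[@port]" lines only; strip any @port suffix.
  while read -r addr; do
    if ! grep -qxF "$addr" <<< "$ips"; then
      echo "not assigned locally: $addr"
      found=1
    fi
  done < <(awk '$1 == "ip-address:" { sub(/@.*/, "", $2); print $2 }' "$conf")
  return "$found"
}
```

On the server, the local address list could be built from ip -o addr show, for example: find_nonlocal_nsd_ips /etc/nsd/nsd.conf <(ip -o addr show | awk '{ sub(/\/.*/, "", $4); print $4 }'). Any address it prints is a candidate for commenting out, as described above.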

ddavness commented 3 years ago

We continued this issue privately; it turns out the fault was in both nsd and bind.

For nsd, I have a fix ready that will be incorporated in the next version. For bind, it was DNSSEC-related and required some specific configuration changes to their machine and some service restarts.