Closed: brookst closed this issue 9 years ago
Hi Tim,
If the problem only manifests at boot, then I suspect piaware is getting started before some other critical service. What's really peculiar is that a restart of piaware is required for it to start working.
I presume you are not running a DNS server or a DNS cacher.
What's in your /etc/resolv.conf? Also your /etc/hosts?
The required restart of piaware is confusing. As far as I know, none of piaware, the resolver library, or Tcl caches DNS query results. (I have toyed with including the IP address of the FA server along with the hostname just for this sort of problem.)
If you are running a cacher or nameserver then try putting the service name as a start prerequisite in /etc/init.d/piaware and see what happens. In any case we are committed to helping you figure this out so please let us know what you find out.
Karl
Hi Karl,
Thanks for the help. I'm not running any DNS services locally. Here are my /etc/hosts and /etc/resolv.conf:
~> cat /etc/hosts
#
# /etc/hosts: static lookup table for host names
#
#<ip-address> <hostname.domain.org> <hostname>
127.0.0.1 localhost.localdomain localhost
::1 localhost.localdomain localhost
# End of file
~> cat /etc/resolv.conf
# Generated by resolvconf
nameserver 194.168.4.100
nameserver 194.168.8.100
And these are all the services running on the system:
~> systemctl status
● server
State: running
Jobs: 0 queued
Failed: 0 units
Since: Mon 2014-10-13 09:30:34 BST; 4h 19min ago
CGroup: /
├─1 /sbin/init
├─system.slice
│ ├─avahi-daemon.service
│ │ ├─212 avahi-daemon: running [server.local
│ │ └─257 avahi-daemon: chroot helpe
│ ├─flightradar24.service
│ │ └─749 /home/brooks/sdr/fr24feed_x64_242 --fr24key=xxxxxxxxxxxxxxxx
│ ├─dbus.service
│ │ └─213 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
│ ├─dump1090.service
│ │ └─209 /home/brooks/sdr/dump1090/dump1090 --quiet --net --net-beast --net-ro-port 30005
│ ├─collectd.service
│ │ ├─225 /usr/bin/collectdmon
│ │ └─226 collectd -f
│ ├─systemd-journald.service
│ │ └─129 /usr/lib/systemd/systemd-journald
│ ├─ntpd.service
│ │ └─234 /usr/bin/ntpd -g -u ntp:ntp
│ ├─piaware.service
│ │ ├─4966 /usr/bin/piaware -debug
│ │ └─5005 /usr/bin/faup1090
│ ├─ftpd.service
│ │ └─228 /usr/bin/ftpd -D
│ ├─systemd-logind.service
│ │ └─203 /usr/lib/systemd/systemd-logind
│ ├─system-getty.slice
│ │ └─getty@tty1.service
│ │ └─278 /sbin/agetty --noclear tty1 linux
│ ├─sshd.service
│ │ └─204 /usr/bin/sshd -D
│ ├─systemd-udevd.service
│ │ └─192 /usr/lib/systemd/systemd-udevd
│ ├─rpcbind.service
│ │ └─270 /usr/bin/rpcbind -w
│ ├─httpd.service
│ │ ├─ 369 /usr/bin/httpd -k start
│ │ ├─ 459 /usr/bin/httpd -k start
│ │ ├─ 460 /usr/bin/httpd -k start
│ │ ├─2115 /usr/bin/httpd -k start
│ │ ├─2118 /usr/bin/httpd -k start
│ │ └─4944 /usr/bin/httpd -k start
│ ├─cronie.service
│ │ └─201 /usr/bin/crond -n
│ ├─syslog-ng.service
│ │ └─267 /usr/bin/syslog-ng -F
│ └─system-netctl\x2difplugd.slice
│ └─netctl-ifplugd@eth0.service
│ ├─271 /usr/bin/ifplugd -i eth0 -r /etc/ifplugd/netctl.action -bfIns
│ └─759 dhcpcd -4 -q -t 30 -L eth0
└─user.slice
I'm now digging through the logs to see if anything related is happening at boot.
Tim
My network configuration was a bit messy, so I've switched to NetworkManager for now. The machine rebooted OK and, I suspect, properly ordered piaware after the network had come up. I'll keep an eye on this to make sure it's not a fluke.
It still seems odd that a process can get stuck not resolving a name, though. Maybe an IP fall-back with a warning would be helpful.
Like you I find it peculiar that it would continue to fail. I looked a bit further. "They" seem to disavow any DNS caching in Linux without bind or some other service running... as I would expect. I thought your 194 addresses looked local and hence that we might pin blame on some consumer router's DNS service but no, they're legit global IPv4 addresses. I don't know.
I have, though, updated the adept client to try a list of hosts round-robin. The list currently has two entries, the hostname and the IP address, and I have confirmed that this works (commit ab8047c1). It's on the master branch and will go into 1.15.
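For readers who want to see the idea in isolation, here is a minimal Python sketch of a round-robin connect over a host list. It is not the piaware code (which is Tcl) and does not reproduce commit ab8047c1. The function name `connect_round_robin`, the hostname `adept.example.net`, the address `192.0.2.10`, and the port `1234` are all placeholders invented for illustration.

```python
import itertools
import socket


def connect_round_robin(hosts, port, max_attempts=6, timeout=10.0,
                        connect=socket.create_connection):
    """Cycle through `hosts` until one connects or attempts run out.

    Returns (host, connection). `connect` is injectable so the logic can be
    exercised without touching the network.
    """
    last_error = None
    # zip() stops after max_attempts; cycle() keeps wrapping around the list.
    for _, host in zip(range(max_attempts), itertools.cycle(hosts)):
        try:
            return host, connect((host, port), timeout=timeout)
        except OSError as exc:  # socket.gaierror (DNS failure) is an OSError
            last_error = exc
    raise ConnectionError(
        f"no host reachable after {max_attempts} attempts") from last_error


if __name__ == "__main__":
    # Hypothetical entries: a hostname first, then a literal IP as fall-back.
    candidates = ["adept.example.net", "192.0.2.10"]
    try:
        host, conn = connect_round_robin(candidates, 1234, max_attempts=2,
                                         timeout=3.0)
        print("connected via", host)
        conn.close()
    except ConnectionError as exc:
        print("gave up:", exc)
```

Listing the literal IP after the hostname means a resolver failure costs one failed attempt rather than blocking the client indefinitely.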
I wonder if this was resolv.conf not being re-read. Apparently it's read once on the first query and then not re-read for the lifetime of the process unless you call res_init.
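If that hypothesis is right, a long-running process could recover by re-initialising the resolver before retrying a failed lookup. The following Python sketch illustrates the pattern; it is an assumption-laden illustration, not what piaware does. It assumes a C library that exports `__res_init` or `res_init` (glibc does, as far as the author of this sketch knows), and the helper names `reinit_resolver` and `resolve_with_retry` are invented here. Newer glibc releases reportedly notice changes to resolv.conf on their own, so this matters mostly on older systems.

```python
import ctypes
import ctypes.util
import socket
import time


def reinit_resolver():
    """Ask the C resolver to re-read /etc/resolv.conf.

    Returns True only if a res_init symbol was found and reported success.
    Assumption: the platform libc exports __res_init or res_init.
    """
    path = ctypes.util.find_library("c")
    if path is None:
        return False
    try:
        libc = ctypes.CDLL(path)
    except OSError:
        return False
    for name in ("__res_init", "res_init"):
        func = getattr(libc, name, None)  # None if the symbol is missing
        if func is not None:
            return func() == 0
    return False


def resolve_with_retry(hostname, attempts=5, delay=2.0,
                       resolve=socket.gethostbyname,
                       reinit=reinit_resolver, sleep=time.sleep):
    """Resolve `hostname`, re-initialising the resolver after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return resolve(hostname)
        except socket.gaierror:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            reinit()
            sleep(delay)
```

The `resolve`, `reinit`, and `sleep` parameters exist only so the retry logic can be tested without a network or real delays.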
I'm getting an intermittent failure on startup of piaware (again using systemd under Arch with Tclsh8.6). My log reads the following repeatedly:
I can't see why it can't resolve the address, given that if I stop and restart the service, the socket connects correctly. It's also pretty awkward to test since it only manifests at boot and not reliably so. Any pointers on what might be going wrong, or how to debug further?
Also possibly relevant is that my network interfaces, one of which is unconnected, seem to switch places in /dev, so it might be somehow binding to the unconnected interface.
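One way to debug a failure that only shows up at boot is to record what the resolver configuration looked like at the moment the lookup failed. The sketch below is a hypothetical Python helper, not part of piaware: it lists the nameserver entries from resolv.conf and reports the result of a single lookup. The names `parse_nameservers` and `boot_dns_report` and the hostname `adept.example.net` are placeholders.

```python
import socket


def parse_nameservers(resolv_conf_text):
    """Return the addresses on 'nameserver' lines, ignoring '#' comments."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def boot_dns_report(hostname, resolv_conf_path="/etc/resolv.conf",
                    resolve=socket.gethostbyname):
    """Build a one-line summary of resolver config plus one lookup result."""
    try:
        with open(resolv_conf_path) as handle:
            servers = parse_nameservers(handle.read())
    except OSError:
        servers = []  # file missing or unreadable, e.g. not yet generated
    try:
        result = resolve(hostname)
    except socket.gaierror as exc:
        result = f"failed: {exc}"
    return f"nameservers={servers} {hostname} -> {result}"


if __name__ == "__main__":
    # Placeholder hostname; substitute the server the client fails to reach.
    print(boot_dns_report("adept.example.net"))
```

Run early in boot (for example from the same unit that starts the client) and written to the journal, a line like this would show whether resolv.conf was empty or not yet generated when the first lookup was attempted.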