flightaware / piaware

Client-side package and programs for forwarding ADS-B data to FlightAware
BSD 2-Clause "Simplified" License

Name resolution failure on startup #4

Closed (brookst closed 9 years ago)

brookst commented 10 years ago

I'm getting an intermittent failure on startup of piaware (again using systemd under Arch with Tclsh8.6). My log reads the following repeatedly:

Oct 13 12:03:08 server piaware[210]: 10/13/2014 11:03:08 connecting to FlightAware eyes.flightaware.com/1200
Oct 13 12:03:08 server piaware[210]: 10/13/2014 11:03:08 got 'couldn't open socket: Name or service not known' to adept server at eyes.flightaware.com/1200, will try again soon...

I can't see why it can't resolve the address, given that if I stop and restart the service, the socket connects correctly. It's also pretty awkward to test since it only manifests at boot and not reliably so. Any pointers on what might be going wrong, or how to debug further?

Also possibly relevant is that my network interfaces, one of which is unconnected, seem to switch places in /dev so it might be somehow binding to the unconnected interface.

lehenbauer commented 10 years ago

Hi Tim,

If the problem only manifests at boot then I suspect piaware is getting started before some other critical service. What's really peculiar is why a restart of piaware is required for it to start working.

I presume you are not running a DNS server or a DNS cacher.

What's in your /etc/resolv.conf? Also your /etc/hosts?

The required restart of piaware is confusing. As far as I know, none of piaware, the resolver library, or Tcl caches DNS query results. (I have toyed with including the IP address of the FA server along with the hostname just for this sort of problem.)

If you are running a cacher or nameserver then try putting the service name as a start prerequisite in /etc/init.d/piaware and see what happens. In any case we are committed to helping you figure this out so please let us know what you find out.

Karl

brookst commented 10 years ago

Hi Karl, Thanks for the help. I'm not running any DNS services locally. Here's my hosts and resolv.conf:

~> cat /etc/hosts
#
# /etc/hosts: static lookup table for host names
#

#<ip-address>   <hostname.domain.org>   <hostname>
127.0.0.1       localhost.localdomain   localhost
::1             localhost.localdomain   localhost

# End of file

~> cat /etc/resolv.conf
# Generated by resolvconf
nameserver 194.168.4.100
nameserver 194.168.8.100

And this is all the services running on the system:

~> systemctl status
● server
    State: running
     Jobs: 0 queued
   Failed: 0 units
    Since: Mon 2014-10-13 09:30:34 BST; 4h 19min ago
   CGroup: /
           ├─1 /sbin/init
           ├─system.slice
           │ ├─avahi-daemon.service
           │ │ ├─212 avahi-daemon: running [server.local
           │ │ └─257 avahi-daemon: chroot helpe
           │ ├─flightradar24.service
           │ │ └─749 /home/brooks/sdr/fr24feed_x64_242 --fr24key=xxxxxxxxxxxxxxxx
           │ ├─dbus.service
           │ │ └─213 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
           │ ├─dump1090.service
           │ │ └─209 /home/brooks/sdr/dump1090/dump1090 --quiet --net --net-beast --net-ro-port 30005
           │ ├─collectd.service
           │ │ ├─225 /usr/bin/collectdmon
           │ │ └─226 collectd -f
           │ ├─systemd-journald.service
           │ │ └─129 /usr/lib/systemd/systemd-journald
           │ ├─ntpd.service
           │ │ └─234 /usr/bin/ntpd -g -u ntp:ntp
           │ ├─piaware.service
           │ │ ├─4966 /usr/bin/piaware -debug
           │ │ └─5005 /usr/bin/faup1090
           │ ├─ftpd.service
           │ │ └─228 /usr/bin/ftpd -D
           │ ├─systemd-logind.service
           │ │ └─203 /usr/lib/systemd/systemd-logind
           │ ├─system-getty.slice
           │ │ └─getty@tty1.service
           │ │   └─278 /sbin/agetty --noclear tty1 linux
           │ ├─sshd.service
           │ │ └─204 /usr/bin/sshd -D
           │ ├─systemd-udevd.service
           │ │ └─192 /usr/lib/systemd/systemd-udevd
           │ ├─rpcbind.service
           │ │ └─270 /usr/bin/rpcbind -w 
           │ ├─httpd.service
           │ │ ├─ 369 /usr/bin/httpd -k start
           │ │ ├─ 459 /usr/bin/httpd -k start
           │ │ ├─ 460 /usr/bin/httpd -k start
           │ │ ├─2115 /usr/bin/httpd -k start
           │ │ ├─2118 /usr/bin/httpd -k start
           │ │ └─4944 /usr/bin/httpd -k start
           │ ├─cronie.service
           │ │ └─201 /usr/bin/crond -n
           │ ├─syslog-ng.service
           │ │ └─267 /usr/bin/syslog-ng -F
           │ └─system-netctl\x2difplugd.slice
           │   └─netctl-ifplugd@eth0.service
           │     ├─271 /usr/bin/ifplugd -i eth0 -r /etc/ifplugd/netctl.action -bfIns
           │     └─759 dhcpcd -4 -q -t 30 -L eth0
           └─user.slice

I'm now digging through the logs to see if anything related is happening at boot.

Tim

brookst commented 10 years ago

My network configuration was a bit messy, so I've switched to NetworkManager for now. This rebooted ok and, I suspect, properly ordered piaware after the network had come up. I'll keep an eye on this to make sure it's not a fluke.

It still seems odd that a process can get stuck not resolving a name, though. Maybe an IP fall-back with a warning would be helpful.

lehenbauer commented 10 years ago

Like you I find it peculiar that it would continue to fail. I looked a bit further. "They" seem to disavow any DNS caching in Linux without bind or some other service running... as I would expect. I thought your 194 addresses looked local and hence that we might pin blame on some consumer router's DNS service but no, they're legit global IPv4 addresses. I don't know.

I have, though, updated the adept client to round-robin try a list of hosts, currently consisting of two, the hostname and IP, and confirmed that works, commit ab8047c1. It's on the master branch and will go into 1.15.
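
For readers following along, the round-robin idea described above can be sketched roughly as follows. This is not the code from the commit (piaware itself is written in Tcl); it is a minimal Python illustration under stated assumptions. The function name `connect_round_robin`, the `max_attempts` limit, and the injectable `connect` parameter are all hypothetical, and `192.0.2.1` is a documentation placeholder, not the real FlightAware address.

```python
import itertools
import socket


def connect_round_robin(hosts, port, max_attempts=6,
                        connect=socket.create_connection):
    """Cycle through `hosts` until one connects or attempts run out.

    Returns a (host, connection) pair. Raises OSError if every attempt
    fails. `connect` is injectable so the logic can be exercised
    without touching the network (an assumption of this sketch).
    """
    last_error = None
    for attempt, host in enumerate(itertools.cycle(hosts)):
        if attempt >= max_attempts:
            break
        try:
            return host, connect((host, port))
        except OSError as err:
            # socket.gaierror (name resolution failure) is an OSError
            # subclass, so a DNS failure falls through to the next host.
            last_error = err
    raise OSError(f"all {max_attempts} attempts failed") from last_error


# Hypothetical usage: hostname first, literal IP as the fallback.
# host, conn = connect_round_robin(["eyes.flightaware.com", "192.0.2.1"], 1200)
```

A real client would presumably also wait between attempts, as the "will try again soon" log message suggests; the sketch leaves the delay out to keep the fallback logic visible. Listing a literal IP after the hostname means a resolver failure alone cannot block the connection.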

mutability commented 9 years ago

I wonder if this was resolv.conf not being reread. Apparently it's read once on the first query and not re-read for the lifetime of the process unless you call res_init.