Open amcewen opened 5 years ago
+1 for .local, it's fewer characters to type :-)
Consider it done
So I've now taken a look. I've remoted into the box through the OpenBalena instance it is connected to, and it's running the avahi-daemon (mDNS) under the wrong hostname. We can ping that hostname.
I've posted this to the balena forum
Hi,
I’m running multiple containers on a RPIv3 on our network here.
The hostname is set to “mqtt” and when we power up I can ping mqtt.local
After a while (days+) we can no longer access mqtt.local
I’ve remoted in through the OpenBalena instance it is connected to and checked what is running.
I see the avahi-daemon is on a different hostname
754 avahi 5924 S avahi-daemon: running [mqtt-39.local]
770 avahi 5136 S avahi-daemon: chroot helper
4892 redsocks 4008 S avahi-daemon: registering [b356345-80720.local]
4893 redsocks 3472 S avahi-daemon: chroot helper
Sure enough, if I ping mqtt-39.local I get a response.
Can you point me in the right direction to understand why the daemon is configured with the extra -39 instead of the hostname?
Thanks!
Alex
https://forums.balena.io/t/mdns-local-access-to-device-failing-after-a-bit/27784
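For anyone wanting to spot this rename automatically, here is a minimal Python sketch that pulls the advertised name out of a process listing like the one in the forum post and checks whether it looks like the expected hostname plus a numeric suffix. The function names `advertised_names` and `renamed_from` and the regular expressions are illustrative assumptions, not part of Avahi or balena; the sketch only parses text in the `avahi-daemon: running [name.local]` form shown above.

```python
import re
from typing import List

# Matches the "avahi-daemon: running [name.local]" process title from the post.
RUNNING = re.compile(r"avahi-daemon: running \[([^\]]+)\]")


def advertised_names(ps_output: str) -> List[str]:
    """Return every hostname an avahi-daemon reports it is running under."""
    return RUNNING.findall(ps_output)


def renamed_from(expected: str, advertised: str) -> bool:
    """True if `advertised` is `expected` plus a numeric suffix, e.g. mqtt-39.local."""
    pattern = re.escape(expected) + r"-\d+\.local"
    return re.fullmatch(pattern, advertised) is not None


sample = """754 avahi 5924 S avahi-daemon: running [mqtt-39.local]
770 avahi 5136 S avahi-daemon: chroot helper
4892 redsocks 4008 S avahi-daemon: registering [b356345-80720.local]"""

for name in advertised_names(sample):
    print(name, "renamed:", renamed_from("mqtt", name))
```

On the sample listing only the `running` entry is reported; the `registering` entry is deliberately ignored because it has not settled on a name yet.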
Did you make that change @skos-ninja ?
More specifically do we have a DNS configured .local domain?
I am reading now that the mDNS support gets upset if we do!
(And the mqtt.local box is going bonkers as it thinks there are hostname conflicts everywhere)
NB. I know I asked you to do this! I think I made a mistake...
Thread of conversation on Avahi behaviour here
https://forums.balena.io/t/mdns-local-access-to-device-failing-after-a-bit/27784/17
I've found that the mDNS implementation for the espurna sonoff plugs is complete, since this works:
ping ESPURNA-547CD9.local
It's a shame that mDNS works for the IoT ESP8266-based endpoint devices, but not for the main rPI broker.
Having been asked to debug this I had a poke around on the router and found a setting:
Register client hostname from DHCP requests in USG DNS forwarder: ON/OFF
Which I found in Settings -> Services -> DHCP -> DHCP Server
That appears to be taking the hostname that was passed in the DHCP requests and returning it in DNS requests, and doing this for a long time after that device has disappeared. I've turned this off and Sams-iPhone.local now seems to have stopped working (which is correct), at least if I clear my cache.
Also:
WiFi b8:27:eb:cb:96:8c - was configured to be 10.0.100.1 in the DHCP - now configured to 10.0.100.2
Wired b8:27:eb:9e:c3:d9 - wasn't configured, now configured to be 10.0.100.1
And.. on further discussion I've removed those IP addresses from the DHCP configuration, but we can say that those two IP addresses are allocated to this purpose, so should be manually assigned on the box itself (doesn't matter to me if you don't use both of them but I'll record them as being for this purpose on the network documentation).
Things that previously used .localdomain are essentially not able to be pinged by their hostname on the network it seems, as I can no longer reach them. This includes Alex's Octoprint instance. They used to coexist, which is quite strange in and of itself and shouldn't be possible, but now they do not. Devices that used localdomain are now unresponsive on anything but their IP.
This device was previously accessible at octopi.localdomain but is now only accessible at its IP at 10.0.39.51
# Generated by resolvconf
domain local
nameserver 10.0.0.1
nameserver 1.1.1.1
nameserver 1.0.0.1
My resolv.conf now shows domain local rather than domain localdomain, which is the default on a lot of Linux/FreeBSD systems. It may be that however @ajlennon has his Pi set up, with Balena or otherwise, it is permanently configured to use localdomain, which is something that the network is no longer respecting.
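A quick way to check other machines for the same thing is to scan their resolv.conf for a `domain` or `search` entry of `local`. The following is a rough Python sketch; the helper names `resolv_domains` and `uses_mdns_domain` are made up for illustration, and the sample text is the resolv.conf pasted above.

```python
from typing import List


def resolv_domains(text: str) -> List[str]:
    """Collect the names listed on `domain` and `search` lines of a resolv.conf."""
    names = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#;":  # skip blank lines and comments
            continue
        parts = line.split()
        if parts[0] in ("domain", "search"):
            names.extend(p.rstrip(".").lower() for p in parts[1:])
    return names


def uses_mdns_domain(text: str) -> bool:
    """True if the DNS domain or search list contains `local`, which mDNS reserves."""
    return "local" in resolv_domains(text)


sample = """# Generated by resolvconf
domain local
nameserver 10.0.0.1
nameserver 1.1.1.1
nameserver 1.0.0.1
"""

print(resolv_domains(sample), uses_mdns_domain(sample))
```

To check a live machine, pass in the contents of `/etc/resolv.conf` instead of the sample string.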
I have no idea how the router could have anything to do with this other than DROPPING the packets that are related to .localdomain. I've reconfigured a bunch of my devices and they mostly all changed to local on their own.
If we've put local in the domain field of whatever the equivalent of this setting is in our router software, we have definitely made a big mistake, as PFSense outlines in its general settings page.
Do not use 'local' as a domain name. It will cause local hosts running mDNS (avahi, bonjour, etc.) to be unable to resolve local hosts not running mDNS.
This is definitely the problem I'm observing, as I've had to install avahi-daemon on a bunch of machines that did not previously need it.
Now, if this is true, we are in a situation where every device must install something equivalent to avahi-daemon despite the fact that the DNS server on the router can resolve these just fine, without clients needing to have their own instance of avahi-daemon.
Somewhere in the base networking protocols, hostnames are passed to the router without avahi or mDNS being involved. If our domain is set to .local rather than something else like .localdomain or .lan it means we can't resolve hosts that aren't running mDNS.
Suppose we set the domain to .local and have a device with a hostname of foo that is not running mDNS. It has obtained a DHCP lease from the router, so the router now knows its hostname as configured on the device via some part of DHCP. If the router receives a lookup for foo.local then it will return the IP address of foo.local successfully.
However, if you try to look up foo.local from a separate device that is running mDNS via the avahi-daemon then it will fail to look up foo.local because the mDNS daemon is preferred. It will not be able to return foo.local's IP address, because foo.local is not running mDNS.
Not using .local in the router's domain setup avoids this scenario: all devices can still look up each other's hostnames, including devices that aren't running mDNS daemons. Those devices would not be resolvable at all if we chose to enforce .local as the router's domain, which PFSense, OpenWRT and more warn against.
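The foo.local scenario can be written down as a toy model. This is only a sketch of the behaviour described in this comment, not of any real resolver: it assumes a client with an mDNS daemon sends every .local name to mDNS and never falls back to the router, and that a client without one always asks the router. The `resolve` function is hypothetical, and 192.0.2.10 is a documentation address standing in for foo.

```python
from typing import Dict, Optional


def resolve(name: str, client_has_mdns: bool,
            mdns_hosts: Dict[str, str],
            router_dns: Dict[str, str]) -> Optional[str]:
    """Toy lookup: return an IP address, or None if the name does not resolve.

    mdns_hosts: names answered by devices that run an mDNS daemon.
    router_dns: names the router learned from DHCP leases.
    """
    if client_has_mdns and name.endswith(".local"):
        # Assumption: .local is claimed by mDNS, so the router is never asked.
        return mdns_hosts.get(name)
    return router_dns.get(name)


# mqtt runs avahi; foo does not run any mDNS daemon.
mdns_hosts = {"mqtt.local": "10.0.30.194"}

# Router domain set to .local versus .localdomain.
router_local = {"foo.local": "192.0.2.10", "mqtt.local": "10.0.30.194"}
router_localdomain = {"foo.localdomain": "192.0.2.10"}

print(resolve("foo.local", False, mdns_hosts, router_local))
print(resolve("foo.local", True, mdns_hosts, router_local))
print(resolve("foo.localdomain", True, mdns_hosts, router_localdomain))
```

In this model the second lookup is the failing case: the router knows foo.local, but the mDNS-enabled client never asks it. With the router domain set to .localdomain the same client reaches foo through ordinary DNS.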
Now, what has occurred is that you cannot ping hostnames unless you have an mDNS daemon installed on your system, and vice-versa. This is not the way it should be done and explains why all the devices that had .localdomain are no longer visible to even the router itself. All we have done is invalidate the utility of the router's DNS, as it can no longer report back a lookup to a hostname at all.
When you run dig and specify .local, it makes sure to make you aware that .local is reserved for Multicast DNS. mDNS is not supposed to be implemented or enforced by the router's domain.
; <<>> DiG 9.14.5 <<>> matt-octoprint.local @10.0.0.1
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 59060
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;matt-octoprint.local. IN A
;; Query time: 1 msec
;; SERVER: 10.0.0.1#53(10.0.0.1)
;; WHEN: Fri Sep 27 03:54:24 BST 2019
;; MSG SIZE rcvd: 49
@goatchurchprime This is why .localdomain is a thing, or exists at all. So that devices without mDNS can still look up hostnames without mDNS.
I can have a look at this later today, but UniFi supports rebroadcasting mDNS responses in order for them to still work in this case.
"you cannot ping [local] hostnames unless you have an mDNS daemon" Given that's the whole point of mDNS I don't think there's anything particularly non-standard going on here. Having the router's internal hidden DNS proxy also happen to return results for things random people have told it on DHCP sounds a bit more non-standard but what do I know?
I've turned back on the DHCP results showing in DNS and set the network's domain to localdomain. I also tried does.localdomain but that didn't seem to work either. I'll leave it as is for now, maybe it'll only work for things when they renew their DHCP leases.
@johnmckerrell What I meant to say is that you can't ping DNS (hostnames over DHCP leasing, was a thing before mDNS existed) if you use mDNS on your system. Which is a problem, since whatever has been changed means you can't:
ping mqtt
ping mqtt.localdomain
ping mqtt.local
UNLESS you have an mDNS daemon on your computer. And that will only respect .local, since it's mDNS. And if you are not running an mDNS daemon the network doesn't respond if the router domain is .local, because that's reserved for mDNS. When using a .local router domain, mDNS is all that can be used, which means it will no longer respect, look up or return non-mDNS hostnames for reasons I'm not 100% aware of, but something about conflicts.
mDNS is not the only way of getting a hostname, it's fairly modern and it just makes things easier when it's added onto a network. By using .local for the router domain it makes it impossible to use regular DNS for hostnames. This is why localdomain exists as a convention.
A person was trying to use Alex's printer earlier but couldn't because octopi.localdomain is no longer accessible, because he's running an mDNS daemon, and mDNS doesn't see .localdomain. If the router domain was anything else, be it .localdomain or .lan, it would return the address of that machine regardless, since the mDNS would fail over to the gateway's DNS resolver, which of course knows about it because of its DHCP lease.
Devices that do not have an mDNS daemon cannot advertise their hostnames on the network in this configuration.
No mDNS daemon on your system = can't see anything
mDNS = Can only see .local
octopi.localdomain works even if the device is not running avahi, because hostnames are transferred via DHCP without any mDNS functionality, which is great. This functionality is made impossible when the router domain is .local, as PFSense and OpenWRT outline.
@MatthewCroughan given you were talking about ARP records yesterday it seems like this is new knowledge to you too. I have already made the changes to mostly re-enable what we had previously just with a network domain of localdomain rather than the conflicting local and did so before your recent comments. Can you maybe now wait until you've been able to test before trying to teach me about this?
@johnmckerrell I'm not trying to teach you about anything. I've just been discussing it all night with a friend online and am coming to realise why localdomain is a thing. I'll curb the enthusiasm, sorry :)
The ARP record comment yesterday was made before reading into any of this, or looking at my own PFSense and reading their documentation on how mDNS, caching options and more work. The Ubiquiti firmware looks like it has way more niche and non-standard features though, so there's probably a million things going on that I have no idea about.
@johnmckerrell, you said:
WiFi b8:27:eb:cb:96:8c - was configured to be 10.0.100.1 in the DHCP - now configured to 10.0.100.2
Wired b8:27:eb:9e:c3:d9 - wasn't configured, now configured to be 10.0.100.1
Does that mean that mqtt.local should be resolving to one (or both) of those IP addresses? At present neither of those IP addresses is responding to pings, and it seems to be resolving to 10.0.30.194 at the moment?!?
$ ping mqtt.local
PING mqtt.local (10.0.30.194) 56(84) bytes of data.
64 bytes from 10.0.30.194 (10.0.30.194): icmp_seq=1 ttl=64 time=3.86 ms
64 bytes from 10.0.30.194 (10.0.30.194): icmp_seq=2 ttl=64 time=6.38 ms
64 bytes from 10.0.30.194 (10.0.30.194): icmp_seq=3 ttl=64 time=2.50 ms
64 bytes from 10.0.30.194 (10.0.30.194): icmp_seq=4 ttl=64 time=5.46 ms
@amcewen I also said "And.. on further discussion I've removed those IP addresses from the DHCP configuration"
It seemed like the box was statically configured and to help with portability elsewhere we thought that would be best, but it seems like it might not be the case.
It seemed like the box was statically configured and to help with portability elsewhere we thought that would be best, but it seems like it might not be the case.
Not by me. @goatchurchprime? @MatthewCroughan ?
@ajlennon @johnmckerrell Are we saying that there's a box somewhere with an mDNS Daemon mqtt.local that is statically configured, that is not @ajlennon's balena pi that we're otherwise not aware of?
No, I don't think so.
WiFi b8:27:eb:cb:96:8c - was configured to be 10.0.100.1 in the DHCP - now configured to 10.0.100.2
Wired b8:27:eb:9e:c3:d9 - wasn't configured, now configured to be 10.0.100.1
I think when we looked, the wired interface had 10.0.100.1, and the WiFi one was trying to get it and having issues so we figured that the wired one was manually configured. It seems like that might not be the case?
Historically mqtt.local has changed its IP address - I think you found this @amcewen
My understanding is that it's changed its IP address again.
My belief is that it is picking up an IP address from the DHCP server on the network unless somebody else has been in there and changed things around.
I can double check this tomorrow.
The 10.0.100.1 & 2 ones have been allocated for this use (stuck on a wiki) so please do use them. Or I can put them back into the dhcp settings if we prefer.
@johnmckerrell this caching issue is happening again.
Despite the fact that the Pi is running avahi-daemon, it is not returning .local. I believe this is because whatever this feature is, it prevents mDNS discovery when a .localdomain addr is cached. I really hope this can get solved.
Whatever the case, not providing .local or .localdomain when interacting and only providing the hostname seems to work. ping ender3-octoprint will still work, which is all that matters.
An additional datapoint...
I haven't had any problems talking to a number of Pis with my Museum in a Box stuff over the past week or two. They're all configured with a hostname of box - there's the one on the bookcase by the main door, which has been up for 9 days now - and then three more Pis which have been on and off repeatedly while I've been testing things (although only one of them on at any one time, but I've been switching between them lots).
I haven't had any problems talking to them with ssh pi@box.local and ssh pi@box-2.local, and similarly talking to the Node RED instances in a browser. The one on the bookcase has been both box.local and box-2.local at various points, but the other Pi (the one I've been trying to contact during the testing) has always responded at the other name. I basically run uptime when I've logged in to double-check I'm on the right Pi.
I don't ever try connecting to them without the .local bit, and haven't ever tried .localdomain until just now, when it worked fine.
OK so I have restarted mqtt.local with only the wired interface supported. It appears to be responding to mqtt.local on the expected IP address
I can ping ender3-octoprint.local, also .localdomain, I can't ping it without those because then it tries to resolve to my work vpn network.
@johnmckerrell My understanding is that if you have an avahi-daemon running, /etc/resolv.conf is going to be pointing to some sort of private network which is the avahi-daemon. If that fails it'll then query the router DNS to see if the machine exists (the default if you don't have an avahi-daemon). The problem is that the ubiquiti feature I think is masking .local some of the time for the same reason it sometimes provides the wrong hostname.
3.14. Host Name Option This option specifies the name of the client. The name may or may not be qualified with the local domain name
Well all I'm wondering is if the device is telling the router that it is foo.local and the router is then reporting this back, but I'm unclear on whether the documentation above (from the RFC) just means "when you later try to use this hostname it may or may not be qualified with the local domain name" or does it mean "you can pass a domain name in with the hostname". I would expect the former really.
Just to confirm, the router has its domain set to localdomain so it "shouldn't" be trying to do anything with the .local domain, unless as it says it is being told this by things requesting DHCP leases and then reporting that back out.
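To make the question concrete, here is a small Python sketch of how the Host Name option quoted above is laid out on the wire, assuming the standard DHCP option format from RFC 2132 (one code byte, one length byte, then the name; the Host Name option uses code 12). The function names are illustrative only. It shows that nothing in the encoding itself stops a client sending `foo.local` instead of `foo`; what the router does with a qualified name is a separate matter that this sketch says nothing about.

```python
HOST_NAME_OPTION = 12  # option code for Host Name in RFC 2132, section 3.14


def encode_host_name(name: str) -> bytes:
    """Encode a hostname as a DHCP option: code byte, length byte, then the name."""
    data = name.encode("ascii")
    if not 1 <= len(data) <= 255:
        raise ValueError("host name must be 1 to 255 bytes long")
    return bytes([HOST_NAME_OPTION, len(data)]) + data


def decode_host_name(option: bytes) -> str:
    """Decode a single Host Name option back into a string."""
    code, length = option[0], option[1]
    if code != HOST_NAME_OPTION:
        raise ValueError("not a Host Name option")
    return option[2:2 + length].decode("ascii")


for name in ("foo", "foo.local"):
    encoded = encode_host_name(name)
    print(name, encoded.hex(), decode_host_name(encoded))
```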
@johnmckerrell My understanding is that outside of mDNS the device requests an IP and gives a hostname. The hostname that is given is usually specified in /etc/hosts like so:
127.0.0.1 localhost.local localhost thinkpad
::1 localhost.local localhost thinkpad
If I chose to request localhost.lan then ping thinkpad.lan should respond with the IP of my machine.
My theory is that this is the first thing that the router's feature caches, in the same way that Sams-Iphone.localdomain was causing a problem, it is returning .localdomain some of the time rather than allowing mDNS responses all of the time if both parties have an mDNS daemon.
This might still come down to your personal machine's configuration too. Since theoretically the mDNS daemon should be the first query, then the router's dns, but this may not be happening everywhere.
@johnmckerrell After following this, I've got it working on my laptop. For some reason mqtt.local now returns an ipv6 address, whereas I believe I saw on @amcewen's machine it returns an ipv6 address. It all comes down to one's client configuration, which is actually really disappointing since it seems to vary so much between even two installations of Ubuntu.
https://unix.stackexchange.com/questions/43762/how-do-i-get-to-use-local-hostnames-with-arch-linux
The configuration in question is in /etc/nsswitch.conf
Configuration before following the guide:
hosts: files mymachines myhostname resolve [!UNAVAIL=return] dns
Configuration after following the guide, fixes it:
hosts: files mdns_minimal [NOTFOUND=return] dns myhostname
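Since the fix comes down to the order of sources on the `hosts:` line, a rough Python sketch like the following can report whether an mDNS module is consulted before plain `dns`. The helper names `hosts_sources` and `mdns_before_dns` are invented for this example; it only reads the file format and does not perform any lookups.

```python
import re
from typing import List


def hosts_sources(nsswitch_text: str) -> List[str]:
    """Return the lookup sources on the `hosts:` line, in order, without [actions]."""
    for line in nsswitch_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if line.startswith("hosts:"):
            rest = line[len("hosts:"):]
            rest = re.sub(r"\[[^\]]*\]", " ", rest)  # strip e.g. [NOTFOUND=return]
            return rest.split()
    return []


def mdns_before_dns(nsswitch_text: str) -> bool:
    """True if some mdns* source is listed ahead of plain dns (or dns is absent)."""
    sources = hosts_sources(nsswitch_text)
    mdns_positions = [i for i, s in enumerate(sources) if s.startswith("mdns")]
    if not mdns_positions:
        return False
    if "dns" not in sources:
        return True
    return mdns_positions[0] < sources.index("dns")


before = "hosts: files mymachines myhostname resolve [!UNAVAIL=return] dns"
after = "hosts: files mdns_minimal [NOTFOUND=return] dns myhostname"

print(mdns_before_dns(before), mdns_before_dns(after))
```

For the two configurations above, the first has no mDNS source at all and the second lists `mdns_minimal` ahead of `dns`, which matches the behaviour change I saw.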
mDNS also works just fine on the Vinyl Cutter PC, though there is some strange behaviour that I think is related to the wifi.
Discovery of mDNS on the vinyl cutter pc is strangely intermittent. I can't recreate it exactly, but I did observe it.
If I execute ping mqtt.local it will take some time (around 5 seconds) to resolve it. This will sometimes fail. Though after succeeding once it has no issue resolving subsequently. It will fail to resolve if the system were brought out of hibernation, but will work if you probe it enough.
This failure to resolve and massive resolve delay is not true of pinging the IP address of the machine directly, so it's definitely an mDNS related issue, whether that's down to configuration or the wifi hardware being slow. I do notice that the system has a massively variant ping response time when pinging local addresses. Pinging the router will result in anywhere from 10ms to 262ms.
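To put numbers on the slow-then-fast pattern rather than eyeballing ping, a small Python sketch along these lines could time repeated lookups. `time_lookup` is a hypothetical helper; it uses the standard library's `socket.gethostbyname` by default, which goes through the system resolver and therefore through whatever /etc/nsswitch.conf says.

```python
import socket
import time
from typing import Callable, List, Optional, Tuple


def time_lookup(name: str, attempts: int = 3,
                resolver: Callable[[str], str] = socket.gethostbyname
                ) -> List[Tuple[Optional[str], float]]:
    """Resolve `name` several times, returning (address or None, seconds) per attempt."""
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            address = resolver(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            address = None
        results.append((address, time.monotonic() - start))
    return results
```

Running `time_lookup("mqtt.local")` on the affected machine should show whether the first attempt is the slow or failing one and whether later attempts are quick, which would point at discovery or caching rather than the link itself.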
The configuration of /etc/nsswitch.conf on that machine, which is a fresh Ubuntu 19.04, is:
hosts: files mdns4_minimal [NOTFOUND=return] dns
and it returns IPv4 addresses for all .local addresses. This is due to mdns4_minimal, as I tried switching it to mdns_minimal. I later discovered that the 4 obviously means IPv4 explicitly.
https://askubuntu.com/questions/843943/how-to-replace-mdns4-minimal-with-bind
This gives us all the details related to what the different possible configurations are.
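To see which address families a given machine actually gets back for a name, something like this Python sketch may help. `split_by_family` is an illustrative helper that sorts the tuples returned by the standard `socket.getaddrinfo` call into IPv4 and IPv6 lists.

```python
import socket
from typing import Iterable, List, Tuple


def split_by_family(addrinfo: Iterable[tuple]) -> Tuple[List[str], List[str]]:
    """Group getaddrinfo() results into (IPv4 addresses, IPv6 addresses)."""
    v4, v6 = set(), set()
    for family, _type, _proto, _canonname, sockaddr in addrinfo:
        if family == socket.AF_INET:
            v4.add(sockaddr[0])
        elif family == socket.AF_INET6:
            v6.add(sockaddr[0])
    return sorted(v4), sorted(v6)
```

On a machine on the network, `split_by_family(socket.getaddrinfo("mqtt.local", None))` would be the call to try. With `mdns4_minimal` I would expect the IPv6 list to come back empty for .local names, and with `mdns_minimal` either list may be populated.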
I've checked on Arthur's Win10 laptop, and it also seems to work. It returns IPv6 addresses. The same was not true however of my Win10 virtual machine until I enabled the avahi-daemon on the host machine, which is very interesting to me, not sure I understand what's happening there.
Since yesterday the Liverbird hasn't been showing our energy usage.
Doing a bit of poking into it, I found that mqtt.local was offline. @ajlennon power-cycled it, which has brought it back up, but it's failing to connect to its influxdb instance.