hedss opened this issue 7 years ago
I've spent a considerable amount of time looking into this, and there are several issues.
MDNS (obviously) uses multicast groups so that all nodes on a local network can listen for and transmit traffic. We use what is now the fork of bonjour because it appeared to be a full implementation of MDNS and DNS-SD under Node. Several issues have arisen since then that have required patching, but ultimately this module itself relies on the multicast-dns module for all MDNS operations. Until now, I've not really looked at it.
The comment for the interface parameter in the multicast-dns constructor is:
interface: '192.168.0.2' // explicitly specify a network interface. defaults to all
This appears to be a misunderstanding of the documentation for Node's socket.bind() call (which specifies that if no address is given, it will listen on all of them). Whilst that is true for unicast binds, it is not true for multicast group membership: should no specific interface address be given, INADDR_ANY is used (see the uv__udp_set_membership4 function in libuv, the underlying platform library Node.js uses). It is then essentially up to the OS to bind to the first interface it deems suitable; see here, which specifies: "You can always fill this last member with the wildcard address (INADDR_ANY) and then the kernel will deal with the task of choosing the interface."
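To make the distinction concrete, here is a minimal sketch using Node's dgram API directly (the group address is the MDNS one; the interface address is just an example): passing a local address to addMembership() ties group membership to that NIC, whereas omitting it is the INADDR_ANY case described above.

dgram = require('dgram')
socket = dgram.createSocket({ type: 'udp4', reuseAddr: true })

socket.bind 5353, ->
  # Explicit membership: traffic for 224.0.0.251 is accepted on this NIC
  socket.addMembership('224.0.0.251', '192.168.2.101')
  # Omitting the second argument is the INADDR_ANY case: the kernel picks
  # the interface, which in practice means the primary one
  # socket.addMembership('224.0.0.251')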
The ultimate upshot is that, if multiple interfaces are in use on a host, rdt scan will only scan the primary interface (as that is the one the OS will use for multicast group membership).
So whilst this is not ideal, it initially does not seem a massive problem, as we could create multiple instances of the bonjour module, each with a specific interface assigned. Unfortunately, it gets trickier from here depending on which operating system you're using.
Under OSX (which I primarily develop on), we have Apple's mDNSResponder process, which is launched on startup and immediately binds to the MDNS multicast port. Whilst it correctly uses the SO_REUSEPORT socket option to allow other processes to bind to port 5353 (the MDNS port), it essentially runs as a root process, so the only way we can share the port when binding to a specific interface is to also run as root (non-ideal) or to use INADDR_ANY (the default for multicast-dns), which gets the kernel to take care of it by assigning membership to the primary interface.
Under Linux, there's a similar issue with avahi-daemon, should it be running.
However, even with mDNSResponder/avahi-daemon disabled, there still appears to be an issue in the underlying socket code: an interface can bind and send data to the correct multicast group, but does not receive any data back from it.
To test this, I created a very small test programme (CoffeeScript incoming...):
mcast = require('multicast-dns')
mdns = mcast({ interface: '192.168.2.101' }) # A secondary NIC

mdns.on 'query', (packet, rinfo) ->
  console.log(packet)
  console.log(rinfo)

mdns.on 'response', (packet, rinfo) ->
  console.log(packet)
  console.log(rinfo)

mdns.on 'error', (error) ->
  console.log(error)

mdns.query([ { name: '_resin-device._sub._ssh._tcp.local', type: 'PTR' } ])
This uses the multicast-dns module to send a query for any resinOS devices on the local network of which '192.168.2.101' is a member (this being a secondary interface on the Linux machine I'm experimenting on).
I used tcpdump on the Linux machine, and Wireshark on an independent machine also connected to that network, to monitor for MDNS traffic. When running the test programme, I see the following from both machines:
12:04:43.302900 IP 192.168.2.101.mdns > 224.0.0.251.mdns: 0 PTR (QM)? _resin-device._sub._ssh._tcp.local. (52)
12:04:43.383737 IP 192.168.2.102.mdns > 224.0.0.251.mdns: 0*- [0q] 5/0/0 PTR resin._ssh._tcp.local., (Cache flush) TXT "", (Cache flush) SRV resin.local.:22222 0 0, (Cache flush) AAAA fe80::fb40:7c6f:6e5f:90f9, (Cache flush) A 192.168.2.102 (149)
So, the MDNS query is getting out onto the wire, and the local resinOS device is responding correctly. What isn't happening is the multicast-dns module seeing either the response or the original query (which it should). I can't see anything wrong with the event handlers in the module, so I'm coming to the conclusion that there's some issue further down, possibly in the way libuv handles multicast traffic.
Verification with netstat seems to suggest that multicast responses are being dropped by the kernel (which again suggests that the socket is sending to the group but not correctly listening on it).
Unfortunately, the current situation is that this is going to take considerably more effort to solve, and is probably going to get into the realms of writing some C against the raw BSD sockets API (and although I have experience with this, it is also going to mean poking around in libuv) to track down what's going on.
A short-term, though not ideal, answer here is to ensure that the interface a resinOS device is on is the primary network interface. I have tested this under both OSX and Ubuntu and can confirm that prioritising an interface in this manner allows rdt scan to find devices correctly.
Further to this, I now have some C using the sockets library that successfully joins, sends to and receives from the MDNS multicast group on two separate interfaces, with both correctly sending a query and receiving a response. I have an idea as to how we might proceed, but it's probably going to involve patching the multicast-dns client.
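Roughly, the idea (a sketch only, not the code in the forked branch mentioned below; the two interface addresses are just examples) is to keep the socket bound to the wildcard address so port sharing still works, but to add group membership explicitly for every interface we care about, and to pick the outgoing interface when querying:

dgram = require('dgram')
socket = dgram.createSocket({ type: 'udp4', reuseAddr: true })

socket.bind 5353, ->
  # Stay bound to the wildcard address so 5353 can still be shared, but
  # join the MDNS group on each interface we want to listen on
  socket.addMembership('224.0.0.251', '192.168.2.101')
  socket.addMembership('224.0.0.251', '192.168.3.101')
  # Route outgoing queries via a chosen interface (needs Node >= 8.6)
  socket.setMulticastInterface('192.168.2.101')

socket.on 'message', (message, rinfo) ->
  console.log(rinfo)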
The resin-io-modules/resin-discoverable-services#find-on-all-interfaces branch now pulls in the forked resin-io-modules/multicast-dns branch, which includes experimental code to perform this. In fact, it probably just needs to be cleaned up.
There is an issue where, currently, should the default interface not be one that a resinOS device is connected to, that device will never show up. We can fix this by ensuring that every interface has a Bonjour instance bound to it, as sketched below.
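A minimal sketch of that approach, assuming (as the upstream bonjour module documents) that constructor options are passed straight through to multicast-dns; the _resin-device subtype filtering that resin-discoverable-services actually performs is omitted for brevity:

os = require('os')
bonjour = require('bonjour')

# One Bonjour instance per external IPv4 address, so that every network
# the host is attached to gets its own mDNS socket and group membership
for name, addresses of os.networkInterfaces()
  for address in addresses when address.family is 'IPv4' and not address.internal
    instance = bonjour({ interface: address.address })
    instance.find { type: 'ssh' }, (service) ->
      console.log(service)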