balena-os / resin-device-toolbox

[DEPRECATED] The official Resin Device Toolbox CLI for resinOS
Apache License 2.0

rdt scan won't find resinOS (Windows 10 & Ubuntu) #51

Open nmaas87 opened 7 years ago

nmaas87 commented 7 years ago

rdt scan does not find my Raspberry Pi 3 with resinOS on the network. I am directly connected via a dumb Layer 2 switch, with no firewalls or similar in place. However, I have Docker for Windows installed, and it creates a vEthernet (DockerNAT) interface, a Hyper-V Virtual Ethernet Adapter. As long as this adapter is enabled, rdt scan does not find anything on the network (maybe because it thinks "this" is the uplink; however, it has the IP 10.0.75.1 and is only the NAT for Docker containers). If Docker for Windows is shut down, the virtual Ethernet device disappears and rdt scan works. Another "work-around" is to disable said Virtual Ethernet Adapter; however, Docker won't work anymore if you do that. Maybe you can improve the scan algorithm so that it ignores interfaces with the 10.0.75.1 IP, to get it working even in those situations. (Finding this error and the solutions took me several days and pure luck ;))

This is probably a bug in https://github.com/resin-io-modules/resin-sync as well.

nmaas87 commented 7 years ago

Ok, the error is more widespread. rdt only scans what it thinks is the main interface. If you happen to have a Wi-Fi uplink to the internet and connect the resinOS device directly to your local Ethernet port, it will only try to scan Wi-Fi and come back negative. Only if you disable ALL devices except the Ethernet port your resinOS device is attached to will it find the device. It should scan all interfaces to mitigate such problems, except local loopbacks and the vEthernet adapter with the Docker IP (10.0.75.1).
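A minimal sketch of the "scan all interfaces" idea, in Node.js since rdt is a Node app. It uses the standard os.networkInterfaces() call to list every non-loopback IPv4 address instead of picking one "primary" interface. The function name scanCandidates and the exclusion set holding the Docker NAT address 10.0.75.1 are hypothetical illustrations, not part of rdt or resin-discoverable-services:

```javascript
const os = require('os');

// Hypothetical exclusion list: the Docker for Windows NAT address from this
// report. This is not an rdt setting; adjust it for your own machine.
const EXCLUDED_ADDRESSES = new Set(['10.0.75.1']);

// List every non-loopback IPv4 address an mDNS scan could be sent from,
// instead of guessing a single "primary" interface. The interface table is
// injectable so the filtering logic can be tested without real hardware.
function scanCandidates(interfaces = os.networkInterfaces(), excluded = EXCLUDED_ADDRESSES) {
  const candidates = [];
  for (const [name, addrs] of Object.entries(interfaces)) {
    for (const addr of addrs || []) {
      // Some Node 18 releases report family as the number 4 instead of 'IPv4'.
      const isIPv4 = addr.family === 'IPv4' || addr.family === 4;
      // Skip IPv6, loopback (internal) and explicitly excluded addresses.
      if (!isIPv4 || addr.internal || excluded.has(addr.address)) continue;
      candidates.push({ name, address: addr.address });
    }
  }
  return candidates;
}
```

On the Windows machine described above, scanCandidates() would be expected to return the physical Ethernet adapter's address while skipping loopback and 10.0.75.1. Each address could then be handed to its own mDNS socket, for example through the interface option of multicast-dns; whether rdt should work this way is a design decision for the maintainers.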

nmaas87 commented 7 years ago

This also seems to be a problem on Ubuntu! (If your main uplink is not on the same network as your resinOS device, rdt won't find it.)

lekkas commented 7 years ago

Hey @nmaas87, can you please give us your installed rdt version ($ rdt version)?

hedss commented 7 years ago

@nmaas87 Thank you for reporting this; I believe it's related to resin-io-modules/resin-discoverable-services#21. @lekkas I believe this probably needs prioritising.

nmaas87 commented 7 years ago

@lekkas That would be 0.0.7. @hedss You're most welcome, and you're right: this is definitely related.

nmaas87 commented 7 years ago

@lekkas @hedss Any update on this issue? I am giving a talk about resin.os at a Pi Jam on 14.01.2017, and it would be great if this bug were sorted out by then :).

alexandrosm commented 7 years ago

Hey Nico, thanks for the ping! We'll get this sorted as soon as people come back from the holidays, assuming there's not some huge landmine hiding underneath.

--

Alexandros Marinos

Founder & CEO, Resin.io


hedss commented 7 years ago

@nmaas87 Hi Nico, sorry for the delay. I've carried out some investigation into this, and my results are in the already opened resin-io-modules/resin-discoverable-services#21 issue.

It's a long read, but the ultimate upshot is that there's unexpected behaviour going on, and we're going to need to discuss this internally to decide on priorities and potential solutions.

The short-term solution is to ensure that resinOS devices are available on your primary network interface. Sorry, I know this isn't ideal by any means.

alisondavis17 commented 7 years ago

Hi @nmaas87, that's awesome that you're planning to speak about resinOS at a Pi Jam! Let us know if we can help with anything - presentation materials, sending some resinOS stickers, etc. If you haven't seen it yet, this presentation on resinOS from ELC Europe last fall has some good background information as well.

nmaas87 commented 7 years ago

@alisondavis17 Thanks, I already know that presentation, and I have already finished creating my own. I will give the presentation at http://piandmore.de/en on 14.01.2017. I am kind of an old hand when it comes to "resin", as I already gave a talk about resin.io last time; you can find the (German) presentation as a video, along with the linked slides, at https://www.nico-maas.de/?page_id=1244. I would be very happy if you could provide stickers, goodies or other stuff; you can contact me at mail@nico-maas.de :)

trinitronx commented 7 years ago

TL;DR

I've encountered some sort of problem with rdt scan on a network containing a single ResinOS device (Raspberry Pi 3). I'm running rdt on a Mac, and have determined that perhaps the ResinOS device is not advertising its Avahi service correctly (as tested via dns-sd -B and avahi-browse -a from Mac & Linux hosts on the same network). I can ping the device and attempt to connect with ssh -vv -p 22222 root@<DEVICE IP>. However, I do not know what ResinOS password is set, and the SSH key tied to my account does not appear to work.

More detailed troubleshooting info below:

Problem

rdt scan always says "Could not find any resinOS devices in the local network".

Here is information requested in the "new issue" template:

rdt scan --verbose
Reporting scan results
Could not find any resinOS devices in the local network

If you need help, or just want to say hi, don't hesitate in reaching out at:

    GitHub: https://github.com/resin-os/resin-device-toolbox/issues/new
    Gitter: https://gitter.im/resin-io/cha

A bit problematic, but this Macbook has many network interfaces due to using VirtualBox, Docker, docker-machine-xhyve-driver, VPN connections, Cell phone tethering, Thunderbolt Ethernet adapter, etc...

Just a sample of ifconfig output to see how complicated it can get on a well-used and network-exercised Macbook (all IPs and MAC addresses obfuscated for privacy reasons):

$ ifconfig 
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 16384
  options=3<RXCSUM,TXCSUM>
  inet6 ::1 prefixlen 128
  inet 127.0.0.1 netmask 0xff000000
  inet6 fe80::1%lo0 prefixlen 64 scopeid 0x1
  nd6 options=1<PERFORMNUD>
gif0: flags=8010<POINTOPOINT,MULTICAST> mtu 1280
stf0: flags=0<> mtu 1280
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
  ether 98:01:a7:12:34:56
  inet 192.168.1.129 netmask 0xffffff00 broadcast 192.168.1.255
  media: autoselect
  status: active
en1: flags=963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX> mtu 1500
  options=60<TSO4,TSO6>
  ether 6a:00:02:12:34:56
  media: autoselect <full-duplex>
  status: inactive
en2: flags=963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX> mtu 1500
  options=60<TSO4,TSO6>
  ether 6a:00:02:12:34:56
  media: autoselect <full-duplex>
  status: inactive
bridge0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
  options=63<RXCSUM,TXCSUM,TSO4,TSO6>
  ether 9a:01:a7:12:34:56
  Configuration:
    id 0:0:0:0:0:0 priority 0 hellotime 0 fwddelay 0
    maxage 0 holdcnt 0 proto stp maxaddr 100 timeout 1200
    root id 0:0:0:0:0:0 priority 0 ifcost 0 port 0
    ipfilter disabled flags 0x2
  member: en1 flags=3<LEARNING,DISCOVER>
          ifmaxaddr 0 port 5 priority 0 path cost 0
  member: en2 flags=3<LEARNING,DISCOVER>
          ifmaxaddr 0 port 6 priority 0 path cost 0
  nd6 options=1<PERFORMNUD>
  media: <unknown type>
  status: inactive
p2p0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 2304
  ether 0a:01:a7:12:34:56
  media: autoselect
  status: inactive
awdl0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1484
  ether 4e:8a:d0:12:34:56
  inet6 fe80::4cab:1234:5678:abcd%awdl0 prefixlen 64 scopeid 0x9
  nd6 options=1<PERFORMNUD>
  media: autoselect
  status: active
utun0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1500
  inet6 fe80::ab12:3456:bcde:1111%utun0 prefixlen 64 scopeid 0xa
  nd6 options=1<PERFORMNUD>
utun1: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> mtu 1500
  inet6 fe80::123:abcd:ffff:9979%utun1 prefixlen 64 scopeid 0xb
  nd6 options=1<PERFORMNUD>
vboxnet0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
  ether 0a:00:27:00:00:00
  inet 192.168.99.1 netmask 0xffffff00 broadcast 192.168.99.255
en6: flags=8963<UP,BROADCAST,SMART,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
  ether 6e:40:bc:12:34:56
  media: autoselect
  status: active
bridge100: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
  options=3<RXCSUM,TXCSUM>
  ether 9a:01:a7:12:34:56
  inet 192.168.64.1 netmask 0xffffff00 broadcast 192.168.64.255
  Configuration:
    id 0:0:0:0:0:0 priority 0 hellotime 0 fwddelay 0
    maxage 0 holdcnt 0 proto stp maxaddr 100 timeout 1200
    root id 0:0:0:0:0:0 priority 0 ifcost 0 port 0
    ipfilter disabled flags 0x2
  member: en6 flags=3<LEARNING,DISCOVER>
          ifmaxaddr 0 port 14 priority 0 path cost 0
  nd6 options=1<PERFORMNUD>
  media: autoselect
  status: active

Trying to track down the issue (down the rabbit hole...)

So it's no wonder that rdt has problems if it tries to detect the primary uplink to scan. Also, since it is a node.js app, it could be run on many *nix variants with different interface naming schemes.

I tried my best to track this down, but with little node.js or CoffeeScript knowledge and the amount of indirection happening, I'm not sure exactly what is going on. I did find some useful places while digging into the code: it appears that rdt scan calls discoverLocalResinOsDevices, which appears to come from resin-sync. Next, resin-sync uses another function called enumerateServices from the resin-discoverable-services library.

Finally, somewhere in this library it looks at mDNS / Bonjour services. Presumably it is looking for an advertised service of type _resin-device._sub._ssh._tcp.

So, I decided to check for this service type using the OS X builtin command: dns-sd

$ dns-sd -B _resin-device._sub._ssh._tcp
Browsing for _resin-device._sub._ssh._tcp
DNSService call failed -65540

Scanning the same network for the standard _ssh._tcp worked fine and returned what I'd expect. So maybe there's something odd about that service type?
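For reference, DNS-SD (RFC 6763, section 7.1) builds subtype names as <subtype>._sub.<service>.<proto>.<domain>, so _resin-device._sub._ssh._tcp means "_ssh._tcp instances registered with the _resin-device subtype". The dns-sd tool expects filtered browsing in a different form: dns_sd.h documents the browse type as "_primarytype._tcp,_subtype", and -65540 is kDNSServiceErr_BadParam, so the failure above may come from the argument syntax rather than from the service type itself. A small hypothetical helper (not part of any resin module) that converts between the two forms:

```javascript
// Split an RFC 6763 subtype name "<subtype>._sub.<service>.<proto>[.<domain>]"
// into its parts. Hypothetical helper for illustration only.
function parseSubtypeName(name) {
  const labels = name.replace(/\.$/, '').split('.');
  // The "_sub" label must sit directly after the subtype label.
  if (labels.indexOf('_sub') !== 1 || labels.length < 4) {
    throw new Error(`not a DNS-SD subtype name: ${name}`);
  }
  return {
    subtype: labels[0],
    serviceType: labels.slice(2, 4).join('.'),
    domain: labels.slice(4).join('.') || 'local', // mDNS default domain
  };
}

// dns_sd.h expects filtered browsing as "<service>.<proto>,<subtype>".
function toDnsSdBrowseArg(name) {
  const { subtype, serviceType } = parseSubtypeName(name);
  return `${serviceType},${subtype}`;
}
```

With this, toDnsSdBrowseArg('_resin-device._sub._ssh._tcp') yields _ssh._tcp,_resin-device, which corresponds to running dns-sd -B _ssh._tcp,_resin-device. That form has not been tested against a resinOS device in this thread.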

Since dns-sd does not usually help much when I just want to see all the advertised services on the network, I turned to another Raspberry Pi with avahi-browse installed.

This time, I just ran the "browse all" command to see every mDNS / Bonjour service on the network:

$ avahi-browse -a
[...SNIP...]
[... Lots of real services here on the network ...]
[... But NO mention of _resin-device._sub._ssh._tcp
 ... OR any hostname I could recognize as the ResinOS Pi host ...]
[...SNIP...]

So, I'm not sure what I've really determined other than perhaps there's something causing issues with that _resin-device._sub._ssh._tcp service name (at least to dns-sd utility), or that somehow the ResinOS device is not really advertising itself on mDNS / Bonjour / Avahi?

I was able to get my device up and running in the Resin.io dashboard site, and I got into it with the slick browser-based terminal. When inside the container, I could list all listening processes on the Raspberry Pi (again... inside container) with netstat:

$ netstat -tunlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.2:53            0.0.0.0:*               LISTEN      -
tcp6       0      0 :::48484                :::*                    LISTEN      -
tcp6       0      0 :::22222                :::*                    LISTEN      -
tcp6       0      0 :::80                   :::*                    LISTEN      159/resin-go-hello-
udp        0      0 127.0.0.2:53            0.0.0.0:*                           -

Aha! So here's that _resin-device._sub._ssh._tcp service running on port 22222, as we'd expect... at least according to what this Avahi service XML shows.

I was able to test out a manual ssh client connect by grabbing my ResinOS device's IP as displayed under the "Device Summary" page (dashboard.resin.io/apps/XXX/devices/NNN/summary):

$ ssh -vv -p 22222 root@192.168.x.xxx
root@192.168.x.xxx's password:

Ok, so we've established that the actual sshd or dropbear daemon is running on the device, at least! But there's still no way I can get in... I don't know what password to try, and my SSH key that I imported via GitHub into my Resin.io account does not appear to work. Any ~/.ssh/authorized_keys that I try to add via vim and the Resin.io browser terminal does not appear to be used (yes, I checked, and the ~/.ssh/ and authorized_keys file permissions were correct!).
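Connecting with ssh already shows the daemon is up, but if you want to script the same reachability check (for example across several candidate addresses), a minimal Node.js sketch using the standard net module follows. probeTcpPort is a hypothetical name; it only tests whether a TCP connection to a port such as 22222 succeeds and does not authenticate or speak SSH:

```javascript
const net = require('net');

// Resolve true if a TCP connection to host:port succeeds within timeoutMs,
// false on refusal, unreachable host, or timeout. This only proves that
// something is listening (e.g. the SSH daemon on 22222); it does not log in.
function probeTcpPort(host, port, timeoutMs = 2000) {
  return new Promise((resolve) => {
    const socket = net.connect({ host, port });
    let settled = false;
    const finish = (ok) => {
      if (settled) return; // only the first outcome counts
      settled = true;
      socket.destroy();
      resolve(ok);
    };
    socket.setTimeout(timeoutMs, () => finish(false));
    socket.once('connect', () => finish(true));
    socket.once('error', () => finish(false));
  });
}
```

Usage: probeTcpPort('192.168.x.xxx', 22222).then(console.log), with the placeholder replaced by the device IP from the dashboard's "Device Summary" page.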

So I'm not quite sure how to get a true ssh session on the device, or what magic rdt does behind the scenes to get this working... but at least the dashboard.resin.io browser based terminal works... so that's something! Actually quite amazing that this more complicated stack is able to access the device over the web (and presumably VPN?), but I can't figure out how it's doing it with a simple ssh command.

Hypothesis

Given the above information, my best guesses for what the issue might be in my case:

  1. Avahi in the ResinOS device is not broadcasting the service on the Raspberry Pi 3
    • Maybe it is not listening on the RPi 3's WiFi interface, but only ethernet (not plugged in)?
    • Maybe it is broadcasting only to the Resin.IO VPN interface?
    • Maybe some other configuration issue?
  2. Something is wrong with the mDNS service type, causing broadcast / discovery issues?
    • Maybe it's too long, or non-standard in some way?
    • Maybe OS X dns-sd is buggy?
  3. Maybe rdt scan is broadcasting on the wrong Macbook network interface?
    • Seems likely because other issues seem to mention this, but it could be a "red herring", as both dns-sd -B and avahi-browse -a on multiple different hosts (multiple MacBooks, Raspbian Linux) do not discover the ResinOS device.
  4. Maybe there is a bug in node.js having to do with mDNS / Bonjour / Avahi / Multicast DNS handling?
    • Seeing as how ResinOS is heavy on the coffeescript + Node.js JavaScript, as is rdt... this may also be a possibility. Other non-js based mDNS implementations appear to work ok on my network to discover each other. The odd man out is ResinOS + rdt.

trinitronx commented 7 years ago

Just read through a bit of resin-io-modules/resin-discoverable-services#21 to check if I missed something, and came across the node multicast-dns test code.

I decided to run it from an Intel Edison on the same network to see if I could get any more useful information:

# Check for running mDNS on Intel Edison
# Stop if already running so we can test node.js mDNS bind to this interface
root@gamma-lyrae:~/test-mdns# netstat -tunlp | grep 'mdns\|5353'
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
udp        0      0 0.0.0.0:50337           0.0.0.0:*                           754/mdnsd
udp        0      0 0.0.0.0:5353            0.0.0.0:*                           754/mdnsd
udp        0      0 0.0.0.0:5353            0.0.0.0:*                           754/mdnsd
udp6       0      0 :::40889                :::*                                754/mdnsd
udp6       0      0 :::5353                 :::*                                754/mdnsd

root@edison01:~/test-mdns# ifconfig wlan0
wlan0     Link encap:Ethernet  HWaddr 90:b6:00:12:34:56
          inet addr:192.168.123.456  Bcast:192.168.123.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:2542592 errors:0 dropped:15 overruns:0 frame:0
          TX packets:239189 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:882689458 (841.7 MiB)  TX bytes:19976139 (19.0 MiB)
root@edison01:~/test-mdns# npm install multicast-dns
root@edison01:~/test-mdns# node

I then ran the code that the given coffeescript seemed to translate to:

> mcast = require('multicast-dns')
[Function]
> var mcast, mdns;
undefined
> mdns = mcast({ interface: '192.168.123.456' })
EventEmitter {
  domain:
   Domain {
     domain: null,
     _events: { error: [Function] },
     _eventsCount: 1,
     _maxListeners: undefined,
     members: [] },
  _events: {},
  _eventsCount: 0,
  _maxListeners: undefined,
  send: [Function],
  respond: [Function],
  response: [Function],
  query: [Function],
  destroy: [Function] }

> mdns.on('query', function(packet, rinfo) {
...     console.log(packet);
...     return console.log(rinfo);
...   });
EventEmitter {
  domain:
   Domain {
     domain: null,
     _events: { error: [Function] },
     _eventsCount: 1,
     _maxListeners: undefined,
     members: [] },
  _events: { query: [Function] },
  _eventsCount: 1,
  _maxListeners: undefined,
  send: [Function],
  respond: [Function],
  response: [Function],
  query: [Function],
  destroy: [Function] }
> mdns.on('response', function(packet, rinfo) {
...     console.log(packet);
...     return console.log(rinfo);
...   });
EventEmitter {
  domain:
   Domain {
     domain: null,
     _events: { error: [Function] },
     _eventsCount: 1,
     _maxListeners: undefined,
     members: [] },
  _events: { query: [Function], response: [Function] },
  _eventsCount: 2,
  _maxListeners: undefined,
  send: [Function],
  respond: [Function],
  response: [Function],
  query: [Function],
  destroy: [Function] }
> mdns.on('error', function(error) {
...     return console.log(error);
...   });
EventEmitter {
  domain:
   Domain {
     domain: null,
     _events: { error: [Function] },
     _eventsCount: 1,
     _maxListeners: undefined,
     members: [] },
  _events: { query: [Function], response: [Function], error: [Function] },
  _eventsCount: 3,
  _maxListeners: undefined,
  send: [Function],
  respond: [Function],
  response: [Function],
  query: [Function],
  destroy: [Function] }
> mdns.query([
...     {
.....       name: '_resin-device._sub._ssh._tcp.local',
.....       type: 'PTR'
.....     }
...   ]);
undefined
> mdns.query([ { name: '_ssh._tcp', type: 'PTR' } ]);
undefined
> mdns.query([{name:'<hostname-that-exists>.local', type:'A'}])
undefined

> mdns
EventEmitter {
  domain:
   Domain {
     domain: null,
     _events: { error: [Function] },
     _eventsCount: 1,
     _maxListeners: undefined,
     members: [] },
  _events: { query: [Function], response: [Function], error: [Function] },
  _eventsCount: 3,
  _maxListeners: undefined,
  send: [Function],
  respond: [Function],
  response: [Function],
  query: [Function],
  destroy: [Function] }
> mdns.query({  questions: [{name:'<another-hostname-that-exists>.local', type:'A'}] })
undefined
> mdns.query({  questions: [{name:'<yet-another-hostname-that-exists>.local', type:'A'}] })
undefined
> mdns.query({  questions: [{name:'_ssh._tcp', type:'PTR'}] })
undefined

Hmm... all undefined responses? Not sure this is right...
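The undefined values are expected: in multicast-dns, query() sends the packet and returns nothing (its README documents only an optional callback that fires once the packet is sent), and any answers arrive asynchronously through the 'response' handler registered earlier in the session. To see what goes on the wire, here is a standard-library sketch (not multicast-dns's own code) that encodes a one-question DNS query (RFC 1035, section 4.1) and sends it as an RFC 6762 one-shot query to 224.0.0.251:5353 from an ephemeral port, so responders reply by unicast. The function names and the optional iface parameter are hypothetical:

```javascript
const dgram = require('dgram');

const MDNS_GROUP = '224.0.0.251';
const MDNS_PORT = 5353;
const QTYPES = { A: 1, PTR: 12, TXT: 16, SRV: 33, ANY: 255 };

// Encode a standard DNS query with a single question (RFC 1035 section 4.1).
// ID 0 and all-zero flags, as RFC 6762 recommends for multicast queries.
function encodeQuery(name, type, id = 0) {
  if (!Object.prototype.hasOwnProperty.call(QTYPES, type)) {
    throw new Error(`unsupported type: ${type}`);
  }
  const header = Buffer.alloc(12); // ID, flags and four counts, all zero
  header.writeUInt16BE(id, 0);     // ID
  header.writeUInt16BE(1, 4);      // QDCOUNT = 1
  const labels = name.replace(/\.$/, '').split('.').map((label) => {
    const bytes = Buffer.from(label, 'utf8');
    if (bytes.length < 1 || bytes.length > 63) throw new Error(`bad label in ${name}`);
    return Buffer.concat([Buffer.from([bytes.length]), bytes]); // length-prefixed label
  });
  const tail = Buffer.alloc(5);         // byte 0 stays 0: the root label ends QNAME
  tail.writeUInt16BE(QTYPES[type], 1);  // QTYPE
  tail.writeUInt16BE(1, 3);             // QCLASS = IN
  return Buffer.concat([header, ...labels, tail]);
}

// Send one query from an ephemeral port (source port != 5353, so responders
// answer by unicast) and collect raw replies for timeoutMs. `iface` is the IPv4
// address of the interface to send from; if omitted, the OS picks one, which is
// the "primary interface" problem discussed in this issue.
function sendOneShotQuery(name, type, { timeoutMs = 2000, iface } = {}) {
  return new Promise((resolve, reject) => {
    const socket = dgram.createSocket('udp4');
    const replies = [];
    let timer = null;
    let finished = false;
    const finish = (err) => {
      if (finished) return;
      finished = true;
      clearTimeout(timer);
      try { socket.close(); } catch (_) { /* socket already closed */ }
      if (err) reject(err); else resolve(replies);
    };
    socket.on('error', finish);
    socket.on('message', (msg, rinfo) => {
      replies.push({ from: rinfo.address, bytes: msg.length });
    });
    socket.bind(0, () => {
      try {
        if (iface) socket.setMulticastInterface(iface);
        const packet = encodeQuery(name, type);
        timer = setTimeout(() => finish(null), timeoutMs);
        socket.send(packet, MDNS_PORT, MDNS_GROUP, (err) => { if (err) finish(err); });
      } catch (err) {
        finish(err);
      }
    });
  });
}
```

Usage: sendOneShotQuery('_ssh._tcp.local', 'PTR', { iface: '<interface IPv4 address>' }).then(console.log). An empty array means nothing answered on that interface within the timeout.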

hedss commented 7 years ago

Hi @trinitronx

Thanks for the detailed reports! However, the reason you're not able to detect your RPi3 using rdt is because the particular OS images from our resin.io service do not run Avahi as yet. This functionality is only currently available on resinOS images of version 2.0 and greater, which are currently in Beta.

To connect to a device hosted on the resin.io service with resinOS versions below 2.0 (which I see you're using), you'll need to use the resin-cli command instead; details can be found here. You will need to have pushed an Application to your device first, and the guide to getting up and running with this is here.

Your comment about the primary interface is correct, and that is what the rest of this issue relates to. However, rdt uses standard NodeJS functionality to enumerate interfaces across host OS platforms, and this is not deemed a problem (it has indeed been tested on the 'big three').

So, in conclusion:

Hope this helps!

trinitronx commented 7 years ago

@hedss Thanks for the clarification! For some reason I must have missed this distinction between the 1.x and 2.x beta images and assumed that ResinOS was supposed to be running Avahi or mDNS like the default Yocto image for Intel Edison does.