DefinedNet / mobile_nebula

Brings nebula to mobile devices (iOS, Android)
https://defined.net

Persistent Disconnects on Mobile Data with Nebula Android App (v0.4.2-86, Nebula v1.9.4) #198

Closed: ghgr closed this issue 2 weeks ago

ghgr commented 2 weeks ago

Description: Using the Nebula Android app v0.4.2-86 on Android 15, the app disconnects from the network after a few minutes on mobile data. Reconnecting manually within the Nebula app temporarily restores connectivity, but the issue reappears shortly afterwards. In other words, connectivity could be maintained if the app automatically reconnected or restarted itself every few minutes.

Environment:

Steps to Reproduce:

  1. Start the Nebula app and connect on mobile data. Everything works fine.
  2. Wait a few minutes, observe connectivity loss to other hosts.
  3. Manually reconnect (disconnect and connect again within the Nebula app) and observe that connectivity is restored for another few minutes.

Expected Behavior: The Nebula app should maintain a stable connection over mobile data without needing frequent manual reconnections.

Observed Behavior: Connection drops every few minutes, requiring a manual reconnection within the app to re-establish connectivity.

Logs: From log file:

Notes: Manually reconnecting through the Nebula app resolves the issue temporarily, which suggests the network itself is not at fault.

Attachments: log.txt (log file attached for detailed error reference).

Potential Cause: Suspected issue with session persistence or handshake failure over mobile networks.

johnmaguire commented 2 weeks ago

Hi @ghgr - from what I can see, you're attempting to handshake with 192.168.100.3 via IPv6, but this fails with a "network is unreachable" error. This may be because listen.host is set to 0.0.0.0 instead of [::]. Unfortunately, this is not currently configurable in the app. (#159)
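To make the address-family point concrete, here is a minimal Python sketch. It is not Nebula code, and `can_send_from` is a hypothetical helper. It assumes the usual Linux/Android default where a socket bound to the IPv6 wildcard [::] is dual-stack (IPV6_V6ONLY disabled), while a socket bound to 0.0.0.0 is IPv4-only and therefore has no way to reach an IPv6 peer. Only wildcard binds are modelled.

```python
import ipaddress


def can_send_from(listen_host: str, remote: str) -> bool:
    """Hypothetical, simplified model: can a UDP socket bound to
    listen_host send to remote?

    Assumptions: only wildcard binds are modelled; 0.0.0.0 yields an
    IPv4-only socket, and [::] yields a dual-stack socket (IPV6_V6ONLY
    disabled, the usual Linux/Android default).
    """
    local = ipaddress.ip_address(listen_host.strip("[]"))
    dest = ipaddress.ip_address(remote)
    # Treat IPv4-mapped IPv6 destinations (::ffff:a.b.c.d) as plain IPv4.
    if isinstance(dest, ipaddress.IPv6Address) and dest.ipv4_mapped is not None:
        dest = dest.ipv4_mapped
    if not local.is_unspecified:
        raise ValueError("only wildcard listen hosts are modelled here")
    if local.version == 4:
        # An IPv4-only socket cannot address an IPv6 peer at all.
        return dest.version == 4
    # A dual-stack IPv6 wildcard socket can reach both families.
    return True


# Documentation-range example addresses (RFC 5737 / RFC 3849).
for host in ("0.0.0.0", "[::]"):
    print(host, can_send_from(host, "203.0.113.7"), can_send_from(host, "2001:db8::7"))
# 0.0.0.0 True False
# [::] True True
```

Under these assumptions, a device whose only usable underlay path to a peer is IPv6 gets no direct tunnel while bound to 0.0.0.0, which is consistent with the IPv4 relay being the only path that comes up.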

Secondly, you appear to gain connectivity through an IPv4 relay (192.168.100.13), but then the tunnel is torn down.

Are you able to provide correlated logs from 192.168.100.3 and 192.168.100.13?

ghgr commented 2 weeks ago

Hello @johnmaguire, and thank you for your response. It prompted me to analyze the logs of the other hosts, 192.168.100.3 (the host I want to connect to) and 192.168.100.13 (the lighthouse and relay), and I noticed that a script on 192.168.100.3 had been running amok, repeatedly restarting the nebula service. I stopped it, and the connection has been stable since then, so technically this issue can be closed.

That said, I still wonder why the tunnel would break when I restart the nebula service on 192.168.100.3 (which can happen naturally, e.g. after a power outage), so I have attached the correlated logs of the two servers in case you can shed some light on this mystery.

192_168_100_3.txt 192_168_100_13.txt

johnmaguire commented 2 weeks ago

Hi @ghgr - thanks for the logs. This looks like a known bug with relays. We are working on a solution: https://github.com/slackhq/nebula/pull/1270

Once a fix is released, updating 192.168.100.3 should resolve the issue. Today, mobile is always the tunnel "initiator" (at least until #61 is resolved), and this bug affects the "responder" side.

In the meantime, as you noted, you should be able to restart the initiator side to reset the relay state and get a working connection. Since there are existing issues filed for these bugs, I'm going to close this ticket. Sorry I don't have an immediate solution for you!