Jigsaw-Code / outline-apps

Outline Client and Manager, developed by Jigsaw. Outline Manager makes it easy to create your own VPN server. Outline Client lets you share access to your VPN with anyone in your network, giving them access to the free and open internet.
https://getoutline.org/
Apache License 2.0

Problem with iOS and MacOS clients v1.2.1 (Oct 19 2018 release) #352

Closed jsquyres closed 6 years ago

jsquyres commented 6 years ago

I've been using Outline on iOS for many months now (with a DigitalOcean droplet). Many thanks!

Ever since the Oct 19 update to v1.2.1, however, I have been having problems.

Specifically: periodically, the iOS Outline client (on iPhone 8+, iOS 12.0.1/16A404) gets into the following state:

After Outline disconnects, the "VPN" indicator (correctly) goes out, and traffic starts flowing again. If I immediately try to make the Outline client connect again, it tries several times and eventually gives up.

This only seems to happen when on wifi. I have noticed that this seems to happen when I move from cellular to wifi, but I don't know if this is always the case.

HOWEVER, when this problem occurs, I notice that if I leave Outline disconnected for a few minutes and then go back into the Outline client app and try to connect again, it connects just fine (i.e., the "VPN" indicator goes on and traffic is still flowing).

Did something change in the v1.2.1 release with regard to cached connections to my Outline server, perchance? Or did something regarding my connection to my Outline server become sensitive to changing network routes (e.g., switching from cellular to wifi)?

I ask because I see "Bypass LAN and private networks in the VPN" in the v1.2.1 release notes. What exactly does that mean?

Please let me know if there's more information that I can provide to help debug this issue. Thanks!

jsquyres commented 6 years ago

One clarification: I noticed that even after some time on the same wifi network, the iOS Outline client can get in the same failed state (i.e., no traffic gets through, the Outline client gets stuck in a "reconnecting" loop).

Specifically:

  1. Last night when I filed this github issue (around 10pm), I had my iPhone successfully connected to the Outline server, and traffic was flowing fine. I went to sleep shortly thereafter.
  2. I see in my iCloud logs that my phone successfully performed an iCloud backup at 2:04am last night.
  3. But when I got up around 6:30am this morning, Outline was in the failed state (i.e., no traffic could get through -- e.g., I couldn't load any web pages -- and the iOS Outline client was stuck in the "reconnecting" loop).
  4. I tapped "disconnect" in the Outline client and traffic started flowing again.
  5. About 45 minutes later (without having left the same wifi network), I tapped "connect" in the Outline client, it successfully connected, and all traffic continued to flow.

alalamav commented 6 years ago

Thanks for the detailed report, @jsquyres. I have been investigating this issue and, unfortunately, have found it hard to reproduce consistently. In my testing, I have seen the connection fail with a particular server and network combination. However, it is strange that the same conditions work fine with the macOS client, which shares the code base with the iOS client. Furthermore, a different iOS device will also connect to the same server over the same network.

I believe that this is not related to the latest update, which introduced LAN and private network bypass (meaning that traffic to your local network does not go through the VPN), as I disabled this feature while testing, yet the problem persisted. My suspicion is that the app is not handling network changes correctly; I'll keep looking and update this thread with any information.
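To make the idea of a LAN and private network bypass concrete, here is a minimal Python sketch of the kind of check involved. The subnet list (the RFC 1918 private ranges plus link-local) and the helper name `bypasses_vpn` are assumptions for illustration only; they are not taken from Outline's code, and the real exclusion list may differ.

```python
import ipaddress

# Assumed bypass list: RFC 1918 private ranges plus IPv4 link-local.
# Illustrative only; this is not the list the Outline client ships.
BYPASS_SUBNETS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("169.254.0.0/16"),
]


def bypasses_vpn(address: str) -> bool:
    """Return True if traffic to `address` would skip the tunnel under the assumed list."""
    ip = ipaddress.ip_address(address)
    return any(ip in subnet for subnet in BYPASS_SUBNETS)


if __name__ == "__main__":
    # A typical home gateway address versus a public address.
    for addr in ("192.168.10.1", "8.8.8.8"):
        print(addr, "bypass" if bypasses_vpn(addr) else "tunnel")
```

Under this assumed list, traffic to a home gateway such as 192.168.10.1 would go out directly, while traffic to a public address would still be sent through the VPN.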

The solution that has worked for me is to reboot the phone. Can you please give that a try and let me know if that helps? It would also be useful if you could submit feedback through the app and tag it with this GitHub issue ID.

jsquyres commented 6 years ago

Ah, your explanation for bypassing the VPN for LAN/private network makes sense. Thanks.

Yes, you're right -- I can't make this problem happen on-demand. It just happens "sometimes". ☹️

I'll reboot my phone now and see if this problem occurs over the next day or three.

If it happens again, I'll submit through the app with my GitHub ID + this issue ID. Thanks!

yakovmanshin commented 6 years ago

I can confirm that the problem occurs on both macOS and iOS devices (versions 10.14 and 12.0.1, respectively). Switching between Wi-Fi and cellular, or even putting the computer to sleep and waking it a couple of minutes later, often leads to VPN connection issues. The symptoms are just as @jsquyres described: traffic starts flowing as soon as the VPN is disabled, and attempts to re-enable the VPN result in some waiting and eventual failure.

I tried setting up brand-new servers (about a dozen in total), but this doesn't seem to help.

On macOS, the only working solution I’ve been able to find so far is opening System Preferences → Network → Advanced, switching to the TCP/IP tab, setting IPv4 to Off, saving and applying the configuration, then setting IPv4 back to Using DHCP.

Update: I think this issue should be renamed to reflect that the bug is not exclusive to iOS.

jsquyres commented 6 years ago

@alalamav It happened again this morning exactly the way it happened yesterday: I went to bed last night with my iPhone on my home wifi, the Outline client was active / "VPN" icon was displayed in the top left -- traffic was flowing, and everything was fine. I see that my iPhone did an iCloud backup last night at 11:31pm. This morning I woke up around 6:15am, and the iOS client was in the failed state again (no traffic is flowing, the Outline client is in the eternal trying-to-connect state). I tried to submit feedback from the app, but of course it failed to submit, so I backed out of feedback, tapped "disconnect" to make the client disconnect, and then I was able to submit the feedback properly through the app (per request, I cited jigsaw-code/outline-client #352 / @jsquyres). I don't know if submitting feedback while the client is disconnected changes the state data that the app submits with the report.

@yakovmanshin Thanks for the report. I renamed the issue to include "MacOS", too.

jsquyres commented 6 years ago

I forgot to mention 2 things:

  1. I rebooted my phone yesterday morning, per @alalamav's suggestion. This is the first time I've seen the problem since rebooting my phone.
  2. After the failed state this morning:
    1. I failed to remember to try to tap "Connect" again immediately after the failed state (I blame the lack of caffeine; per above, it is before 7am). Oops. If it behaves like it did before, it would have gone into the loop of trying-to-connect, but I didn't verify that this morning. ☹️
    2. That being said, after the Outline client was disconnected for ~5 minutes, I did remember to tap "Connect" again, and it connected successfully / traffic is flowing / everything is fine.

If there's any kind of debug mode that I can activate in the app to gather more data when it's in this state, I'm happy to activate it. Or if getting network traces of what traffic the phone is sending while in the failed state would be helpful, ...etc., let me know. I'm a developer myself; iOS / mobile apps aren't my specialty, but I'm not afraid to get my hands dirty with deeply technical things if it would help get you more information.

jsquyres commented 6 years ago

Update: I caused the problem to happen again this morning by leaving my wifi, going on cellular for a while, and then joining a Starbucks wifi. My iPhone went into the failed state.

This time, I did remember to try having the Outline client connect immediately after disconnecting. 🎉

It failed, but in a slightly different way. It only tried to connect a small number of times before giving up (note that other traffic is flowing just fine when this happens). Specifically, it showed the circles-growing graphic 2-4 times before giving up and displaying the following (I obscured my server IP address):

[Image: outline-fail]

This is a slightly different failure mode, which seems to indicate a different code path. Perhaps that is helpful in tracking down the issue...?

Note that after waiting 5 minutes, I tapped "Connect" again and it connected just fine / traffic is flowing / etc.

jsquyres commented 6 years ago

This morning, I checked what network traffic my phone is sending/receiving while it is in the failed state.

tl;dr

When my phone is in the failed state, it doesn't actually appear to be sending any traffic to my Outline server, even though it claims to be trying to reconnect.

More detail

Interestingly, when my phone is in the failed state, I only see 2 things:

  1. My phone periodically ARPs for the gateway
  2. My phone is sending isakmp-nat-keep-alive packets to an IP address that is not my Outline server

In the tcpdump output below, my phone is 192.168.10.34, aka 70:ef:00:34:98:11, and the gateway is 192.168.10.1, aka 00:08:a1:24:03:a2. All the traffic occurs on VLAN 10:

11:38:29.428381 70:ef:00:34:98:11 > 00:08:a1:24:03:a2, ethertype 802.1Q (0x8100), length 47: vlan 10, p 0, ethertype IPv4, (tos 0x0, ttl 64, id 31155, offset 0, flags [none], proto UDP (17), length 29)
    192.168.10.34.4500 > 129.192.x.y.4500: [no cksum] isakmp-nat-keep-alive
    0x0000:  4500 001d 79b3 0000 4011 0e88 c0a8 0a22  E...y...@......"
    0x0010:  81c0 xxyy 1194 1194 0009 0000 ff         .............
11:38:33.479327 70:ef:00:34:98:11 > 00:08:a1:24:03:a2, ethertype 802.1Q (0x8100), length 60: vlan 10, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.10.1 tell 192.168.10.34, length 42
    0x0000:  0001 0800 0604 0001 70ef 0034 9811 c0a8  ........p..4....
    0x0010:  0a22 0000 0000 0000 c0a8 0a01 0000 0000  ."..............
    0x0020:  0000 0000 0000 0000 0000                 ..........
11:38:33.479485 00:08:a1:24:03:a2 > 70:ef:00:34:98:11, ethertype 802.1Q (0x8100), length 60: vlan 10, p 0, ethertype ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.10.1 is-at 00:08:a1:24:03:a2, length 42
    0x0000:  0001 0800 0604 0002 0008 a124 03a2 c0a8  ...........$....
    0x0010:  0a01 70ef 0034 9811 c0a8 0a22 0000 0000  ..p..4....."....
    0x0020:  0000 0000 0000 0000 0000                 ..........

It seems to be isakmp-nat-keep-alive'ing with 129.192.x.y (I removed the last 2 octets), and I don't know who that is. I do have my employer's MDM on my phone, though -- I don't know if this could be related.
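For anyone who wants to check the tcpdump decoding by hand, the following Python sketch unpacks the bytes of the first packet shown above. The hex strings are copied from the dump; the redacted destination octets (`81c0 xxyy`) are skipped, so only the first 16 bytes of the IPv4 header and the UDP header plus payload are parsed. The function name `decode_keepalive` is made up for this example. The single `0xff` payload byte on UDP port 4500 matches the NAT-keepalive format defined in RFC 3948.

```python
import ipaddress
import struct

# First 16 bytes of the IPv4 header (the 0x0000 row of the dump).
IP_HEAD = bytes.fromhex("4500001d79b3000040110e88c0a80a22")
# UDP header plus payload (the 0x0010 row, after the redacted destination bytes).
UDP_PART = bytes.fromhex("1194119400090000ff")


def decode_keepalive(ip_head: bytes, udp_part: bytes) -> dict:
    """Decode the partial IPv4 header and the UDP datagram from the capture."""
    (ver_ihl, _tos, total_len, ident, _frag,
     ttl, proto, _cksum, src) = struct.unpack("!BBHHHBBH4s", ip_head)
    sport, dport, udp_len, udp_cksum = struct.unpack("!HHHH", udp_part[:8])
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,  # IHL is counted in 32-bit words
        "total_len": total_len,
        "id": ident,
        "ttl": ttl,
        "proto": proto,  # 17 means UDP
        "src": str(ipaddress.ip_address(src)),
        "sport": sport,
        "dport": dport,
        "udp_len": udp_len,
        "udp_cksum": udp_cksum,
        "payload": udp_part[8:],
    }


if __name__ == "__main__":
    for key, value in decode_keepalive(IP_HEAD, UDP_PART).items():
        print(f"{key}: {value}")
```

The decoded values agree with the tcpdump summary line: source 192.168.10.34, ports 4500 to 4500, IP id 31155, TTL 64, total length 29 (20-byte IP header, 8-byte UDP header, 1-byte payload).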

This is the only traffic that my phone is sending while the phone is in the failed state. I'm fairly un-knowledgeable about how VPNs / encrypted proxies work, but this feels significant. I.e., it doesn't appear to actually be sending any traffic to my Outline server / trying to reconnect.

Once I tap "disconnect" in the Outline client, however, I get a torrent of traffic to/from a bunch of different peers (as expected).

After I am able to successfully reconnect to my Outline server, then -- also as expected -- nearly all the traffic from my phone is going to my Outline server. The only two exceptions (from a 60 second tcpdump slice) are:

  1. DNS and MDNS queries, both of which go to the 192.168.10.1 gateway, cited above.
  2. The same isakmp-nat-keep-alive's to the same 129.192.x.y host, which isn't on my local network.

philkunz commented 6 years ago

I can confirm this is an issue. It happens a lot after waking from sleep.

alalamav commented 6 years ago

Thank you for the detailed reports, @jsquyres, and thanks to @philkunz and @yakovmanshin for flagging the problem on macOS. These are very useful.

I found this post that describes behavior consistent with this issue and suggests there may be a bug in the OS when applying routing rules on connectivity changes. Another hypothesis is that we are excluding the virtual address that the VPN binds to (192.20.0.1); thus Outline stops receiving traffic, as @jsquyres indicated.

We have decided to roll back the LAN bypass feature for iOS and macOS until we understand how it is affecting the observed behavior. As we start to think of ways to fix this issue without affecting all users, would any of you be willing to join our beta channel through TestFlight? If so, please send an email to support@getoutline.org with your Apple ID (email address) and subject line "Test Flight". Note that TestFlight is only available for iOS apps.

Thank you for the help!

jsquyres commented 6 years ago

Glad to help. I sent mail to support@outline.org; I'm happy to be part of the Test Flight.

alalamav commented 6 years ago

Thank you @jsquyres. The email address is actually support@getoutline.org. Would you mind re-sending your information? I apologize for the mistake.

The iOS client v1.2.2 has been released in the App Store. Please reopen this issue if the problem persists.

jsquyres commented 6 years ago

Just to follow up: after updating to the iOS client v1.2.2, the problem has disappeared. I've had no connectivity problems since updating.

adegbenga commented 4 years ago

I found this post while searching for a solution to a connection that constantly drops and reconnects. The problem described here is not limited to iOS. I am experiencing the same problem with an Ubuntu server running on Windows Hyper-V. I have created two servers now, and both lose connection intermittently. What solution do you suggest? Thank you