Neos-Metaverse / NeosPublic

A public issue/wiki only repository for the NeosVR project

LNL-Nat not keeping connection #3360

Open 3x1t-5tyl3 opened 2 years ago

3x1t-5tyl3 commented 2 years ago

Describe the bug?

When using LNL-NAT instead of pure LNL (direct IP connect, lnl://ipaddress/), the connection times out exactly 5 minutes after connecting.

Relevant issues

None applicable. There are some that seem similar but aren't.

To Reproduce

None/Not easily reproducible

Expected behavior

For the connection to last or for it to use LNL directly

Log Files

Basically empty. I'd have to enable verbose logging to get more relevant info.

Screenshots

No response

How often does it happen?

Always

Does the bug persist after restarting Neos?

Yes

Neos Version Number

Beta 2021.11.10.1253

What Platforms does this occur on?

Windows, Linux

Link to Reproduction Item/World

No response

Did this work before?

Yes

If it worked before, on which build?

Pre-lnl update build seemed to work fine

Additional context

When connecting to my headless via direct IP address (lnl://whateveriphere/ or steam-neos://), clients seem to be perfectly able to connect and keep the connection. When connecting with lnl-nat:// specifically, the connection seems to time out after exactly 5 minutes. Not exactly sure why.

Reporters

Duskitten#4455 3x1t_5tyl3#0001

Joltz commented 2 years ago

Just going to dump my discord messages in here as I was having similar issues on my end until switching from our ISP's provided CGNAT to a static IP address.


Jolts — 11/08/2021 Also been having LNL connection issues that look identical to @Duskitten. Disabled every possible firewall and DMZ'd the router just as a sanity check and got the same results. Logs don't really reveal much but here's a sample from mine.

message1.txt

I had one successful connection via LNL Relay (shown here), but even when it is successful, it quickly becomes unstable and eventually gets dropped completely. This wasn't an issue a few patches ago and only started occurring commonly in the past two weeks. I should also note that Steam sockets connect 100% of the time they're available, but they have the typical issues commonly associated with them with regard to stability and packet queuing. I am also behind a CGNAT, which is possibly why it always uses the relay instead of NAT punchthrough.



Jolts — 11/10/2021 Another interesting thing happened today. I removed our AP router and put a dumb switch in its place and now I get a prompt about LNL Direct-IP when connecting to my BF on our local network (direct LAN connections never worked before) and it's still failing with the same error message.

10:49:02 PM.523 ( 45 FPS) Joining session: lnl://172.16.0.195:58190/
10:49:02 PM.526 ( 45 FPS) NetworkInitStart
10:49:02 PM.526 ( 45 FPS) Network manager: NetX.LNL_Manager - priority: 0
10:49:02 PM.526 ( 45 FPS) Network manager: FrooxEngine.SteamNetworkManager - priority: -100
10:49:02 PM.529 ( 45 FPS) Connecting to: lnl://172.16.0.195:58190/
10:49:08 PM.442 ( 45 FPS) Disconnected: 172.16.0.195:58190, reason: ConnectionFailed, socketErrorCode: Success
10:49:08 PM.442 ( 45 FPS) Connection failed: ConnectionFailed
10:49:08 PM.442 ( 45 FPS) All protocols failed to establish connection
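For anyone comparing their own logs against the one above, here is a minimal Python sketch that parses lines in this shape and measures the time between two events, for example from "Connecting to:" to "Disconnected:". The regular expression is inferred only from the log samples quoted in this thread (both the AM/PM style and the 24-hour style), and the helper names `parse_line`, `disconnect_reason`, and `seconds_between` are made up for illustration; none of this is part of Neos.

```python
import re
from datetime import datetime, timedelta

# Inferred from the samples in this thread, e.g.
#   "10:49:08 PM.442 ( 45 FPS) Disconnected: ..."
#   "03:26:21.581 (100 FPS) Disconnected: ..."
LINE_RE = re.compile(
    r"^(?P<clock>\d{1,2}:\d{2}:\d{2})"
    r"(?: (?P<ampm>AM|PM))?"
    r"\.(?P<ms>\d{3})\s*"
    r"\(\s*(?P<fps>\d+) FPS\)\s*"
    r"(?P<msg>.*)$"
)
REASON_RE = re.compile(r"reason: (\w+)")


def parse_line(line):
    """Return (timestamp, message) for a log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    if m.group("ampm"):
        stamp = datetime.strptime(f"{m.group('clock')} {m.group('ampm')}", "%I:%M:%S %p")
    else:
        stamp = datetime.strptime(m.group("clock"), "%H:%M:%S")
    stamp += timedelta(milliseconds=int(m.group("ms")))
    return stamp, m.group("msg")


def disconnect_reason(msg):
    """Extract the word after 'reason: ' from a message, if present."""
    m = REASON_RE.search(msg)
    return m.group(1) if m else None


def seconds_between(lines, start_prefix, end_prefix):
    """Seconds from the first message starting with start_prefix to the next
    message starting with end_prefix. Does not handle logs that cross midnight."""
    start = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        stamp, msg = parsed
        if start is None and msg.startswith(start_prefix):
            start = stamp
        elif start is not None and msg.startswith(end_prefix):
            return (stamp - start).total_seconds()
    return None


SAMPLE = [
    "10:49:02 PM.529 ( 45 FPS) Connecting to: lnl://172.16.0.195:58190/",
    "10:49:08 PM.442 ( 45 FPS) Disconnected: 172.16.0.195:58190, "
    "reason: ConnectionFailed, socketErrorCode: Success",
]
print(seconds_between(SAMPLE, "Connecting to:", "Disconnected:"))  # prints 5.913
```

Run against a full session log, the same helper could show whether a drop really lands at the 5-minute mark described in the original report.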

Here's what the log looks like when it uses SNS fallback

message2.txt


Jolts — 11/11/2021 Another entry into this saga. We decided to swap routers entirely and LNL is still nonfunctioning. LAN connections are working again but LNL over WAN is still barely working. Going to request a static IP from our ISP tomorrow to try and remove the CGNAT from the equation and see if that helps.


Talt — 11/12/2021 It always goes like: Loading... Establishing connection, LNL NAT something 1 2 3 4, LNL relay, and then it fails. I'm only able to join some sessions.

Jolts — 11/12/2021 You are probably experiencing the same problem @Duskitten and I are having. You can check my message history for more details. I'm currently working on getting a static IP set up from our ISP to see if that resolves the issue.

Rottex — 11/12/2021 For the sake of completeness, I just want to report that I have also had these weird issues since I came back from holiday. I can't connect to our VM any more; NAT punchthrough fails and I get:

03:26:15.038 (100 FPS) NAT Punchthrough failed, Connecting to Relay
03:26:21.581 (100 FPS) Disconnected: , reason: ConnectionFailed, socketErrorCode: 0
03:26:21.581 (100 FPS) Connection failed: ConnectionFailed
03:26:21.581 (100 FPS) All protocols failed to establish connection

Plus I also reported the UNITYTLS_X509VERIFY_NOT_DONE / SSL problem I had over the last few days.

I also tried everything: different routers (even a mobile hotspot 🙂), double-checked the time on the VM in the hosting center and in my LAN, reinstalled Neos and Steam completely, and double-checked the f+cking M$ firewall on my PC. No joy. I even see the relayed packets from my client coming in on the device if I check with "tcpdump -i ens3 -nn dst 172.28.231.14 and not port 22" ?(

Strange, my colleagues can work with the headless server with no problems.

Jolts — 11/12/2021 Well it seems we finally ended our saga. Getting off the CGNAT and getting a static IP fixed all of our LNL issues (and as an added bonus, nat punchthrough actually works now).


I'm guessing this is an unintended regression from switching from the old LNL library to the updated one, but I can only speculate. I know most people aren't in a position where they can just switch off CGNAT on a whim (especially if you're using something like StarLink).

I should also note that I couldn't connect to lnl-nat sessions at all most of the time as opposed to connecting and then getting dropped soon after (although that was always guaranteed if I did somehow manage to connect).

kulzae commented 2 years ago

Additional information provided by Rottex on the discord

Rottex — 11/16/2021 I did some headless/LNL testing today and, as a last resort, connected my whole office network via an OpenVPN connection to a root server of mine. It's located in a hosting center which has an official IP, so all traffic from all my systems here in the office now goes out to the internet with an official IP. And now everything works like it did 3 weeks ago. So, final conclusion: 1) The problem with LNL was on the client side. 2) The headless side was always OK. In the end this leaves two possibilities: either Austrian Telecom switched to CGN in the last 3 weeks, or something changed in the relay instance. I will double-check with the provider tomorrow to narrow down the reason.

Rottex — 11/17/2021 As promised yesterday, a short status update regarding the CGN-LNL relaying issue:

I can confirm now that:

a) Today I gave my colleague (the other guy who had issues, since he is also connected via an Austrian Telecom LTE router) an OpenVPN key so that he can default-route out to the internet from a fixed IP. We both now appear as the same official address. We were able to work together without any issues on all the worlds (locally and headless hosted), and with this architecture we had perfect latency (about 9 ms or so). We never had such a smooth experience in Neos before.

b) The final step was to call my provider, and after double-checking I now know for certain that I have been connected via CGN the whole time for the last two years (and had no problems with Neos relaying until 3 weeks ago). My provider changed my APN setting to a fixed IP and Neos is working fine. And the best part is: I got it for free...

Conclusion: The relay is broken with CGN.
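For readers who are unsure whether they are behind CGN at all, one rough check is to look at the WAN address shown in the router's status page. The Python sketch below classifies an address using the standard `ipaddress` module: 100.64.0.0/10 is the shared address space reserved for carrier-grade NAT in RFC 6598, and the three RFC 1918 ranges are ordinary private space. The function name `classify_wan_address` and the example addresses are illustrative assumptions, not something taken from Neos or from the reports above.

```python
import ipaddress

# RFC 6598 shared address space, reserved for carrier-grade NAT.
CGNAT_RANGE = ipaddress.ip_network("100.64.0.0/10")

# RFC 1918 private ranges. Some ISPs use these on the WAN side as well.
PRIVATE_RANGES = [
    ipaddress.ip_network(net)
    for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def classify_wan_address(addr):
    """Classify an IPv4 address as 'cgnat', 'private', or 'other'.

    'other' only means the address is outside the ranges checked here;
    it does not prove the connection is free of CGN.
    """
    ip = ipaddress.ip_address(addr)
    if ip in CGNAT_RANGE:
        return "cgnat"
    if any(ip in net for net in PRIVATE_RANGES):
        return "private"
    return "other"


# Hypothetical WAN address as it might appear on a router status page.
print(classify_wan_address("100.72.13.5"))   # prints cgnat
# The LAN address from the join log earlier in this thread.
print(classify_wan_address("172.16.0.195"))  # prints private
```

A "cgnat" or "private" result for the router's WAN address is a strong hint that the ISP is translating addresses upstream. An "other" result is weaker evidence; comparing the WAN address with the public address that an external service sees is a more reliable test, since the two differ behind CGN.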

shiftyscales commented 2 years ago

This issue is currently being investigated and reviewed internally by the QC team, who have also been assisting in documenting temporary work-arounds users can utilize until this issue is able to be reviewed, and resolved by the development team.

Thank you for your patience, and I apologize for the lack of communication regarding this frustrating issue. I will see to it that it is prioritized as soon as possible.

3x1t-5tyl3 commented 2 years ago

So, to update a little bit: it seems like this is basically resolved? I still get some rare instances of not being able to connect. I'm not sure if that's due to the LNL relay though, as they're quite rare.

I'll let you guys decide if you want to close it or not.