Yellow-Dog-Man / Resonite-Issues

Issue repository for Resonite.
https://resonite.com

LNL Direct Connection not being attempted, causing lnl-nat to be used as the preferred method of connecting to a session. #2005

Open bredo228 opened 1 month ago

bredo228 commented 1 month ago

Describe the bug?

When joining a session, attempting to directly connect over lnl is supposed to be prioritised over lnl-nat, as shown in the flow chart found at https://wiki.resonite.com/Networking_Information#Establishing_Connections

However, direct connections appear to never be attempted, with the client immediately trying to use lnl-nat to connect.

This isn't great in situations where you have a strict NAT in place, as the client will need to fall back on the LNL relays to establish a connection, causing load on the relays.
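The expected ordering can be sketched as below. This is only an illustration, not Resonite's actual selection code: `lnl` and `lnl-nat` are the names used in this issue, while the `relay` label, the `PRIORITY` list and the `attempt_order` function are hypothetical.

```python
# Hypothetical sketch of the expected ordering, not Resonite's real code.
# "lnl" and "lnl-nat" are taken from this issue; "relay" is a made-up label
# standing in for the LNL relay fallback.
PRIORITY = ["lnl", "lnl-nat", "relay"]


def attempt_order(offered):
    """Return the offered connection methods, most preferred first."""
    known = {method for method in offered if method in PRIORITY}
    return sorted(known, key=PRIORITY.index)


# Direct LNL is tried first when it is among the offered methods.
print(attempt_order(["relay", "lnl-nat", "lnl"]))  # ['lnl', 'lnl-nat', 'relay']

# If direct LNL is not among them, lnl-nat ends up being the first attempt.
print(attempt_order(["lnl-nat", "relay"]))  # ['lnl-nat', 'relay']
```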

To Reproduce

Have a client on a network where lnl-nat punchthrough is known to fail, e.g. behind a strict NAT. The opnsense firewall's default configuration randomizes ports in its outbound NAT, so it is perfect for testing this; mobile phone connections also have a tendency to do this.

Try to connect to a session that should be directly reachable - a headless session with forcePort enabled and that port allowed through the host's firewall is a common situation where this will work.

Notice that a direct connection is never attempted and the LNL relay is used after the client fails NAT punchthrough.

Expected behavior

LNL should attempt a direct connection and succeed, so that the client connects directly to the session rather than going via the relays.

Screenshots

No response

Resonite Version Number

2024.5.7.505

What Platforms does this occur on?

Windows, Linux

What headset if any do you use?

Desktop

Log Files

Client log (Relevant lines start at line 1272) DYSNOMIA - 2024.5.7.505 - 2024-05-08 18_00_23.log

Headless log (Relevant lines start at line 967) headless - 2024.5.7.505 - 2024-05-08 05_59_41.log

Additional Context

No response

Reporters

bredo, sveken, t.o.a.s.t.e.r.

ProbablePrime commented 1 month ago

I have an exceedingly vague memory that putting lnl-nat first was intentional. I don't know if that was actually said or if I'm even remembering it properly, but there may be some reason behind lnl-nat being first.

However, @Frooxius was suggesting maybe making this change on a branch we have for a B2B contract and of course Froox would know more.

Frooxius commented 1 month ago

The LNL NAT punchthrough mechanism should actually establish a direct connection on its own in most cases. The way it works, it makes both sides try to connect to whatever the other side's publicly visible IP is - if that's a directly exposed IP on the server, that should generally just work.

However, based on the log, no connection is offered over direct LNL - therefore it cannot prioritize that at all.

There's some question on how to achieve this - you'd need to provide the public IP for that one yourself, so it can be offered, probably via a config file.

But generally you shouldn't need to - the LNL Bridge should see the public IP and tell the other client to connect to that - question is, why is it failing in this case?

Are you able to provide any details on when this actually fails? What is the server configured like? Can this be reproduced reliably?

bredo228 commented 1 month ago

This server's configured with the headless listening on port 25570 on a public IPv4 address assigned to it - from the client, I can successfully direct connect to it by using the "Open World" ProtoFlux node to lnl://172.96.161.57:25570

{
  "loginCredential": "BigGreenWolfy",
  "loginPassword": "no",
  "startWorlds": [
    {
      "isEnabled": true,
      "sessionName": "[US] The ReSync Lounge",
      "customSessionId": "S-U-BigGreenWolfy:SyncLounge",
      "forcePort": 25570
    }
  ]
}
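A config like this can be sanity-checked with a short script. The sketch below is only an illustration: the field names (`startWorlds`, `isEnabled`, `sessionName`, `forcePort`) are the ones from the snippet above, while the sample data, the `worlds_without_force_port` helper and the choice to treat a missing `isEnabled` as enabled are assumptions. Python's `json` module is strict, so any trailing commas have to be removed before the file will load.

```python
import json

# Made-up sample data using the field names from the config in this thread.
SAMPLE = """
{
  "startWorlds": [
    {"isEnabled": true, "sessionName": "A", "forcePort": 25570},
    {"isEnabled": true, "sessionName": "B"},
    {"isEnabled": false, "sessionName": "C"}
  ]
}
"""


def worlds_without_force_port(config):
    """List the enabled worlds that do not pin a port via forcePort."""
    return [
        world.get("sessionName", "<unnamed>")
        for world in config.get("startWorlds", [])
        # Assumption: a world with no isEnabled key counts as enabled.
        if world.get("isEnabled", True) and not world.get("forcePort")
    ]


print(worlds_without_force_port(json.loads(SAMPLE)))  # ['B']
```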

The problem that I've introduced in this network lies in one of the routers randomising the source port of the connection on the client side. This is a common occurrence when carrier-grade NAT is performed; in this case it's just because opnsense randomises ports by default when doing outbound NAT. This can be reproduced reliably.
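To illustrate what that randomisation means for punchthrough, here is a toy model. It is not LiteNetLib or opnsense code, and it does not explain why the connection fails in this case; it only shows that with per-destination random ports, the port the bridge observes for the client differs from the port the host would see. The `ToyNat` class, the helper function and the bridge address are made up; the host address is the one from this thread.

```python
import random

# Hypothetical endpoints: the bridge address is made up,
# the host address is the one mentioned in this thread.
BRIDGE = ("bridge.invalid", 1000)
HOST = ("172.96.161.57", 25570)


class ToyNat:
    """Toy outbound NAT that maps a client's internal port to an external one."""

    def __init__(self, randomize_ports, seed=0):
        self.randomize_ports = randomize_ports
        self._rng = random.Random(seed)
        self._mappings = {}

    def external_port(self, internal_port, destination):
        if not self.randomize_ports:
            # Simplification: the source port is kept for every destination.
            return internal_port
        # Randomising NAT: a fresh, unused port per (source port, destination).
        key = (internal_port, destination)
        if key not in self._mappings:
            used = set(self._mappings.values())
            port = self._rng.randrange(1024, 65536)
            while port in used:
                port = self._rng.randrange(1024, 65536)
            self._mappings[key] = port
        return self._mappings[key]


def bridge_sees_same_port_as_host(nat, internal_port=5000):
    """True if the bridge and the host observe the same client source port."""
    return nat.external_port(internal_port, BRIDGE) == nat.external_port(internal_port, HOST)


print(bridge_sees_same_port_as_host(ToyNat(randomize_ports=False)))  # True
print(bridge_sees_same_port_as_host(ToyNat(randomize_ports=True)))   # False
```

In the randomising case, whatever port the bridge passes on to the host is not the port the client's packets to the host will actually come from.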

Firewall performing randomization: [image]

It looks like whatever connection is being established fails during LNL punchthrough. I can see traffic flowing, but I'm unsure at what point establishing the connection fails.

ProbablePrime commented 1 month ago

On a branch, I have a heck of a lot more logging on all parts of this problem. We should be able to more easily figure it out once that's over on main.

I've got some tidying up to do on that branch though.

ProbablePrime commented 1 month ago

You might be able to mess around with: https://github.com/Yellow-Dog-Man/LNLBridgePoker to diagnose networking issues.

It emulates the barebones of what the LNL Bridge and Resonite/headless do. Not sure if it would be helpful, but hey, it's there.

bredo228 commented 1 week ago

> On a branch, I have a heck of a lot more logging on all parts of this problem. We should be able to more easily figure it out once that's over on main.
>
> I've got some tidying up to do on that branch though.

Compiled the latest commit of LiteNetLib from https://github.com/Yellow-Dog-Man/LiteNetLib with all the debugging enabled, stuck it on the headless and client and have the logs here:

client - H370-WIN11 - 2024.6.5.1084 - 2024-06-07 18_45_13.log

headless - glaceon - 2024.6.5.1084 - 2024-06-07 06_44_25.log

Not sure how useful they'll be but they're here.