Open bredo228 opened 1 month ago
I have an exceedingly vague memory of lnl-nat being first being intentional. I don't know whether that was actually said or whether I'm even remembering it properly, but there may be some reason behind lnl-nat being first.
However, @Frooxius was suggesting maybe making this change on a branch we have for a B2B contract and of course Froox would know more.
The LNL NAT punchthrough mechanism should actually establish a direct connection in most cases on its own. The way it works, it makes both sides try to connect to whatever their publicly visible IP is. If that's a directly exposed IP on the server, that should generally just work.
However, based on the log, no connection is offered over direct LNL, so it cannot be prioritized at all.
There's some question on how to achieve this - you'd need to provide the public IP for that one yourself, so it can be offered, probably via a config file.
But generally you shouldn't need to - the LNL Bridge should see the public IP and tell the other client to connect to that - question is, why is it failing in this case?
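To illustrate the point about offered connections, here is a minimal Python sketch. The priority table, the `order_candidates` helper, the `lnl-relay` scheme name and the `example-session` URIs are all hypothetical and are not Resonite's actual code. Only the `lnl` and `lnl-nat` scheme names and the direct address come from this thread.

```python
# Hypothetical priority table: a lower number is tried first.
SCHEME_PRIORITY = {"lnl": 0, "lnl-nat": 1, "lnl-relay": 2}


def order_candidates(offered_uris):
    """Sort offered URIs by scheme priority; unknown schemes go last."""
    def priority(uri):
        scheme = uri.partition("://")[0]
        return SCHEME_PRIORITY.get(scheme, len(SCHEME_PRIORITY))
    return sorted(offered_uris, key=priority)


# What the log suggests: no direct lnl URI is offered at all (placeholder URIs).
offered = ["lnl-nat://example-session", "lnl-relay://example-session"]
print(order_candidates(offered)[0])       # lnl-nat://example-session

# If a public endpoint were supplied (for example via a config file),
# the direct candidate would sort to the front.
with_direct = offered + ["lnl://172.96.161.57:25570"]
print(order_candidates(with_direct)[0])   # lnl://172.96.161.57:25570
```

Sorting can only reorder what is offered, so a direct candidate that is never advertised can never be tried first, whatever the priority order is.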
Are you able to provide any details on when this actually fails? What is the server configured like? Can this be reproduced reliably?
This server's configured with the headless listening on port 25570 on a public IPv4 address assigned to it - from the client, I can successfully direct connect to it by using the "Open World" ProtoFlux node to lnl://172.96.161.57:25570
{
  "loginCredential": "BigGreenWolfy",
  "loginPassword": "no",
  "startWorlds": [
    {
      "isEnabled": true,
      "sessionName": "[US] The ReSync Lounge",
      "customSessionId": "S-U-BigGreenWolfy:SyncLounge",
      "forcePort": 25570
    }
  ]
}
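As a quick sanity check on a config like this, the following Python sketch lists which enabled worlds pin a port. The `forced_ports` helper is hypothetical, and treating a missing `isEnabled` as enabled is an assumption. The key names (`startWorlds`, `isEnabled`, `sessionName`, `forcePort`) are taken from the config above. Python's `json` module rejects trailing commas, so the sample omits them.

```python
import json


def forced_ports(config_text):
    """Return {sessionName: forcePort} for enabled worlds that set forcePort."""
    config = json.loads(config_text)
    result = {}
    for world in config.get("startWorlds", []):
        # Assumption: a world without "isEnabled" counts as enabled.
        if world.get("isEnabled", True) and world.get("forcePort") is not None:
            result[world.get("sessionName", "<unnamed>")] = world["forcePort"]
    return result


SAMPLE = """
{
  "startWorlds": [
    {
      "isEnabled": true,
      "sessionName": "[US] The ReSync Lounge",
      "forcePort": 25570
    }
  ]
}
"""

print(forced_ports(SAMPLE))  # {'[US] The ReSync Lounge': 25570}
```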
The problem that I've introduced in this network lies within one of the routers randomising the source port of the connection on the client side - a common occurrence when carrier grade NAT is performed, or in this case it's just because opnsense randomises ports by default when doing outbound NAT. This can be reproduced reliably.
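To make the port randomisation problem concrete, here is a toy Python model. It assumes the punchthrough relies on the LNL Bridge reporting the client's public endpoint as seen by the bridge. `ToyNat` and `punch_endpoints_match` are hypothetical names, and the client, bridge and NAT addresses are placeholders. This is not how opnsense or LiteNetLib are implemented, and it makes no claim about whether LiteNetLib tolerates the mismatch.

```python
import random


class ToyNat:
    """Toy outbound NAT (a hypothetical model, not opnsense's implementation)."""

    def __init__(self, public_ip, randomize_ports, seed=0):
        self.public_ip = public_ip
        self.randomize_ports = randomize_ports
        self._rng = random.Random(seed)
        self._mappings = {}      # mapping key -> public (ip, port)
        self._used_ports = set()

    def map_outbound(self, src, dst):
        """Return the public endpoint that dst sees for traffic from src."""
        # With randomisation every (src, dst) pair gets its own public port;
        # without it, the source port is preserved for every destination.
        key = (src, dst) if self.randomize_ports else src
        if key not in self._mappings:
            port = src[1]
            if self.randomize_ports:
                port = self._rng.randint(1024, 65535)
                while port in self._used_ports:
                    port = self._rng.randint(1024, 65535)
            self._used_ports.add(port)
            self._mappings[key] = (self.public_ip, port)
        return self._mappings[key]


def punch_endpoints_match(nat, client, bridge, host):
    """True if the endpoint the bridge reports equals the one the host sees."""
    reported_by_bridge = nat.map_outbound(client, bridge)
    seen_by_host = nat.map_outbound(client, host)
    return reported_by_bridge == seen_by_host


CLIENT = ("192.168.1.50", 50000)    # private client socket (placeholder)
BRIDGE = ("198.51.100.10", 12500)   # LNL Bridge address (placeholder)
HOST = ("172.96.161.57", 25570)     # headless address from this thread

print(punch_endpoints_match(ToyNat("203.0.113.7", False), CLIENT, BRIDGE, HOST))  # True
print(punch_endpoints_match(ToyNat("203.0.113.7", True), CLIENT, BRIDGE, HOST))   # False
```

In the randomised case the port the bridge hands out belongs to the client-to-bridge mapping, so packets the host sends there do not line up with the mapping the client uses toward the host.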
Firewall performing randomization:
It looks like the connection attempt is failing during LNL punchthrough. I can see traffic flowing, but I can't tell at which point establishing the connection fails.
On a branch, I have a heck of a lot more logging on all parts of this problem. We should be able to more easily figure it out once that's over on main.
I've got some tidying up to do on that branch though.
You might be able to mess around with: https://github.com/Yellow-Dog-Man/LNLBridgePoker to diagnose networking issues.
It emulates the barebones of what the LNL Bridge and Resonite/headless do. Not sure if it would be helpful but hey, it's there.
Compiled the latest commit of LiteNetLib from https://github.com/Yellow-Dog-Man/LiteNetLib with all the debugging enabled, stuck it on the headless and client and have the logs here:
client - H370-WIN11 - 2024.6.5.1084 - 2024-06-07 18_45_13.log
headless - glaceon - 2024.6.5.1084 - 2024-06-07 06_44_25.log
Not sure how useful they'll be but they're here.
Describe the bug?
When joining a session, attempting to directly connect over lnl is supposed to be prioritised over lnl-nat, as shown in the flow chart found at https://wiki.resonite.com/Networking_Information#Establishing_Connections
However, direct connections appear to never be attempted, with the client immediately trying to use lnl-nat to connect.
This isn't great in situations where you have a strict NAT in place, as the client will need to fall back on the LNL relays to establish a connection, causing load on the relays.
To Reproduce
Have a client on a network where lnl-nat punchthrough is known to fail (e.g. a strict NAT scenario - the opnsense firewall's default configuration randomizes ports in its outbound NAT so is perfect for testing this, mobile phone connections also have a tendency to do this)
Try to connect to a session that should be able to be connected to directly - a headless session with forcePort enabled & allowed through the host's firewall is a common situation where this will work.
Notice that a direct connection is never attempted and the LNL relay is used after the client fails NAT punchthrough.
Expected behavior
LNL to attempt a direct connection and succeed, causing the client to connect directly to the session rather than going via the relays.
Screenshots
No response
Resonite Version Number
2024.5.7.505
What Platforms does this occur on?
Windows, Linux
What headset if any do you use?
Desktop
Log Files
Client log (Relevant lines start at line 1272) DYSNOMIA - 2024.5.7.505 - 2024-05-08 18_00_23.log
Headless log (Relevant lines start at line 967) headless - 2024.5.7.505 - 2024-05-08 05_59_41.log
Additional Context
No response
Reporters
bredo, sveken, t.o.a.s.t.e.r.