Sinytra / ForgifiedFabricAPI

Fabric API implemented on top of NeoForge
https://sinytra.org/docs
Apache License 2.0
107 stars 13 forks source link

(login) networking seem to don't handle well clients connecting from ipv6 #10

Closed Skidamek closed 1 year ago

Skidamek commented 1 year ago

Describe the bug

(login) networking seem to don't handle well clients connecting from ipv6

Steps to reproduce

  1. Use some mod which implements some login packets, to test e.g. my mod AutoModpack (it has needed forged fapi modules inside).
  2. Setup 1.20.1 forge server with my mod and few example mods and start it
  3. Start 1.20.1 forge client with just my mod
  4. Join server not using ipv6 so e.g. localhost -> everything should work correctly, user should be prompted to download, proceed, wait and restart the game
  5. Try to join though ipv6 -> doesn't work, decoder exeption
  6. Try to join thought ipv4 -> success join

Logs

https://mclo.gs/AoMCP8u

Additional context

Issue reported on my side https://github.com/Skidamek/AutoModpack/issues/147

Thanks for porting fapi to forge making moding on forge / multiloader moding much easier! :)

Su5eD commented 1 year ago

Just to be sure, are you using forge 47.1.3?

Skidamek commented 1 year ago

No, that was from 47.1.43 I will try with .3

Skidamek commented 1 year ago

same issue

Su5eD commented 1 year ago

I tested this on a local dedicated server to which I joined using ::1 or [::1]:25565 as the IPv6 address, but I didn't get the exception. Does it occur every time you try to join the server?

Skidamek commented 1 year ago

Yes it happens every time https://streamable.com/hietkd, after that record i tested ::1 as well and that works but device ip get from e.g. ipconfig cmd windows command does not

Skidamek commented 1 year ago

That are my all mods i am using on server if you want to reproduce exact same error https://we.tl/t-bWSwmID0Q9

Skidamek commented 1 year ago

Now actually i got curios because i didn't test that on fabric i just saw that issue guy reported on forge and it is there but not sure about fabric upstream

Skidamek commented 1 year ago

Works on fabric well, looks like forge issue specific

Su5eD commented 1 year ago

Okay, after using the Link-local IPv6 Address I can also reproduce the issue now

Skidamek commented 1 year ago

Looks like it's no just some ipv6 https://github.com/Skidamek/AutoModpack/issues/147#issuecomment-1656804129

Su5eD commented 1 year ago

Here's the bug reproduced on fabric using the latest main branch fabric build of the mod. The server IP had to be set in the server properties file so that I could connect using IPv6.

fabric reproduce

The good news is I have an idea how you can fix it. I'll explain in a moment.

Skidamek commented 1 year ago

Oh nice! :)

Su5eD commented 1 year ago

In short, the issue itself is caused by the desynchronization of login handshake packets of AutoModpack. You can find my proposed solution below.

The issue

From what I found out, AutoModpack exchanges several packets with the client in the LOGIN phase, before letting the player enter the server. In order, this goes:

=> Login handshake begins
1) HANDSHAKE SERVER -> CLIENT
2) HANDSHAKE [RESPONSE] CLIENT -> SERVER
3) LINK SERVER -> CLIENT
4) LINK [RESPONSE] CLIENT -> SERVER
5) SUCCEED or DISCONNECT player
=> Player enters server

The server will hold off completing the handshake until a) it receives a response for every login packet sent and b) all login await tasks have completed (See ServerLoginNetworkAddon#queryTick).

While AutoModpack is busy handling client login packet responses on the network thread, the channel list tracking sent packets is left empty, as the response has already been received and no other packets have been sent. Due to this, the server can proceed with logging in the player while we're still in the middle of processing the client's response, and planning to send another login packet.

Usually when connecting via IPv4, the handshake takes long enough for AutoModpack to exchange all packets without any issue. However, due to a race condition, the handshake completes significantly faster when connecting via IPv6 than it does on IPv4 connections. This doesn't leave us with enough time to process client responses, and creates the desynchronization issue. When I connected to the server using IPv6, this was the order of AutoModpack's handshake process:

=> Login handshake begins
1) HANDSHAKE SERVER -> CLIENT
2) HANDSHAKE [RESPONSE] CLIENT -> SERVER
=> End of login handshake // This happens too early, we still have a login packet to send!
3) LINK SERVER -> CLIENT
4) LINK [RESPONSE] CLIENT -> SERVER
=> Disconnected due to packet processing error

So where does the "decoder exception" error we saw some from? Login packets are identified by a number according to the order they were sent in. The HANDSHAKE packet we send is assigned id 1, whereas the LINK packet gets id 2. exception

Now, the exception shows the packet type is ServerboundChangeDifficultyPacket. What I first found confusing was that this packet is only sent if you change the difficulty in the world's settings. It is never sent on server login. So how did it end up here?
Coincidentally, the packet's ID in vanilla's packet system is also 2!

If you look at the IPv6 handshake process order above, you can see the LINK packet with id 2 is only sent AFTER the login handshake had already completed. This means that at the time the server receives the packet, the login network handler is already gone and is instead replaced by the play network handler, which processes game packets. The play handler has no knowledge of login packets and in its registry, id 2 belongs to the ServerboundChangeDifficultyPacket. It tries to process the received data and runs into an error when the size of the packet is larger than it expects.

The solution

Fabric already comes with a very handy tool that handles login packet synchronization - the LoginSynchronizer. It is available to you inside the global server login receiver. As the class documentation explains, it:

Allows blocking client log-in until all all futures passed into LoginSynchronizer#waitFor(Future) are completed.

The included example describes a similar scenario in which the server needs to send a followup packet to the client (in our case this is LINK sent after HANDSHAKE).

Therefore by using LoginSynchronizer, we can hold off the handshake process from completing until we have sent a response packet. Here you can find my modified version of the LoginS2CPacket class that I used to fix the bug. I have simply extracted all code from the else statement under if (!understood) into a separate method, then called that method from a synchronized context using loginSynchronizer.waitFor.

You may want to synchronize the receiver of LinkS2CPacket as well, although it is not necessary, as in case of failure the player will get disconnected regardless of whether the login process has completed or not.

Skidamek commented 1 year ago

Oh man! I didn't expect THAT detailed response, wow! Thank you so much for your time and effort! So cool! Do you want to make PR or should i stop wasting your time xD? How did you figure that all out? I am impressed. Also how IPv6 is that much faster that it causes issues? Thank you again! :))

Su5eD commented 1 year ago

Opened a PR on upstream with the changed from the gist. Feel free to tell me if I should redirect it to another branch instead of main.

Also how IPv6 is that much faster that it causes issues?

Not sure if it's actually "faster" in terms of performance, but it certainly does something to bring down the login handshake time.

Skidamek commented 1 year ago

Alright! Thanks again! Merged! :)