PretendoNetwork / friends

Pretendo Network friends server
GNU Affero General Public License v3.0
27 stars 7 forks source link

NintendoClients hangs sometimes when doing friend actions on Pretendo #16

Closed Preloading closed 2 months ago

Preloading commented 2 months ago

Hi there!

For the past few weeks trying to get a service that takes what you've been doing on your 3ds, turning it into your set discord activity, to work with Pretendo (3DS-RPC, https://github.com/MCMi460/3DS-RPC, my fork can be located in the PR tab). It has been going pretty well, and I did manage to finish it... mostly. Since I got to connect and mostly work, I've been presented with an issue that I have been unable to solve.

The Problem:

When you first run the program (in the service backend.py) on pretendo it might seem like it's working, until it hangs. It's still connected to pretendo, as when/if your network drops, it'll return RuntimeError: RMC connection is closed, but it doesn't do much else. It also doesn't always happen at the same spot, nor is it guarenteed to happen on a "cycle". Although, if you sit a bit of time between "commands", it can increase the chance of locking up.

My debugging process (and my loss of sanity)

I did a lot of sharing my debugging of this problem on pretendo's support channel (here and here) but i'll repeat my debugging here.

After noticing it locked up, I tried my basic using print statements to find where it fails. At that state, I noticed that it had happened a lot on 'remove_friend_by_pid' but not always. Then, thinking it might be some sorta problem with nintendoclients, and trying to get a better foothold on where exactly it's breaking, I edited the 'remove_friend_by_pid' function to print a number every line.

image

After doing this, the printing stopped at "3", which is: data = await self.client.request(self.PROTOCOL_ID, self.METHOD_REMOVE_FRIEND_BY_PRINCIPAL, stream.get()) I couldn't find where this function went in NintendoClients, so I assumed that it called some sorta python built in function at the time. Knowing that it breaks sending the data to Pretendo's servers, I decided I would whip out my wireshark setup from doing network dumps. Yup, I did a network dump of my code. (also something to note, if I write wireguard, I mean wireshark, I keep doing it) Since I was on WSL, and had nothing else running, I just installed Wireshark on WSL, and took my network dumps. To get the full traffic, I added something that would log any SSL things for wireshark to decode, and installed nex-dissector onto my wireshark install. If you want the wireshark dump, DM me on discord (@Preloading), and I might send it. Doing the wireshark dump, and a side by side of the dump and my code running, I managed to somewhat isolate where it's happening

image

I get "Destination unreachable (Port unreachable)". After asking a dev (ashquarky) for ideas on why it happens they said

so packet 83 is the server pinging you to keep the connection open
which fails
[...]
the funny part is right AFTER that, packet 87, you ping the server and it works fine
[...]
so I reckon it's some NAT holepunching

and recommended that I should ask DaniElectra. The theory about NAT things I'm pretty sure is incorrect (this was partly fuelled because I was testing it on WSL), and I got someone else with different hardware and everything to test it, and they get the same hanging result. I posted most of the information up to this point on a new discord support post. I got the following responses. image Till, in my desperate attempt to get it to work, I managed to make it work, at least that's what I thought at the time. So what did I do? Inside of the code, there are a bunch of wait statements between doing things to avoid getting banned from Nintendo Network (as I'm sure spamming Nintendo with constant requests would probably make something or someone check the traffic.) but removing these wait statements made it get further, sometimes even completing cycles, allowing me to finish the code adding Pretendo support to the app. But this didn't quite solve it. If there are a lot of things it needs to do, it locks up again, almost as if I have a maximum time to do everything I need, before it breaks.

This is what I've gotten, and when did i start writing a book? I've been stuck on this problem for a while, and I would like to say that my implementation of Pretendo on that service is done, but I can't really until it's fixed. I'm pretty sure it's a pretendo problem. I think that's all i have to say in this huge bug report.

Thanks for any help!

PS. Thank you all for working on Pretendo, and giving my favourite game console, a new life.

spotlightishere commented 2 months ago

Hey y'all, I'm also working on 3DS-RPC and wanted to chime in with some observations. I'm able to easily reproduce the hanging issue, but unfortunately cannot reproduce any issues related to anything icmp/unreachable behind NAT or if directly exposed - everything appears to deliver okay per tcpdump/Wireshark.

However, it definitely feels as if something's timing out regardless - this behavior is consistently reproducible after approximately 5/6 seconds. (An expectation of 5 seconds was also mentioned in the Pretendo Network server - might this be the same thing occurring here?)

This time-based limitation appears to be fairly consistent. I wrote a script to repeatedly call get_all_friends and ran it across several servers in different geographic locations. When turning on debug logging, the following was generally shown from NintendoClients:

# connection was at 2024-05-05 00:52:06

# Successful 
2024-05-05 00:52:11 INFO     FriendsClientV1.get_all_friends()
2024-05-05 00:52:11 DEBUG    [CLI] Sending packet: <PRUDPPacket type=TYPE_DATA flags=NEED_ACK,RELIABLE,HAS_SIZE seq=5 frag=0>
2024-05-05 00:52:11 DEBUG    [CLI] Sending packet: <PRUDPPacket type=TYPE_PING flags=NEED_ACK,RELIABLE seq=3 frag=0>
2024-05-05 00:52:11 DEBUG    [CLI] Received packet: <PRUDPPacket type=TYPE_DATA flags=ACK seq=5 frag=0>
2024-05-05 00:52:11 DEBUG    [CLI] Received packet: <PRUDPPacket type=TYPE_PING flags=ACK seq=3 frag=0>
2024-05-05 00:52:11 DEBUG    [CLI] Received packet: <PRUDPPacket type=TYPE_DATA flags=NEED_ACK,RELIABLE seq=4 frag=0>
2024-05-05 00:52:11 DEBUG    [CLI] Sending packet: <PRUDPPacket type=TYPE_DATA flags=ACK seq=4 frag=0>
2024-05-05 00:52:11 DEBUG    Received RMC response: protocol=101 method=15 call=4
2024-05-05 00:52:11 INFO     FriendsClientV1.get_all_friends -> done

# Hangs
2024-05-05 00:52:12 INFO     FriendsClientV1.get_all_friends()
2024-05-05 00:52:12 DEBUG    [CLI] Sending packet: <PRUDPPacket type=TYPE_DATA flags=NEED_ACK,RELIABLE,HAS_SIZE seq=7 frag=0>
2024-05-05 00:52:12 DEBUG    [CLI] Received packet: <PRUDPPacket type=TYPE_DATA flags=ACK seq=7 frag=0>

Within my (very limited) observations it appears that once in this hanging state, the server never sends the second packet with its response after the DATA ACK. Nothing else appears to be sent from the server beyond responses to the client-initiated pings.

Preloading commented 2 months ago

In some desperation, I got it to work by adding s["prudp.ping_timeout"] = 100000000000 This should not be the final solution, but it's an alright workaround for now before it gets properly fixed.

Edit: Nevermind... I got really (un)lucky with my script I guess

Edit 2: The fix is working again???? what???