PretendoNetwork / nex-go

Barebones PRUDP/NEX server library written in Go
GNU Affero General Public License v3.0

Make reliable ping packets update sliding window #58

Closed wolfendale closed 6 months ago

wolfendale commented 6 months ago

Resolves some of the issues in #54; I believe this also fixes PretendoNetwork/friends#16

Changes:

NintendoClients does a couple of interesting things with ping packets:

  - It sends its ping packets reliably (with FLAG_RELIABLE set)
  - It keeps sending pings even while the connection is actively being used

Currently, we assume that all ping packets are unreliable, so we don't increment the expected sequence ID when a ping packet comes in. This leaves the sliding window of that substream waiting forever for a packet that will never come. It doesn't close the connection, since we still acknowledge the ping, but it means that any requests waiting for a response on that sliding window will never receive one

The fix updates the ping handler to check whether a ping packet is reliable and, if so, increment the expected sequence ID counter for that substream
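A rough sketch of what the change does (these are stand-in types and names for illustration, not the actual library code):

package main

import "fmt"

const flagReliable = 0x2 // the PRUDP FLAG_RELIABLE bit

type slidingWindow struct {
    expectedSequenceID uint16
}

type pingPacket struct {
    flags       uint16
    sequenceID  uint16
    substreamID uint8
}

func handlePing(windows map[uint8]*slidingWindow, p pingPacket) {
    // Acknowledging the ping is unchanged and omitted here. The new part:
    // a reliable ping consumes a sequence ID, so the sliding window for its
    // substream has to advance past it or later packets are stuck waiting.
    if p.flags&flagReliable != 0 {
        windows[p.substreamID].expectedSequenceID++
    }
}

func main() {
    windows := map[uint8]*slidingWindow{0: {expectedSequenceID: 5}}

    handlePing(windows, pingPacket{flags: flagReliable, sequenceID: 5, substreamID: 0})

    fmt.Println(windows[0].expectedSequenceID) // 6 - the substream is no longer blocked
}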

Extra elaboration on PretendoNetwork/friends#16

In some of the images shared there, you can see the ping packet in Wireshark right before the connection issues start.

The issue also mentions that things work for about 5-6 seconds (which is the default ping timeout), and that setting the ping timeout to a large number is a workaround

As stated above, NintendoClients sends ping packets even while a connection is in use, so even a connection that is used constantly will eventually hit this issue

jonbarrow commented 6 months ago

lgtm

jonbarrow commented 6 months ago

Actually @wolfendale pointed out something in DMs that I missed. Calling Update on the SlidingWindow will process the pending packets as well and return them, which could be an issue if we get a packet out of order and then get a PING packet: any pending packets would be lost.

We have a few options for this:

  1. Export the counter and update it manually
  2. Add a new method to SlidingWindow, separate from Update, which only handles the increments and does not return the packets. Perhaps Update and UpdateAndFlush (sketched below)? Though this starts to break the QRV API
  3. Add a new argument to Update which tells the method whether or not to flush the pending packets
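For illustration, option 2 could look roughly like this (all names here are hypothetical, not the current SlidingWindow API): a reliable PING would go through the counter-only Update, while DATA packets use UpdateAndFlush, so out-of-order packets are never dropped.

package main

import "fmt"

type packet struct {
    sequenceID uint16
}

type slidingWindow struct {
    nextExpected uint16            // sequence ID the window is waiting for
    pending      map[uint16]packet // out-of-order packets parked for later
}

// Update only advances the counter (what a reliable PING would use): if this
// is the packet we were waiting for we move past it, otherwise it is parked,
// but nothing is ever returned or dropped from here.
func (sw *slidingWindow) Update(p packet) {
    if p.sequenceID == sw.nextExpected {
        sw.nextExpected++
    } else {
        sw.pending[p.sequenceID] = p
    }
}

// UpdateAndFlush additionally drains every packet that is now deliverable in
// order (what DATA packets would use).
func (sw *slidingWindow) UpdateAndFlush(p packet) []packet {
    sw.pending[p.sequenceID] = p

    var flushed []packet
    for {
        next, ok := sw.pending[sw.nextExpected]
        if !ok {
            break
        }
        delete(sw.pending, sw.nextExpected)
        sw.nextExpected++
        flushed = append(flushed, next)
    }

    return flushed
}

func main() {
    sw := &slidingWindow{nextExpected: 2, pending: map[uint16]packet{}}

    fmt.Println(sw.UpdateAndFlush(packet{sequenceID: 3})) // [] - out of order, parked
    sw.Update(packet{sequenceID: 2})                      // reliable PING consumes sequence ID 2
    fmt.Println(sw.UpdateAndFlush(packet{sequenceID: 4})) // [{3} {4}] - nothing was lost
}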

@DaniElectra what do you think

wolfendale commented 6 months ago

Another option for this is potentially to change SlidingWindow to do the processing from handleReliable internally, which I think would also remove the need to expose the Lock and Unlock methods. Basically, Update would internally do the decryption/decompression and update the internal fragment buffer. It also wouldn't need to return the packets; it could just return the buffer and an indication of whether the payload is complete?
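Something along these lines, as a very rough sketch with made-up stand-in types (a real version would also handle sequence ordering and the actual crypto):

package main

import "fmt"

type packet struct {
    payload  []byte
    fragment uint8 // in PRUDP a fragment ID of 0 means "last fragment"
}

type slidingWindow struct {
    buffer []byte // partially reassembled payload for this substream
}

// Update does the decryption/decompression internally (just a stand-in here)
// and appends to the internal fragment buffer. It only hands back the full
// payload, plus a flag saying whether it's complete, so callers never touch
// raw packets and nothing needs Lock/Unlock from the outside.
func (sw *slidingWindow) Update(p packet) ([]byte, bool) {
    decrypted := p.payload // stand-in for the real decrypt/decompress step

    sw.buffer = append(sw.buffer, decrypted...)

    if p.fragment != 0 {
        return nil, false // more fragments on the way
    }

    full := sw.buffer
    sw.buffer = nil
    return full, true
}

func main() {
    sw := &slidingWindow{}

    if _, done := sw.Update(packet{payload: []byte("hel"), fragment: 1}); !done {
        fmt.Println("waiting for more fragments")
    }

    if data, done := sw.Update(packet{payload: []byte("lo"), fragment: 0}); done {
        fmt.Println(string(data)) // hello
    }
}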

I can knock up a proper version of this as an example if you like, but it'll be in the morning as it's midnight now

jonbarrow commented 6 months ago

The job of SlidingWindow is just to manage the packets and ensure they are delivered/handled in the correct order, and I think it's important to keep that separation of concerns.

However, I was incorrect with my 2nd point: adding a new method like that wouldn't break the API any more than we already have. I just checked Xenoblade again to see if there's an applicable method already in SlidingWindow, and there isn't even an Update method; that was likely carried over from using Kinnay's client as a reference.

The official implementation of SlidingWindow, as seen in Xenoblade, has the following methods:

This seems pretty client-centric, but we can likely work with this tbh

jonbarrow commented 6 months ago

I've checked WATCH_DOGS just now, since SlidingWindow does seem to be very client-centric (or at least, sending-centric), and may have found a better class to implement instead for this use case.

PRUDPEndPoint has rdv::PRUDPEndPoint::ServiceIncomingPacket which takes in an instance of PacketIn. Internally this uses an instance of rdv::PacketDispatchQueue:

Checking Xenoblade to look at the implementations (since NEX is typically a bit cleaner/simplified), it looks like this does what we're looking for. The PacketDispatchQueue takes in instances of PacketIn and adds them to an internal map using the Queue method. The key of this map is the packet sequence ID. The PacketDispatchQueue also seems to maintain a reference for the next expected sequence ID and returns it via GetNextToDispatch.

nn::nex::PRUDPEndPoint::ServiceIncomingPacket checks the packet's substream ID and, if FLAG_RELIABLE is set, uses it to pull an instance of PacketDispatchQueue from an internal list and then adds the packet to the queue (trimmed and modified for clarity):

if ((packet->type_flags & 0x20) != 0) { // Has FLAG_RELIABLE
    dispatch_queue = (nn::nex::PacketDispatchQueue **)__CPR230____vc__Q2_3std112vector__tm__98_PQ3_2nn3nex19PacketDispatchQueueQ3_2nn3nex53MemAllocator__tm__33_PQ3_2nn3nexJ42JFQ4_3std25_Vector_val__tm__7_Z1ZZ2Z5_Alty9size_type_QJ130JdJ136JJ163J9reference(&this->field_0x64, packet->substream_id);

    Queue__Q3_2nn3nex19PacketDispatchQueueFPQ3_2nn3nex8PacketIn(*dispatch_queue,packet);
}

Then at the very end of the function it calls nn::nex::PacketDispatchQueue::GetNextToDispatch, nn::nex::PRUDPEndPoint::Dispatch (if the packet's type is DATA), and nn::nex::PacketDispatchQueue::Dispatched, all in a loop, so long as there's a packet to dispatch (trimmed and modified for clarity):

dispatch_queue = (nn::nex::PacketDispatchQueue **)__CPR230____vc__Q2_3std112vector__tm__98_PQ3_2nn3nex19PacketDispatchQueueQ3_2nn3nex53MemAllocator__tm__33_PQ3_2nn3nexJ42JFQ4_3std25_Vector_val__tm__7_Z1ZZ2Z5_Alty9size_type_QJ130JdJ136JJ163J9reference(&this->field_0x64, packet->substream_id);
next_packet = GetNextToDispatch__Q3_2nn3nex19PacketDispatchQueueFv(*dispatch_queue);

while (next_packet != (nn::nex::PacketIn *)0x0) {
    if ((next_packet->type_flags & 0xF) == 2) { // DATA packet
        local_90 = *(undefined4 *)&packet[1].field_0x18;
        Dispatch__Q3_2nn3nex13PRUDPEndPointFPQ3_2nn3nex8PacketInQ3_2nn3nex4Time(this, next_packet, &local_90);
    }

    dispatch_queue2 = (undefined4 *)__CPR230____vc__Q2_3std112vector__tm__98_PQ3_2nn3nex19PacketDispatchQueueQ3_2nn3nex53MemAllocator__tm__33_PQ3_2nn3nexJ42JFQ4_3std25_Vector_val__tm__7_Z1ZZ2Z5_Alty9size_type_QJ130JdJ136JJ163J9reference(&this->field_0x64, packet->substream_id);
    Dispatched__Q3_2nn3nex19PacketDispatchQueueFPQ3_2nn3nex8PacketIn(*dispatch_queue2, next_packet);

    dispatch_queue = (nn::nex::PacketDispatchQueue **)__CPR230____vc__Q2_3std112vector__tm__98_PQ3_2nn3nex19PacketDispatchQueueQ3_2nn3nex53MemAllocator__tm__33_PQ3_2nn3nexJ42JFQ4_3std25_Vector_val__tm__7_Z1ZZ2Z5_Alty9size_type_QJ130JdJ136JJ163J9reference(&this->field_0x64, packet->substream_id);
    next_packet = GetNextToDispatch__Q3_2nn3nex19PacketDispatchQueueFv(*dispatch_queue);
}

return;

nn::nex::PRUDPEndPoint::Dispatch is where the actual handling of the incoming packet happens. It calls nn::nex::PacketEncDec::Decode on the packet (which decrypts the payload) and then checks the packet flags. If FLAG_RELIABLE is set, it calls nn::nex::PRUDPEndPoint::Defrag to handle fragmentation; if Defrag returns, it then calls nn::nex::PRUDPEndPoint::HandlerDispatch and increments some stuff. If the packet is unreliable, it gets some scheduler instance, checks whether the packet can be processed, and if so calls nn::nex::PRUDPEndPoint::HandlerDispatch as well.

It seems like PacketDispatchQueue is what we should be implementing here, not SlidingWindow, when it comes to INCOMING packets. For OUTGOING packets we should continue to use SlidingWindow

CC @DaniElectra for thoughts

jonbarrow commented 6 months ago

A super basic example would look like this:

package main

import "fmt"

type PacketDispatchQueue struct {
    queue map[int]Packet // received packets keyed by sequence ID
}

// Queue stores an incoming packet until it can be dispatched.
func (pdq *PacketDispatchQueue) Queue(packet Packet) {
    pdq.queue[packet.sequenceID] = packet
}

// GetNextToDispatch streams the currently queued packets over a channel.
func (pdq *PacketDispatchQueue) GetNextToDispatch() <-chan Packet {
    ch := make(chan Packet)

    go func() {
        defer close(ch)
        for _, packet := range pdq.queue {
            ch <- packet
        }
    }()

    return ch
}

// Dispatched removes a packet from the queue once it has been handled.
func (pdq *PacketDispatchQueue) Dispatched(packet Packet) {
    delete(pdq.queue, packet.sequenceID)
}

type Packet struct {
    sequenceID int
}

func main() {
    queue := PacketDispatchQueue{queue: make(map[int]Packet)}

    fmt.Printf("%+v\n", queue) // {queue:map[]}

    queue.Queue(Packet{sequenceID: 1})
    queue.Queue(Packet{sequenceID: 2})
    queue.Queue(Packet{sequenceID: 4})
    queue.Queue(Packet{sequenceID: 8})

    fmt.Printf("%+v\n", queue) // {queue:map[1:{sequenceID:1} 2:{sequenceID:2} 4:{sequenceID:4} 8:{sequenceID:8}]}

    for packet := range queue.GetNextToDispatch() {
        fmt.Printf("%+v\n", packet) // {sequenceID:NUM}

        queue.Dispatched(packet)
    }

    fmt.Printf("%+v\n", queue) // {queue:map[]}
}

Though this has concurrency issues as-is: fmt.Printf("%+v\n", packet) will print the packets in a different order each time (map iteration order is random), and there are race conditions if something is added to the queue while it's being processed. But it's a step in the right direction I feel
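One way both issues could be handled, as a sketch (still with hypothetical names, not a final API): guard the map with a sync.Mutex and track the next expected sequence ID, so packets are only ever handed out in order:

package main

import (
    "fmt"
    "sync"
)

type Packet struct {
    sequenceID uint16
}

type PacketDispatchQueue struct {
    lock           sync.Mutex
    queue          map[uint16]Packet
    nextToDispatch uint16
}

func NewPacketDispatchQueue() *PacketDispatchQueue {
    return &PacketDispatchQueue{queue: make(map[uint16]Packet), nextToDispatch: 1}
}

// Queue stores an incoming packet keyed by its sequence ID.
func (pdq *PacketDispatchQueue) Queue(packet Packet) {
    pdq.lock.Lock()
    defer pdq.lock.Unlock()
    pdq.queue[packet.sequenceID] = packet
}

// GetNextToDispatch returns the packet with the next expected sequence ID,
// or false if it hasn't arrived yet (so a gap blocks later packets).
func (pdq *PacketDispatchQueue) GetNextToDispatch() (Packet, bool) {
    pdq.lock.Lock()
    defer pdq.lock.Unlock()
    packet, ok := pdq.queue[pdq.nextToDispatch]
    return packet, ok
}

// Dispatched removes a handled packet and advances the expected counter.
func (pdq *PacketDispatchQueue) Dispatched(packet Packet) {
    pdq.lock.Lock()
    defer pdq.lock.Unlock()
    delete(pdq.queue, packet.sequenceID)
    pdq.nextToDispatch = packet.sequenceID + 1
}

func main() {
    pdq := NewPacketDispatchQueue()

    pdq.Queue(Packet{sequenceID: 2}) // out of order
    pdq.Queue(Packet{sequenceID: 1})

    for packet, ok := pdq.GetNextToDispatch(); ok; packet, ok = pdq.GetNextToDispatch() {
        fmt.Printf("%+v\n", packet) // {sequenceID:1} then {sequenceID:2}
        pdq.Dispatched(packet)
    }
}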

jonbarrow commented 6 months ago

Also, while doing the RE I noticed that we handle fragmentation buffers differently than NEX/QRV. We handle the fragmentation buffers inside SlidingWindow, but NEX/QRV stores them on the PRUDPEndPoint itself, using a list of buffers indexed by the substream ID

DaniElectra commented 6 months ago

It seems like PacketDispatchQueue is what we should be implementing here, not SlidingWindow, when it comes to INCOMING packets. For OUTGOING packets we should continue to use SlidingWindow

That sounds good to me

We handle the fragmentation buffers inside SlidingWindow, but NEX/QRV stores these on the PRUDPEndPoint itself using a list of buffers indexed based on the substream ID

Do we know if the rdv::Timeout and rdv::TimeoutManager are also stored on the PRUDPEndPoint? We may want to move it there too so that we don't need to keep ResendScheduler inside the SlidingWindow

jonbarrow commented 6 months ago

Do we know if the rdv::Timeout and rdv::TimeoutManager are also stored on the PRUDPEndPoint? We may want to move it there too so that we don't need to keep ResendScheduler inside the SlidingWindow

I don't remember off the top of my head, no, and I won't be home for the rest of the day to check. I can poke around when I am home though and see if I can figure that out, if no one else beats me to it

jonbarrow commented 6 months ago

@DaniElectra I thought I wasn't going to be home again, but I was able to stop back by my apartment for a bit before actually heading out for the rest of the day. I took a quick peek at Xenoblade and found this:

So it looks like the timeouts themselves are stored on the PacketOut, and the manager is stored on the PRUDPStream, which itself manages several PRUDPEndPoints.

I guess since we only ever use a single PRUDPEndPoint then we can just store it there?

And with this, I am officially gone for the day :+1:

wolfendale commented 6 months ago

Closing this in favour of #59 as it fixes the same bug