PretendoNetwork / nex-go

Barebones PRUDP/NEX server library written in Go
GNU Affero General Public License v3.0

Make reliable ping packets update sliding window #58

Closed wolfendale closed 6 months ago

wolfendale commented 6 months ago

Resolves some of the issues in #54; I believe this also fixes PretendoNetwork/friends#16

Changes:

NintendoClients does a couple of interesting things with ping packets:

  - It sends its ping packets reliably (with FLAG_RELIABLE set)
  - It keeps sending pings even while the connection is actively being used

Currently, we assume that all ping packets are unreliable, so we don't increment the expected sequence ID when a ping packet comes in. This leaves the sliding window of that substream waiting forever for a packet that will never come. It doesn't close the connection, since we still acknowledge the ping, but it means that any requests waiting for a response on that sliding window will never receive one

The fix updates the ping handler to check whether a ping packet is reliable and, if so, increment the expected sequence ID counter for that substream
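A rough sketch of what the change does (these are stand-in types and names for illustration, not the actual library code):

package main

import "fmt"

const flagReliable = 0x2 // the PRUDP FLAG_RELIABLE bit

type slidingWindow struct {
    expectedSequenceID uint16
}

type pingPacket struct {
    flags       uint16
    sequenceID  uint16
    substreamID uint8
}

func handlePing(windows map[uint8]*slidingWindow, p pingPacket) {
    // Acknowledging the ping is unchanged and omitted here. The new part:
    // a reliable ping consumes a sequence ID, so the sliding window for its
    // substream has to advance past it or later packets are stuck waiting.
    if p.flags&flagReliable != 0 {
        windows[p.substreamID].expectedSequenceID++
    }
}

func main() {
    windows := map[uint8]*slidingWindow{0: {expectedSequenceID: 5}}

    handlePing(windows, pingPacket{flags: flagReliable, sequenceID: 5, substreamID: 0})

    fmt.Println(windows[0].expectedSequenceID) // 6 - the substream is no longer blocked
}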

Extra elaboration on PretendoNetwork/friends#16

In some of the images shared there, you can see the ping packet in Wireshark right before the connection issues start.

The issue also mentions that things work for about 5-6 seconds (which is the default ping timeout), and that setting the ping timeout to a large number is a workaround

As stated above, NintendoClients sends ping packets even while a connection is in use, so even a connection that is used constantly will eventually hit this issue

jonbarrow commented 6 months ago

lgtm

jonbarrow commented 6 months ago

Actually @wolfendale pointed out something in DMs that I missed. Calling Update on the SlidingWindow will process the pending packets as well and return them, which could be an issue if we get a packet out of order and then get a PING packet: any pending packets would be lost.

We have a few options for this:

  1. Export the counter and update it manually
  2. Add a new method to SlidingWindow, separate from Update, which only handles the increments and does not return the packets. Perhaps Update and UpdateAndFlush (sketched below)? Though this starts to break the QRV API
  3. Add a new argument to Update which tells the method whether or not to flush the pending packets
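For illustration, option 2 could look roughly like this (all names here are hypothetical, not the current SlidingWindow API): a reliable PING would go through the counter-only Update, while DATA packets use UpdateAndFlush, so out-of-order packets are never dropped.

package main

import "fmt"

type packet struct {
    sequenceID uint16
}

type slidingWindow struct {
    nextExpected uint16            // sequence ID the window is waiting for
    pending      map[uint16]packet // out-of-order packets parked for later
}

// Update only advances the counter (what a reliable PING would use): if this
// is the packet we were waiting for we move past it, otherwise it is parked,
// but nothing is ever returned or dropped from here.
func (sw *slidingWindow) Update(p packet) {
    if p.sequenceID == sw.nextExpected {
        sw.nextExpected++
    } else {
        sw.pending[p.sequenceID] = p
    }
}

// UpdateAndFlush additionally drains every packet that is now deliverable in
// order (what DATA packets would use).
func (sw *slidingWindow) UpdateAndFlush(p packet) []packet {
    sw.pending[p.sequenceID] = p

    var flushed []packet
    for {
        next, ok := sw.pending[sw.nextExpected]
        if !ok {
            break
        }
        delete(sw.pending, sw.nextExpected)
        sw.nextExpected++
        flushed = append(flushed, next)
    }

    return flushed
}

func main() {
    sw := &slidingWindow{nextExpected: 2, pending: map[uint16]packet{}}

    fmt.Println(sw.UpdateAndFlush(packet{sequenceID: 3})) // [] - out of order, parked
    sw.Update(packet{sequenceID: 2})                      // reliable PING consumes sequence ID 2
    fmt.Println(sw.UpdateAndFlush(packet{sequenceID: 4})) // [{3} {4}] - nothing was lost
}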

@DaniElectra what do you think

wolfendale commented 6 months ago

Another option for this is potentially to change SlidingWindow to do the processing from handleReliable internally, which I think would also remove the need to expose the Lock and Unlock methods. Basically, Update would internally do the decryption/decompression and update the internal fragment buffer. It also wouldn't need to return the packets; it could just return the buffer and an indication of whether the payload is complete?
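Something along these lines, as a very rough sketch with made-up stand-in types (a real version would also handle sequence ordering and the actual crypto):

package main

import "fmt"

type packet struct {
    payload  []byte
    fragment uint8 // in PRUDP a fragment ID of 0 means "last fragment"
}

type slidingWindow struct {
    buffer []byte // partially reassembled payload for this substream
}

// Update does the decryption/decompression internally (just a stand-in here)
// and appends to the internal fragment buffer. It only hands back the full
// payload, plus a flag saying whether it's complete, so callers never touch
// raw packets and nothing needs Lock/Unlock from the outside.
func (sw *slidingWindow) Update(p packet) ([]byte, bool) {
    decrypted := p.payload // stand-in for the real decrypt/decompress step

    sw.buffer = append(sw.buffer, decrypted...)

    if p.fragment != 0 {
        return nil, false // more fragments on the way
    }

    full := sw.buffer
    sw.buffer = nil
    return full, true
}

func main() {
    sw := &slidingWindow{}

    if _, done := sw.Update(packet{payload: []byte("hel"), fragment: 1}); !done {
        fmt.Println("waiting for more fragments")
    }

    if data, done := sw.Update(packet{payload: []byte("lo"), fragment: 0}); done {
        fmt.Println(string(data)) // hello
    }
}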

I can knock up a proper version of this as an example if you like, but it'll be in the morning as it's midnight now

jonbarrow commented 6 months ago

The job of SlidingWindow is just to manage the packets and ensure they are delivered/handled in the correct order, and I think it's important to keep that separation of concerns.

However, I was incorrect with my 2nd point: adding a new method like that wouldn't break the API any more than we already have. I just checked Xenoblade again to see if there's an applicable method already in SlidingWindow, and there isn't even an Update method; that was likely carried over from using Kinnay's client as a reference.

The official implementation of SlidingWindow, as seen in Xenoblade, has the following methods:

This seems pretty client-centric, but we can likely work with this tbh

jonbarrow commented 6 months ago

I've checked WATCH_DOGS just now, since SlidingWindow does seem to be very client-centric (or at least, sending-centric), and may have found a better class to implement instead for this use case.

PRUDPEndPoint has rdv::PRUDPEndPoint::ServiceIncomingPacket which takes in an instance of PacketIn. Internally this uses an instance of rdv::PacketDispatchQueue:

Checking Xenoblade to look at the implementations (since NEX is typically a bit cleaner/simplified), it looks like this does what we're looking for. The PacketDispatchQueue takes in instances of PacketIn and adds them to an internal map using the Queue method. The key of this map is the packet sequence ID. The PacketDispatchQueue also seems to maintain a reference for the next expected sequence ID and returns it via GetNextToDispatch.

nn::nex::PRUDPEndPoint::ServiceIncomingPacket checks the packet's substream ID and, if FLAG_RELIABLE is set, uses it to pull an instance of PacketDispatchQueue from an internal list and then adds the packet to the queue (trimmed and modified for clarity):

if ((packet->type_flags & 0x20) != 0) { // Has FLAG_RELIABLE
    dispatch_queue = (nn::nex::PacketDispatchQueue **)__CPR230____vc__Q2_3std112vector__tm__98_PQ3_2nn3nex19PacketDispatchQueueQ3_2nn3nex53MemAllocator__tm__33_PQ3_2nn3nexJ42JFQ4_3std25_Vector_val__tm__7_Z1ZZ2Z5_Alty9size_type_QJ130JdJ136JJ163J9reference(&this->field_0x64, packet->substream_id);

    Queue__Q3_2nn3nex19PacketDispatchQueueFPQ3_2nn3nex8PacketIn(*dispatch_queue,packet);
}

Then at the very end of the function it calls nn::nex::PacketDispatchQueue::GetNextToDispatch, nn::nex::PRUDPEndPoint::Dispatch (if the packet's type is DATA), and nn::nex::PacketDispatchQueue::Dispatched, all in a loop, so long as there's a packet to dispatch (trimmed and modified for clarity):

dispatch_queue = (nn::nex::PacketDispatchQueue **)__CPR230____vc__Q2_3std112vector__tm__98_PQ3_2nn3nex19PacketDispatchQueueQ3_2nn3nex53MemAllocator__tm__33_PQ3_2nn3nexJ42JFQ4_3std25_Vector_val__tm__7_Z1ZZ2Z5_Alty9size_type_QJ130JdJ136JJ163J9reference(&this->field_0x64, packet->substream_id);
next_packet = GetNextToDispatch__Q3_2nn3nex19PacketDispatchQueueFv(*dispatch_queue);

while (next_packet != (nn::nex::PacketIn *)0x0) {
    if ((next_packet->type_flags & 0xF) == 2) { // DATA packet
        local_90 = *(undefined4 *)&packet[1].field_0x18;
        Dispatch__Q3_2nn3nex13PRUDPEndPointFPQ3_2nn3nex8PacketInQ3_2nn3nex4Time(this, next_packet, &local_90);
    }

    dispatch_queue2 = (undefined4 *)__CPR230____vc__Q2_3std112vector__tm__98_PQ3_2nn3nex19PacketDispatchQueueQ3_2nn3nex53MemAllocator__tm__33_PQ3_2nn3nexJ42JFQ4_3std25_Vector_val__tm__7_Z1ZZ2Z5_Alty9size_type_QJ130JdJ136JJ163J9reference(&this->field_0x64, packet->substream_id);
    Dispatched__Q3_2nn3nex19PacketDispatchQueueFPQ3_2nn3nex8PacketIn(*dispatch_queue2, next_packet);

    dispatch_queue = (nn::nex::PacketDispatchQueue **)__CPR230____vc__Q2_3std112vector__tm__98_PQ3_2nn3nex19PacketDispatchQueueQ3_2nn3nex53MemAllocator__tm__33_PQ3_2nn3nexJ42JFQ4_3std25_Vector_val__tm__7_Z1ZZ2Z5_Alty9size_type_QJ130JdJ136JJ163J9reference(&this->field_0x64, packet->substream_id);
    next_packet = GetNextToDispatch__Q3_2nn3nex19PacketDispatchQueueFv(*dispatch_queue);
}

return;

nn::nex::PRUDPEndPoint::Dispatch is where the actual handling of the incoming packet happens. It calls nn::nex::PacketEncDec::Decode on the packet (which decrypts the payload) and then checks the packet flags. If FLAG_RELIABLE is set, it calls nn::nex::PRUDPEndPoint::Defrag to handle fragmentation; if Defrag returns, it then calls nn::nex::PRUDPEndPoint::HandlerDispatch and increments some stuff. If the packet is unreliable, it gets some scheduler instance, checks whether the packet can be processed, and if so calls nn::nex::PRUDPEndPoint::HandlerDispatch as well.

It seems like PacketDispatchQueue is what we should be implementing here, not SlidingWindow, when it comes to INCOMING packets. For OUTGOING packets we should continue to use SlidingWindow

CC @DaniElectra for thoughts

jonbarrow commented 6 months ago

A super basic example would look like this:

package main

import "fmt"

type PacketDispatchQueue struct {
    queue map[int]Packet // received packets keyed by sequence ID
}

// Queue stores an incoming packet until it can be dispatched.
func (pdq *PacketDispatchQueue) Queue(packet Packet) {
    pdq.queue[packet.sequenceID] = packet
}

// GetNextToDispatch streams the currently queued packets over a channel.
func (pdq *PacketDispatchQueue) GetNextToDispatch() <-chan Packet {
    ch := make(chan Packet)

    go func() {
        defer close(ch)
        for _, packet := range pdq.queue {
            ch <- packet
        }
    }()

    return ch
}

// Dispatched removes a packet from the queue once it has been handled.
func (pdq *PacketDispatchQueue) Dispatched(packet Packet) {
    delete(pdq.queue, packet.sequenceID)
}

type Packet struct {
    sequenceID int
}

func main() {
    queue := PacketDispatchQueue{queue: make(map[int]Packet)}

    fmt.Printf("%+v\n", queue) // {queue:map[]}

    queue.Queue(Packet{sequenceID: 1})
    queue.Queue(Packet{sequenceID: 2})
    queue.Queue(Packet{sequenceID: 4})
    queue.Queue(Packet{sequenceID: 8})

    fmt.Printf("%+v\n", queue) // {queue:map[1:{sequenceID:1} 2:{sequenceID:2} 4:{sequenceID:4} 8:{sequenceID:8}]}

    for packet := range queue.GetNextToDispatch() {
        fmt.Printf("%+v\n", packet) // {sequenceID:NUM}

        queue.Dispatched(packet)
    }

    fmt.Printf("%+v\n", queue) // {queue:map[]}
}

Though this has concurrency issues as-is: fmt.Printf("%+v\n", packet) will print the packets in a different order each time (map iteration order is random), and there are race conditions if something is added to the queue while it's being processed. But it's a step in the right direction I feel
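One way both issues could be handled, as a sketch (still with hypothetical names, not a final API): guard the map with a sync.Mutex and track the next expected sequence ID, so packets are only ever handed out in order:

package main

import (
    "fmt"
    "sync"
)

type Packet struct {
    sequenceID uint16
}

type PacketDispatchQueue struct {
    lock           sync.Mutex
    queue          map[uint16]Packet
    nextToDispatch uint16
}

func NewPacketDispatchQueue() *PacketDispatchQueue {
    return &PacketDispatchQueue{queue: make(map[uint16]Packet), nextToDispatch: 1}
}

// Queue stores an incoming packet keyed by its sequence ID.
func (pdq *PacketDispatchQueue) Queue(packet Packet) {
    pdq.lock.Lock()
    defer pdq.lock.Unlock()
    pdq.queue[packet.sequenceID] = packet
}

// GetNextToDispatch returns the packet with the next expected sequence ID,
// or false if it hasn't arrived yet (so a gap blocks later packets).
func (pdq *PacketDispatchQueue) GetNextToDispatch() (Packet, bool) {
    pdq.lock.Lock()
    defer pdq.lock.Unlock()
    packet, ok := pdq.queue[pdq.nextToDispatch]
    return packet, ok
}

// Dispatched removes a handled packet and advances the expected counter.
func (pdq *PacketDispatchQueue) Dispatched(packet Packet) {
    pdq.lock.Lock()
    defer pdq.lock.Unlock()
    delete(pdq.queue, packet.sequenceID)
    pdq.nextToDispatch = packet.sequenceID + 1
}

func main() {
    pdq := NewPacketDispatchQueue()

    pdq.Queue(Packet{sequenceID: 2}) // out of order
    pdq.Queue(Packet{sequenceID: 1})

    for packet, ok := pdq.GetNextToDispatch(); ok; packet, ok = pdq.GetNextToDispatch() {
        fmt.Printf("%+v\n", packet) // {sequenceID:1} then {sequenceID:2}
        pdq.Dispatched(packet)
    }
}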

jonbarrow commented 6 months ago

Also, while doing the RE I noticed that we handle fragmentation buffers differently than NEX/QRV. We handle the fragmentation buffers inside SlidingWindow, but NEX/QRV stores them on the PRUDPEndPoint itself, using a list of buffers indexed by the substream ID

DaniElectra commented 6 months ago

It seems like PacketDispatchQueue is what we should be implementing here, not SlidingWindow, when it comes to INCOMING packets. For OUTGOING packets we should continue to use SlidingWindow

That sounds good to me

We handle the fragmentation buffers inside SlidingWindow, but NEX/QRV stores these on the PRUDPEndPoint itself using a list of buffers indexed based on the substream ID

Do we know if the rdv::Timeout and rdv::TimeoutManager are also stored on the PRUDPEndPoint? We may want to move it there too so that we don't need to keep ResendScheduler inside the SlidingWindow

jonbarrow commented 6 months ago

Do we know if the rdv::Timeout and rdv::TimeoutManager are also stored on the PRUDPEndPoint? We may want to move it there too so that we don't need to keep ResendScheduler inside the SlidingWindow

I don't remember off the top of my head, no, and I won't be home for the rest of the day to check. I can poke around when I am home though and see if I can figure that out, if no one else beats me to it

jonbarrow commented 6 months ago

@DaniElectra I thought I wasn't going to be home again, but I was able to stop back by my apartment for a bit before actually heading out for the rest of the day. I took a quick peek at Xenoblade and found this:

So it looks like the timeouts themselves are stored on the PacketOut, and the manager is stored on the PRUDPStream, which itself manages several PRUDPEndPoints.

I guess since we only ever use a single PRUDPEndPoint then we can just store it there?

And with this, I am officially gone for the day :+1:

wolfendale commented 6 months ago

Closing this in favour of #59 as it fixes the same bug