TimonPost / laminar

A simple semi-reliable UDP protocol for multiplayer games

Packet duplications #303

Open DMClVG opened 2 years ago

DMClVG commented 2 years ago

I am using Laminar 0.5.0 in conjunction with Bevy 0.5.0 through my own systems (I'm not using a library such as the outdated bevy_prototype_network_laminar) and I'm left in a bit of a pickle.

Essentially, what I have is a client that sends packets and heartbeats every so often to the server, and the server does the same in the other direction; both are built with laminar + bevy and run on the same machine. But sometimes the client sends many packets at once (around a hundred), and when it does, the server receives those packets normally, but then every subsequent heartbeat response from the server somehow magically makes more of those packets appear out of nowhere. Please take a look at this

(excuse my mouse-writing)

On the client (screenshot): Total packets sent: 100

On the server (screenshot): Total packets received: 208

As you can see, every time the server sends a heartbeat response back to the client, a portion of these packets reappears on the server, even after the whole batch had already been received. I have checked these packets and they do deserialize correctly into their structs, so corruption isn't the issue. And I don't see a corresponding send anywhere on the client.

If I do the same but only send 30 packets (screenshot): everything suddenly looks okay.

Also, this doesn't happen if these packets are sent reliable ordered, so perhaps it's only an issue with reliable unordered packets. But that introduces another problem, because these packets (if it's a really high number, like 400) are sort of "synchronized" to the heartbeats, like this:

(screenshot) The total number of received packets is correct, but the second batch here arrives very late (a heartbeat lasts 2 seconds), and from my observations it arrives right after some heartbeat response, not any earlier or later. Is this expected behaviour? Or should I open another issue for this?

I have thought of switching to enet or another library, but changing all of this code would be frustrating. Any idea why all of this happens?

TimonPost commented 2 years ago

I don't have a clear indication of why this could happen. I would have to debug this issue.

A couple of points:

DMClVG commented 2 years ago

So is the 32-packet limit a design choice, or is there an option to change it? I understand that it's not what's causing the issue, but what you're implying is that in some cases some packets, even reliable ones, aren't guaranteed to arrive because of it. Regardless, I'll be content with just this getting resolved.

In case you've missed it in my description: I have tried out reliable ordered, and it does work, with some small issues that are most likely also related to the issue at hand.

I have also made a much smaller example that replicates the problem, if that helps. Here you go:

// Sender
use laminar::{Socket, Packet, ErrorKind}; // v0.5.0

fn main() -> Result<(), ErrorKind> {
    // create and open a socket on any free local port
    let mut socket = Socket::bind_any()?;
    let packet_sender = socket.get_packet_sender();

    // run laminar's poll loop on its own thread
    let _thread = std::thread::spawn(move || socket.start_polling());

    // initialize an example packet
    let reliable = Packet::reliable_unordered(
        "127.0.0.1:4004".parse().unwrap(),
        vec![0, 2, 213, 88, 21],
    );

    // send the reliable unordered packet 100 times
    for _ in 0..100 {
        packet_sender.send(reliable.clone()).unwrap();
    }

    // keep the process (and the polling thread) alive
    loop {}
}
// Receiver
use laminar::{SocketEvent, Socket, ErrorKind, Packet}; // v0.5.0

fn main() -> Result<(), ErrorKind> {
    // bind socket to port 4004
    let mut socket = Socket::bind("127.0.0.1:4004")?;

    let event_receiver = socket.get_event_receiver();
    let packet_sender = socket.get_packet_sender();

    // run laminar's poll loop on its own thread
    let _thread = std::thread::spawn(move || socket.start_polling());

    // for each received packet, send out a response and print the total number of received packets
    let mut n = 0;
    loop {
        match event_receiver.recv() {
            Ok(SocketEvent::Packet(packet)) => {
                // sending out a response is key to replicating this bug
                packet_sender
                    .send(Packet::reliable_sequenced(packet.addr(), vec![0, 0, 3], None))
                    .unwrap();

                n += 1;
            }
            Ok(_) => {} // ignore connect/timeout/disconnect events
            Err(e) => {
                eprintln!("Something went wrong when receiving, error: {:?}", e);
            }
        }
        println!("{}", n);
    }
}

That's the n I get on my side (screenshot).

If there's something more you need I'll be glad to help out!

TimonPost commented 2 years ago

Yes, it is a design choice. Laminar works with the method described by Gaffer:

    If bit n is set in ack_bits, then ack - n is acked. Not only is ack_bits a smart encoding that saves bandwidth, it also adds redundancy to combat packet loss. Each ack is sent 32 times. If one packet is lost, there's 31 other packets with the same ack. Statistically speaking, acks are very likely to get through.

So it's not perfectly reliable, but almost perfectly reliable.
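To make the quoted scheme concrete, here is a minimal sketch of such an ack bitfield; the function names are illustrative and not laminar's actual internals:

// Hypothetical sketch of the ack_bits scheme quoted above; illustrative only,
// not laminar's real implementation.

/// Build the redundancy field: bit n is set if sequence `ack - n` was received.
fn build_ack_bits(ack: u16, was_received: impl Fn(u16) -> bool) -> u32 {
    let mut ack_bits = 0u32;
    for n in 0..32u16 {
        if was_received(ack.wrapping_sub(n)) {
            ack_bits |= 1 << n;
        }
    }
    ack_bits
}

/// Decode the field back into the sequence numbers it acknowledges.
fn acked_sequences(ack: u16, ack_bits: u32) -> Vec<u16> {
    (0..32u16)
        .filter(|&n| ack_bits & (1 << n) != 0)
        .map(|n| ack.wrapping_sub(n))
        .collect()
}

The flip side of that 32-bit window is what the question above is about: a single ack can only vouch for the 32 most recent sequence numbers, so a burst that outruns the window can leave acknowledgements behind.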

Thanks for the provided example, that will help a lot. When I have some time, I will try to debug this problem.

TimonPost commented 2 years ago

Okay, whilst debugging, some memories from the past arose: this issue has come up before. It's important to first do a basic handshake and only then start to send data. You can read about this in the book.
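In your reduced example there is no handshake at all. A minimal sketch of what I mean, reusing your sender and assuming the receiver from your example (which echoes a response to every packet):

// Hypothetical handshake sketch: send one packet and wait for any reply, so
// both sides have an established connection before the real traffic starts.
use laminar::{Socket, Packet, SocketEvent, ErrorKind}; // v0.5.0

fn main() -> Result<(), ErrorKind> {
    let mut socket = Socket::bind_any()?;
    let packet_sender = socket.get_packet_sender();
    let event_receiver = socket.get_event_receiver();
    let _thread = std::thread::spawn(move || socket.start_polling());

    let server = "127.0.0.1:4004".parse().unwrap();

    // "handshake": one packet out, block until the server answers
    packet_sender
        .send(Packet::reliable_unordered(server, vec![0]))
        .unwrap();
    loop {
        if let Ok(SocketEvent::Packet(_)) = event_receiver.recv() {
            break;
        }
    }

    // ...only now start sending the 100-packet burst
    Ok(())
}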

Further, I just added a thread::sleep with a 2-5 ms delay when sending, and this already got rid of your bug. Can you perhaps try this?
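Applied to the sender from your example, the workaround looks like this (the exact delay value is a guess; anything in the 2-5 ms range worked for me):

// Sender from the example above, with the suggested delay between sends.
use laminar::{Socket, Packet, ErrorKind}; // v0.5.0
use std::{thread, time::Duration};

fn main() -> Result<(), ErrorKind> {
    let mut socket = Socket::bind_any()?;
    let packet_sender = socket.get_packet_sender();
    let _thread = thread::spawn(move || socket.start_polling());

    let reliable = Packet::reliable_unordered(
        "127.0.0.1:4004".parse().unwrap(),
        vec![0, 2, 213, 88, 21],
    );

    for _ in 0..100 {
        packet_sender.send(reliable.clone()).unwrap();
        thread::sleep(Duration::from_millis(5)); // 2-5 ms between sends
    }

    loop {}
}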


related issues: https://github.com/TimonPost/laminar/issues/134 https://github.com/TimonPost/laminar/issues/222

DMClVG commented 2 years ago

Sorry, I should have clarified that in my example: I omitted the handshake process for the sake of simplicity, but in my real program I do establish a connection first before ever sending any further packets. Now to your suggestion: I did try it out and it does get rid of that bug, but it doesn't make much sense, does it? As far as I'm aware, that packet_sender is simply a channel that hands the packet to the polling thread, so why does slowing it down change anything about how start_polling processes these packets?

But what really struck me is that the book says heartbeat packets have to be sent regularly and very frequently for the network to stay healthy, whereas my program only sends heartbeats every 2 seconds, and for the player I only ever send positions whenever the player moves. I don't really understand much about how it works internally, but maybe that's the reason for it.
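For what it's worth, if laminar is supposed to send heartbeats itself, the 0.5.0 Config seems to expose a heartbeat_interval field (off by default, if I read the docs right). My guess at the usage, untested:

// Sketch: let laminar send its own heartbeats instead of hand-rolled ones.
// Assumes laminar 0.5.0 and its optional Config::heartbeat_interval.
use laminar::{Config, Socket, ErrorKind};
use std::time::Duration;

fn main() -> Result<(), ErrorKind> {
    let config = Config {
        // emit a heartbeat on otherwise idle connections twice per second
        heartbeat_interval: Some(Duration::from_millis(500)),
        ..Default::default()
    };
    let _socket = Socket::bind_with_config("127.0.0.1:4004", config)?;
    Ok(())
}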

Calandiel commented 2 years ago

    Yes, it is a design choice. Laminar works with the method described by Gaffer:

        If bit n is set in ack_bits, then ack - n is acked. Not only is ack_bits a smart encoding that saves bandwidth, it also adds redundancy to combat packet loss. Each ack is sent 32 times. If one packet is lost, there's 31 other packets with the same ack. Statistically speaking, acks are very likely to get through.

    So it's not perfectly reliable, but almost perfectly reliable.

    Thanks for the provided example, that will help a lot. When I have some time, I will try to debug this problem.

I feel like this is a bit of a misunderstanding of Gaffer's suggested approach, no? What follows that quoted part, pretty much immediately, is this:

    But bursts of packet loss do happen, so it's important to note that:

        If you receive an ack for packet n then that packet was definitely received.

        If you don't receive an ack, the packet was most likely not received. But, it might have been, and the ack just didn't get through. This is extremely rare.

    In my experience it's not necessary to send perfect acks. Building a reliability system on top of a system that very rarely drops acks adds no significant problems. But it does create a challenge for testing this system works under all situations because of the edge cases when acks are dropped.

    So please if you do implement this system yourself, setup a soak test with terrible network conditions to make sure your ack system is working correctly.

I believe the intention was to actually make the system reliable, but it was left as an exercise for the reader, so to speak. And it is a rather simple exercise: one could surface some kind of event to the application and let it decide what to do, or resend un-acknowledged packets automatically.
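To illustrate the exercise, a tiny application-level resend layer could look something like this (all names here are hypothetical, not laminar API):

// Hypothetical application-level resend queue on top of an "almost reliable"
// transport: track what was sent, drop entries on acknowledgement, and hand
// back anything that stayed unacknowledged for too long.
use std::collections::HashMap;
use std::time::{Duration, Instant};

struct PendingPacket {
    payload: Vec<u8>,
    sent_at: Instant,
}

struct ResendQueue {
    pending: HashMap<u64, PendingPacket>, // keyed by an application-level id
    timeout: Duration,
}

impl ResendQueue {
    fn new(timeout: Duration) -> Self {
        Self { pending: HashMap::new(), timeout }
    }

    fn on_send(&mut self, id: u64, payload: Vec<u8>) {
        self.pending.insert(id, PendingPacket { payload, sent_at: Instant::now() });
    }

    fn on_app_ack(&mut self, id: u64) {
        self.pending.remove(&id); // the peer confirmed it, stop tracking
    }

    /// Payloads whose acknowledgement has not arrived in time; the caller resends them.
    fn due_for_resend(&mut self) -> Vec<(u64, Vec<u8>)> {
        let now = Instant::now();
        let timeout = self.timeout;
        self.pending
            .iter_mut()
            .filter(|(_, p)| now.duration_since(p.sent_at) > timeout)
            .map(|(id, p)| {
                p.sent_at = now; // restart the timer once handed back for resend
                (*id, p.payload.clone())
            })
            .collect()
    }
}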