PretendoNetwork / nex-go

Barebones PRUDP/NEX server library written in Go
GNU Affero General Public License v3.0
74 stars 16 forks source link

Add locking to SlidingWindow #51

Closed wolfendale closed 6 months ago

wolfendale commented 6 months ago

Resolves problem discussed in #48

Changes:

As discussed in #48 there is an intermittent issue which was causing messages to go unprocessed. This manifested in a few ways:

  1. A SlidingWindow instance would have extra payloads added to its buffer, which would cause RMC decoding to fail, resulting in an unprocessed message
  2. A SlidingWindow instance could end up with its cipher stream in a bad state because of multiple packets being added to it. In this case the rest of the messages in that connection would become corrupted as the client and server ciphers are out of sync

This fix adds a sync.Mutex to SlidingWindow and methods to lock and unlock the SlidingWindow instance. These are then used to lock the connection's sliding window for the duration of a call to handleReliable

This is crude but works as far as I can tell. I tested this with a simple RMC server setup and automated thousands of requests against it. Because we're now locking where we weren't before in the testing I did there was a performance difference, each run of 10k sequential requests (acks for each request were awaited before the next request was sent) took 16.5~ seconds on my machine. With the change this went up to about 16.8~ on average

I definitely don't think this is a perfect solution, or the cleanest but is probably an improvement on what is there

Happy for this to just be used as the basis for something much better!

jonbarrow commented 6 months ago

Good finds! I do have to wonder why these issues don't crop up unless the connection state check is removed though? The issues described here, and the fix, seem pretty disconnected from the apparent cause.

wolfendale commented 6 months ago

Good finds! I do have to wonder why these issues don't crop up unless the connection state check is removed though? The issues described here, and the fix, seem pretty disconnected from the apparent cause.

So I'm glad you mentioned this because you're right - I didn't actually do any testing with the state checks because I was searching for this particular bug, so I put them back in and did more testing.

When I put them in without this fix, I still saw these errors. And this fix still works with the connection state checks in place 👍