cBournhonesque / lightyear

A networking library to make multiplayer games for the Bevy game engine
https://cbournhonesque.github.io/lightyear/book
Apache License 2.0

Improve benchmark performance #421

Open cBournhonesque opened 3 weeks ago

cBournhonesque commented 3 weeks ago

Benchmarks show that it takes 1.3 ms to replicate 1000 entities (replicon takes 30us). Why?

With a lot of tracing spans, it's 3ms (because of the tracing overhead):

Also here are the ChannelSendStats:

ChannelSendStats {
    num_single_messages_sent: 1000,
    num_fragment_messages_sent: 0,
    num_bytes_sent: 27000,
},

Potential ideas:

Nul-led commented 3 weeks ago

Did you actually run their benchmarks or did you just trust their results? :p

cBournhonesque commented 3 weeks ago

Ran their benchmarks :) I'm trying to understand why the difference is so big. Probably extra allocations, but even so it seems like too much.

cBournhonesque commented 3 weeks ago

First optimization I will try:

Note that these new approaches also probably require more bandwidth, because bitcode was able to use bit-compression (write individual bits instead of bytes) for the previous EntityActionsMessage.
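
To make the bandwidth tradeoff concrete, here is a minimal sketch comparing a byte-aligned encoding with a bit-packed one for the same small values. This is not bitcode's or lightyear's actual encoding: `BitWriter`, `encode_bytes`, `encode_bits` and the 3-bit field width are hypothetical names and choices made up for illustration.

```rust
/// Hypothetical bit-level writer: stores only as many bits per value as requested.
struct BitWriter {
    bytes: Vec<u8>,
    bit_len: usize, // total number of bits written so far
}

impl BitWriter {
    fn new() -> Self {
        Self { bytes: Vec::new(), bit_len: 0 }
    }

    /// Write the lowest `bits` bits of `value`, least-significant bit first.
    fn write_bits(&mut self, value: u32, bits: u32) {
        for i in 0..bits {
            if self.bit_len % 8 == 0 {
                self.bytes.push(0); // start a new byte
            }
            let bit = ((value >> i) & 1) as u8;
            let last = self.bytes.len() - 1;
            self.bytes[last] |= bit << (self.bit_len % 8);
            self.bit_len += 1;
        }
    }
}

/// Byte-aligned encoding: every value occupies a full byte.
fn encode_bytes(values: &[u8]) -> Vec<u8> {
    values.to_vec()
}

/// Bit-packed encoding: every value occupies `bits` bits (values are assumed to fit).
fn encode_bits(values: &[u8], bits: u32) -> Vec<u8> {
    let mut writer = BitWriter::new();
    for &v in values {
        writer.write_bits(v as u32, bits);
    }
    writer.bytes
}
```

For 1000 values that each fit in 3 bits, the byte-aligned buffer is 1000 bytes while the bit-packed one is 375 bytes, but the bit-packed path runs a loop iteration per bit. That is the CPU-versus-bandwidth tradeoff in miniature.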

I will try b. first to have the biggest difference in performance

Nul-led commented 3 weeks ago

imo bandwidth is still the most important thing to optimize for multiplayer games. If the serialization can't be optimized without increasing bandwidth, the current tradeoff seems worth it. Ofc finding a balance is a good idea, but I feel like optimizing for bandwidth first makes more sense.

cBournhonesque commented 3 weeks ago

Agreed that bandwidth is more important, but this is still a massive difference: roughly 40x! I would accept something like 5x or 10x (i.e. up to 300us to serialize the 1000 entities), but 1.3ms is way too much.
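
The numbers above follow directly from the two measurements in the first comment (1.3 ms versus 30 us). A small sketch of the arithmetic, with hypothetical helper names `slowdown` and `budget`:

```rust
use std::time::Duration;

/// Ratio between our measured time and the baseline's measured time.
fn slowdown(ours: Duration, theirs: Duration) -> f64 {
    ours.as_secs_f64() / theirs.as_secs_f64()
}

/// Time budget if we accept being `factor` times slower than the baseline.
fn budget(baseline: Duration, factor: u32) -> Duration {
    baseline * factor
}
```

With `ours = 1300us` and `theirs = 30us`, `slowdown` gives about 43, and `budget(30us, 10)` gives the 300us target.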

I think it's mostly due to 2 things:

Nul-led commented 3 weeks ago

Do you know how well lightyear compares to them in terms of avg bandwidth usage? Might be interesting to have benchmarks for that too.

Apart from that, 1) sounds like a massive pain to resolve and would probably require huge internal changes as far as I can tell. And for 2), what is your definition of "easier" serialization?

cBournhonesque commented 3 weeks ago

  1. Is actually much easier to do than what I do now. Right now I do some complicated stuff to make sure that EntityActions are split up instead of sending them all as a giant message. I think the first thing I'll try is actually to just send all EntityActions in one giant message where I serialize stuff directly inside.
  2. Something that uses the Read/Write traits, and doesn't use bit-level stuff
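
A minimal sketch of what both points could look like together: all actions written straight into one buffer through `std::io::Write`, byte-aligned, and read back through `std::io::Read`. The `EntityAction` struct, its fields and the length-prefixed layout are hypothetical stand-ins, not lightyear's real types or wire format.

```rust
use std::io::{self, Read, Write};

/// Hypothetical stand-in for one entity's replication actions.
#[derive(Debug, PartialEq)]
struct EntityAction {
    entity: u64,
    component_data: Vec<u8>, // already-serialized component bytes
}

/// Serialize every action directly into one writer (one big message).
/// Assumes each entity carries fewer than 65536 bytes of component data.
fn write_actions<W: Write>(out: &mut W, actions: &[EntityAction]) -> io::Result<()> {
    out.write_all(&(actions.len() as u32).to_le_bytes())?;
    for action in actions {
        out.write_all(&action.entity.to_le_bytes())?;
        out.write_all(&(action.component_data.len() as u16).to_le_bytes())?;
        out.write_all(&action.component_data)?;
    }
    Ok(())
}

/// Read back the message produced by `write_actions`.
fn read_actions<R: Read>(input: &mut R) -> io::Result<Vec<EntityAction>> {
    let mut count_buf = [0u8; 4];
    input.read_exact(&mut count_buf)?;
    let count = u32::from_le_bytes(count_buf);

    let mut actions = Vec::new();
    for _ in 0..count {
        let mut entity_buf = [0u8; 8];
        input.read_exact(&mut entity_buf)?;
        let mut len_buf = [0u8; 2];
        input.read_exact(&mut len_buf)?;
        let mut data = vec![0u8; u16::from_le_bytes(len_buf) as usize];
        input.read_exact(&mut data)?;
        actions.push(EntityAction {
            entity: u64::from_le_bytes(entity_buf),
            component_data: data,
        });
    }
    Ok(actions)
}
```

Writing into a single `Vec<u8>` avoids building an intermediate message per entity; the cost is that every field is rounded up to whole bytes.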

cBournhonesque commented 3 weeks ago

As a simple first step, replicating everything as a single ReplicationGroup brings the time to around 780us, which is a huge improvement. Probably because we don't allocate new space in the hashmaps, and the vec allocations are more efficient.

Also, instead of sending lots of small messages we send one big message, which may be more efficient for channel internals.
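
A rough sketch of why the grouping matters for allocations, assuming pending actions are buffered in a hashmap keyed by group (the `GroupId` and `Entity` aliases and all function names here are hypothetical, not lightyear's internals). Each distinct group costs one map entry plus its own `Vec`, so one group per entity means 1000 entries and 1000 small vectors, while a single group means one entry and one growing vector.

```rust
use std::collections::HashMap;

/// Hypothetical identifiers, for illustration only.
type GroupId = u64;
type Entity = u64;

/// Buffer pending entities by replication group. Each distinct group
/// needs one hashmap entry and one separately allocated Vec.
fn buffer_by_group(pairs: &[(GroupId, Entity)]) -> HashMap<GroupId, Vec<Entity>> {
    let mut pending: HashMap<GroupId, Vec<Entity>> = HashMap::new();
    for &(group, entity) in pairs {
        pending.entry(group).or_default().push(entity);
    }
    pending
}

/// One group per entity (assumed to match the benchmark before the change).
fn one_group_per_entity(n: u64) -> Vec<(GroupId, Entity)> {
    (0..n).map(|e| (e, e)).collect()
}

/// All entities placed in a single group.
fn single_group(n: u64) -> Vec<(GroupId, Entity)> {
    (0..n).map(|e| (0, e)).collect()
}
```

Comparing `buffer_by_group(&one_group_per_entity(1000)).len()` with `buffer_by_group(&single_group(1000)).len()` shows 1000 map entries versus 1, which is one plausible source of the improvement described above.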

The full trace with all log spans is 1.9ms, with:

If I remove the prepare_entity_spawn and prepare_component_insert tracing, I get 1ms with:

So should we get rid of ReplicationGroups?