citizenfx / fivem

The source code for the Cfx.re modification frameworks, such as FiveM, RedM and LibertyM, as well as FXServer.
https://cfx.re/
3.57k stars 2.11k forks source link

Performance issue on getting a state bag value (server side) #1942

Open joelwurtz opened 1 year ago

joelwurtz commented 1 year ago

We are adding metrics on our server to detect and tackle performance on some of our scripts, and we where having a strange performance lag on one part of our code where the code is only doing something like :

setTick(() => {
     const data = GlobalState.data;

     // do some stuff

     GlobalState.data = newData;

     await wait(1000);
});

Where there is a "low" player count (between 1 to 80 player), this code run normally under 5ms, however when there is more player it can go up to more than 100ms (even more than 250ms in one or two tick),

See the attached graph, (first one is the tick duration, second one is our player count)

image

At first we were believing there was some problems when setting the value to the callback since it will lock the data and need to update other players.

So we add a metrics around this call to mesure it

image

As you can see, the set in this call was always under 5 ms so this was not it, after looking more at the code and others events, we where having another tick that have the same problem, however this tick was only getting a value from an Entity state bag (and do nothing else except calculating some data and keeping it in memory)

image

As you can see both tick have (at some details) the exact same time in duration, after adding some more metrics around the GET_STATE_BAG_VALUE native we see that it was this native that was inducing the lag :

image

When looking at the code in fivem, there is a Mutex when getting a state bag see https://github.com/citizenfx/fivem/blob/master/code/components/citizen-resources-core/src/StateBagComponent.cpp#L473

Since this problemes appear on the first time that we actually need a state bag and not after, we strongly believe that there is a lot of write pressure on this lock so it's is slow to release (maybe from another thread ?) when having a large player count.

At one point we were thinking that, since we have more people we may have more entity created (which create a state bag every time if i'm not wrong) but this was not the case as you can see here :

image

There seems to be no correlation between both events

Looking more at the code we also see there is a lot of time this mutex is write locked. Even when not updating the map, like here https://github.com/citizenfx/fivem/blob/master/code/components/citizen-resources-core/src/StateBagComponent.cpp#L484 (since it update only the targets i believe we can safely use another mutex here to relieve some pressure)

But i look at every place this mutex is lock and i don't see why there is this much pressure.

Our server is on 5848 build but after looking at the diff (i know there was some update on state bag some months ago), i don't think this will be resolved if we update.

Also all our graph are done on server side, i do believe there is the same problem in client but since it handles less state bag it should not appear so often.

Do you have more insights on this ?

AvarianKnight commented 1 year ago

It might be a lot more useful to provide ETW traces when these issues are happening. This can provide Cfx with a lot more detail of whats going on between each tick

joelwurtz commented 1 year ago

I have a done a trace but it does not show much information about that, i'm willing to share it but does it gave away the memory also ? (If yes i idon't want to share it publicly then)

I'm wondering also, if i'm not wrong, player are subscribed / unsubcribed from those bags when they entter / leave the culling zone of an entity ? Maybe having a lot of players would then stress a lot the Add / Remove target to the state bag, if that's the case then my pr should definitely help

blattersturm commented 1 year ago

but does it gave away the memory also

no

joelwurtz commented 1 year ago

Here is the profile https://www.swisstransfer.com/d/b38b42c5-15f7-4fb7-9bc7-caba244d19e2

During this profile there was a lag between 10 to 50ms when fetching the state bag

robert4049 commented 1 year ago

Sorry to butt in, but what tool is this? It seems to be Grafana but how are you getting performance data to it?

PichotM commented 1 year ago

Sorry to butt in, but what tool is this? It seems to be Grafana but how are you getting performance data to it?

Probably https://your-server-endpoint:30120/perf

joelwurtz commented 1 year ago

Probably https://your-server-endpoint:30120/perf

We indeed use some of those metrics, but will also wrap every event handler around our own function that measure it's performance, All our code is open source, you can see that here : https://github.com/SOZ-Faut-etre-Sub/SOZ-FiveM-Server (the metrics part is in our "core" module at https://github.com/SOZ-Faut-etre-Sub/SOZ-FiveM-Server/tree/oss/resources/%5Bsoz%5D/soz-core)