SpigotMC / BungeeCord

BungeeCord, the 6th in a generation of server portal suites. Efficiently proxies and maintains connections and transport between multiple Minecraft servers.
https://www.spigotmc.org/go/bungeecord

Change Gson to faster serialization library #3624

Open andreasdc opened 7 months ago

andreasdc commented 7 months ago

Feature description

I was looking at the profiler and saw that serializing and deserializing strings such as scoreboard components takes noticeable processing time; maybe it would be a good idea to change to a faster library? The servers run a scoreboard plugin, and I think a tablist plugin like https://github.com/CodeCrafter47/BungeeTabListPlus would see better performance too.

Goal of the feature

When looking at the results from here, Gson, which BungeeCord currently uses, scored 8677 ns, while for example fury-fastest scored 442 ns and Jackson 1994 ns. That is almost a 20x improvement with Fury, or 4x with Jackson. https://github.com/eishay/jvm-serializers/wiki

Unfitting alternatives

-

Checking

Outfluencer commented 7 months ago

I personally added a component serialization and deserialization cache to my Bungee to speed exactly this up. I think not using Gson is a bad idea, as Gson is the most widely used lib.

andreasdc commented 7 months ago

I personally added a component serialization and deserialization cache to my Bungee to speed exactly this up. I think not using Gson is a bad idea, as Gson is the most widely used lib.

Can I see how you did that? Did you see any improvement? If the behaviour stays exactly the same and only the performance improves, I think it's a good idea to implement.

Outfluencer commented 7 months ago

Behaviour is the same, but I have no benchmarks.

Outfluencer commented 7 months ago

I added a thread-local, size-limited cache that only caches the most used components.

andreasdc commented 7 months ago

I added a thread-local, size-limited cache that only caches the most used components.

With many components that don't repeat, it will probably be slower than the normal path.

Outfluencer commented 7 months ago

In most cases one component is sent to multiple players and gets deserialized multiple times, so in most cases it's much faster.

Outfluencer commented 7 months ago

The only case that, I would not even call it slower, just gets no performance boost, is sending one single component to only one player.

andreasdc commented 7 months ago

Oh yes, with some players I see that it will improve things a bit; even proxy pinging will be improved, along with chat, bossbar and scoreboard. And yes, every tab component like Item and Header/Footer uses Gson too (BungeeTabListPlus would be improved already, because it uses ComponentSerializer.deserialize()). If it won't break anything and could help up to 20x, then I think it's a good idea. In some project I saw that the ProxyPing packet is cached; I think that would be a nice idea too.

Janmm14 commented 7 months ago

Just because a library is widely used doesn't mean that Bungee should stick to it. Bungee would only have to carry the library around as dead weight for a couple of years in case plugins use it (file size is not a big problem), but it could switch to a faster serialization library at any time, especially when a new library appears to be an order of magnitude faster in benchmarks.

Edit: However, it seems like Fury, for example, is not yet in a usable state. It does not support JSON (de-)serialization.

andreasdc commented 7 months ago

Just because a library is widely used doesn't mean that Bungee should stick to it. Bungee would only have to carry the library around as dead weight for a couple of years in case plugins use it (file size is not a big problem), but it could switch to a faster serialization library at any time, especially when a new library appears to be an order of magnitude faster in benchmarks.

Edit: However, it seems like Fury, for example, is not yet in a usable state. It does not support JSON (de-)serialization.

That's not it? [screenshot]

Janmm14 commented 7 months ago

Just because a library is widely used doesn't mean that Bungee should stick to it. Bungee would only have to carry the library around as dead weight for a couple of years in case plugins use it (file size is not a big problem), but it could switch to a faster serialization library at any time, especially when a new library appears to be an order of magnitude faster in benchmarks. Edit: However, it seems like Fury, for example, is not yet in a usable state. It does not support JSON (de-)serialization.

That's not it? [screenshot]

No, that does not create JSON; it serializes to bytes with a custom format, or maybe it's using the JVM's serialization format. I am not sure whether Fury is intended to serialize to JSON at all, given there is no explicit mention of JSON in the documentation or code. Also, the existing documentation is lacking a lot.

andreasdc commented 7 months ago

Hmm, the fastest library with JSON in its name is https://github.com/ngs-doo/dsl-json - which is almost 8x faster than Gson.

Outfluencer commented 7 months ago

The problem with the cache is that a cached component is mutable; in most cases that's not a problem, but if you use components in unusual ways it can be.

Outfluencer commented 7 months ago

That's how I thought we could do it: [screenshot]

Outfluencer commented 7 months ago

In this case the IdentityLinkedHashMap is limited to 128 entries, and the oldest value in the cache (the oldest one being the one used least) is removed each time a new one is added.
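The cache described in the comments above (thread-local, identity-keyed, capped at 128 entries, evicting the least recently used value) can be sketched with stdlib pieces. This is a hypothetical illustration, not Outfluencer's actual code; the class and method names are made up:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a thread-local, size-limited, identity-keyed JSON cache.
public final class ComponentJsonCache {
    private static final int MAX_ENTRIES = 128; // limit taken from the comment above

    // Wrapper giving identity semantics (== and identityHashCode) on a normal map.
    private static final class IdentityKey {
        final Object ref;
        IdentityKey(Object ref) { this.ref = ref; }
        @Override public boolean equals(Object o) {
            return o instanceof IdentityKey && ((IdentityKey) o).ref == ref;
        }
        @Override public int hashCode() { return System.identityHashCode(ref); }
    }

    // Access-ordered LinkedHashMap: the eldest entry is the least recently used one.
    private static final ThreadLocal<Map<IdentityKey, String>> CACHE =
            ThreadLocal.withInitial(() -> new LinkedHashMap<IdentityKey, String>(MAX_ENTRIES, 0.75f, true) {
                @Override protected boolean removeEldestEntry(Map.Entry<IdentityKey, String> eldest) {
                    return size() > MAX_ENTRIES; // evict the least recently used entry on overflow
                }
            });

    /** Returns the cached JSON for this exact component instance, serializing on a miss. */
    public static String toJson(Object component, Function<Object, String> serializer) {
        return CACHE.get().computeIfAbsent(new IdentityKey(component), k -> serializer.apply(k.ref));
    }
}
```

An access-ordered LinkedHashMap makes the eldest entry the least recently used one, so removeEldestEntry implements the eviction rule from the comment, and the thread-local avoids any synchronization cost.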

andreasdc commented 7 months ago

I don't think using a cache, when you don't know what will go into the JSON, is a good idea. Many JSON strings will be unique, containing different things like stats, names, ping, etc. You could technically check whether a string is unique, but that still uses extra resources. IMO using a better library will increase JSON performance by up to 8x and should bring a great result. 🚀 💘

Outfluencer commented 7 months ago

I don't think using a cache, when you don't know what will go into the JSON, is a good idea. Many JSON strings will be unique, containing different things like stats, names, ping, etc. You could technically check whether a string is unique, but that still uses extra resources. IMO using a better library will increase JSON performance by up to 8x and should bring a great result. 🚀 💘

That's why I am checking for identity instead, so we don't lose any performance.

Janmm14 commented 7 months ago

I don't think using a cache, when you don't know what will go into the JSON, is a good idea. Many JSON strings will be unique, containing different things like stats, names, ping, etc. You could technically check whether a string is unique, but that still uses extra resources. IMO using a better library will increase JSON performance by up to 8x and should bring a great result. 🚀 💘

That's why I am checking for identity instead, so we don't lose any performance.

This is no suitable general solution, as BaseComponents are not final, and a component itself or its extras could change at any point in time without its identity changing.

With an identity-based hash map, I do not see any actual cache hits in vanilla BungeeCord operation.

Outfluencer commented 7 months ago

I don't think using a cache, when you don't know what will go into the JSON, is a good idea. Many JSON strings will be unique, containing different things like stats, names, ping, etc. You could technically check whether a string is unique, but that still uses extra resources. IMO using a better library will increase JSON performance by up to 8x and should bring a great result. 🚀 💘

That's why I am checking for identity instead, so we don't lose any performance.

This is no suitable general solution, as BaseComponents are not final, and a component itself or its extras could change at any point in time without its identity changing.

With an identity-based hash map, I do not see any actual cache hits in vanilla BungeeCord operation.

You're 100% right; it's not a solution I would recommend or make a PR for, but maybe in this case it could help him.

In my opinion we should not change the lib.

andreasdc commented 7 months ago

I don't think using a cache, when you don't know what will go into the JSON, is a good idea. Many JSON strings will be unique, containing different things like stats, names, ping, etc. You could technically check whether a string is unique, but that still uses extra resources. IMO using a better library will increase JSON performance by up to 8x and should bring a great result. 🚀 💘

That's why I am checking for identity instead, so we don't lose any performance.

This is no suitable general solution, as BaseComponents are not final, and a component itself or its extras could change at any point in time without its identity changing. With an identity-based hash map, I do not see any actual cache hits in vanilla BungeeCord operation.

You're 100% right; it's not a solution I would recommend or make a PR for, but maybe in this case it could help him.

In my opinion we should not change the lib.

What's the reason you prefer Gson?

Outfluencer commented 7 months ago

It's the most used JSON lib, I guess; I also personally used it in all of my projects. I don't think we should recode all JSON-related code just to get 0.01% better CPU times. Also, components are not sent by the client, so it's not exploitable.

Janmm14 commented 7 months ago

I took a look at some of the Java JSON libraries ranking high here: https://github.com/fabienrenaud/java-json-benchmark I looked at fastjson, dsljson and avajejsonb.

Fastjson is missing documentation in English. The other two are also somewhat missing documentation for the more customizable parts.

I didn't look further into fastjson, but dsljson and avajejsonb are missing the flexibility of redirecting the target deserialization object as we do in the ComponentSerializer class. These fast libraries use annotations to generate code at compile time, where a JSON parser parses the string directly into the target object. Customized serialization is possible, but to get back the flexibility we'd basically have to create our own parser to a JSON object and then write code to interpret it into our Component classes. Gson provides this flexibility already.

I did not look at more libraries; the remaining performance improvement is no longer high enough that I'd consider it worthwhile. I also think that with Bungee's heavy use of custom serializers we are likely near the top of Gson's performance, as there is less reflection involved. (I did not look at the Gson usage in the benchmark, but if it doesn't use registered serializers and relies on reflection instead, there's a chance we are even faster than the benchmark measured.)

andreasdc commented 7 months ago

I think it would help massively. I summed up all Gson usage and it was 4.12%; it would help with scoreboard, bossbar, chat, server pinging, BungeeTabListPlus and more. That was measured during a big attack, which had no impact on Gson. Without the attack, Gson operations look like they take 8% of CPU time.

Outfluencer commented 7 months ago

And what do you expect from the change? 1% less CPU time?

andreasdc commented 7 months ago

And what do you expect from the change? 1% less CPU time?

Well, initially I saw an 8x improvement, but I don't know which lib it is possible to use.

bob7l commented 7 months ago

Bungee's serialization is complicated and deep. Changing the JSON library probably wouldn't net a significant performance improvement. Deeper profiling should be done to see exactly what's taking the most time. If I get time I'll drop some timings from YourKit here.

andreasdc commented 7 months ago

I did my timings using Spark, and I described the usage above.

bob7l commented 7 months ago

I did my timings using Spark, and I described the usage above.

Yeah, but profiling with something like YourKit will show very detailed timings.

Anyway, here are some profiled results. This is from 130,584 total milliseconds. I just looked into deserialization since it's likely the bigger resource hog.

Part1: https://i.gyazo.com/9d56a65540433521f7e34b7b33943137.png Part2: https://i.gyazo.com/e0fa9bf1dc0141feb981bca81a1e7e2c.png

Flamegraph: https://i.gyazo.com/d5c470b668985eccb58b325dc67e8ce7.png

This is just a test server with a bunch of fake (1.8.9) players and a single real 1.20.4 player, so the data isn't great. It would be interesting to see it with live 1.20.4+ players.

A large amount of the time is just Bungee's deserialization, which is to be expected; it's pretty intensive. A good amount also goes to Gson's internal use of LinkedTreeMap. I imagine most if not all of these libraries use a linked hash map or a similar implementation with string keys.

Here's one part of the flamegraph I found interesting: https://i.gyazo.com/3e392e51e317ce3886c55f5dfda8ac0e.png

The toUpperCase call is taking a pretty large amount of time: https://github.com/SpigotMC/BungeeCord/blob/1b88a8471077929bcfbd3a5bd6c7cfdf93df92de/chat/src/main/java/net/md_5/bungee/api/ChatColor.java#L267C49-L267C60

Which would likely be a pretty simple optimization.

A second minor optimization in that same function would be to flip ( string.startsWith( "#" ) && string.length() == 7 ) so it checks the length first.
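The suggested reordering can be sketched with a simplified stand-in for the hex-color check (looksLikeHexColor is a hypothetical name, not the actual ChatColor.of code):

```java
// Simplified stand-in for the hex-color check discussed above (hypothetical name).
public final class HexCheck {
    static boolean looksLikeHexColor(String string) {
        // length() is a cheap constant-time read; checking it first means
        // the character comparison in startsWith() only runs on 7-char input
        return string.length() == 7 && string.startsWith("#");
    }
}
```

Since && short-circuits, putting the cheaper test first rejects most non-matching strings without touching their characters.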

md-5 commented 7 months ago

Which would likely be a pretty simple optimization.

What's the optimisation?

bob7l commented 7 months ago

Which would likely be a pretty simple optimization.

What's the optimisation?

I meant that the toUpperCase could be easily optimized, not the whole thing. The whole thing is already well written.

For the uppercase, you could add the different uppercase/lowercase variants to the map rather than changing the string case. It would be a couple thousand entries in total, though.

It's only 8.33% of the ChatColor.of function though, so mostly a micro-optimization...

Outfluencer commented 7 months ago

That sounds like a terrible idea.

bob7l commented 7 months ago

That sounds like a terrible idea.

It's not really worth it, but it's not a terrible idea. It's a very common practice to achieve O(1), and realistically a tiny memory footprint.

Outfluencer commented 7 months ago

No. Adding all the combinations just for "strikethrough" would be 8192 string entries for one single string; multiply that by the number of possible other colors/styles and you'll notice it's not a small memory footprint.

andreasdc commented 7 months ago

I did my timings using Spark, and I described the usage above.

Yeah, but profiling with something like YourKit will show very detailed timings.

Anyway, here are some profiled results. This is from 130,584 total milliseconds. I just looked into deserialization since it's likely the bigger resource hog.

Part1: https://i.gyazo.com/9d56a65540433521f7e34b7b33943137.png Part2: https://i.gyazo.com/e0fa9bf1dc0141feb981bca81a1e7e2c.png

Flamegraph: https://i.gyazo.com/d5c470b668985eccb58b325dc67e8ce7.png

This is just a test server with a bunch of fake (1.8.9) players and a single real 1.20.4 player, so the data isn't great. It would be interesting to see it with live 1.20.4+ players.

A large amount of the time is just Bungee's deserialization, which is to be expected; it's pretty intensive. A good amount also goes to Gson's internal use of LinkedTreeMap. I imagine most if not all of these libraries use a linked hash map or a similar implementation with string keys.

Here's one part of the flamegraph I found interesting: https://i.gyazo.com/3e392e51e317ce3886c55f5dfda8ac0e.png

The toUpperCase call is taking a pretty large amount of time: https://github.com/SpigotMC/BungeeCord/blob/1b88a8471077929bcfbd3a5bd6c7cfdf93df92de/chat/src/main/java/net/md_5/bungee/api/ChatColor.java#L267C49-L267C60

Which would likely be a pretty simple optimization.

A second minor optimization in that same function would be to flip ( string.startsWith( "#" ) && string.length() == 7 ) so it checks the length first.

How did you benchmark it? I think serialization is mostly done on higher versions, and you see it heavily used with tablist and scoreboard.

bob7l commented 7 months ago

No. Adding all the combinations just for "strikethrough" would be 8192 string entries for one single string; multiply that by the number of possible other colors/styles and you'll notice it's not a small memory footprint.

There would be a total of 20,456 different strings. That might be 1-2 MB in total, if even that.

You could also key by the hash codes, assuming there are no collisions between those 20,456 strings, but that could lead to unintended behavior if there's a collision with some random unknown string given as a color.

Again though, this is based on being asked about an optimization and responding. I wouldn't bother with it, as optimizing toUpperCase would just be a tiny chip, in other words a micro-optimization. Not to mention that hex colors, which have quickly been adopted, are the new norm.
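An alternative to storing thousands of case variants in the map would be a map that compares keys case-insensitively, avoiding both the extra entries and the toUpperCase allocation. A sketch with hypothetical names; the entries are illustrative ('m' and 'l' are the vanilla strikethrough/bold format codes), and the trade-off is that TreeMap lookups are O(log n) rather than a hash map's O(1):

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical case-insensitive name lookup: one entry per name, no case variants.
public final class CaseInsensitiveLookup {
    // CASE_INSENSITIVE_ORDER compares without allocating a re-cased copy of the query.
    static final Map<String, Character> BY_NAME = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
    static {
        BY_NAME.put("STRIKETHROUGH", 'm'); // vanilla format codes, shown for illustration
        BY_NAME.put("BOLD", 'l');
    }

    static Character lookup(String name) {
        return BY_NAME.get(name); // O(log n) comparisons instead of O(1) hashing
    }
}
```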

Outfluencer commented 7 months ago

If I look into the Spark profiler, I can't see any extensive Gson calls.

andreasdc commented 7 months ago

If I look into the Spark profiler, I can't see any extensive Gson calls.

They need to actually be used.

Outfluencer commented 7 months ago

It's used in the networking, isn't it?

bob7l commented 7 months ago

How did you benchmark it? I think serialization is mostly done on higher versions, and you see it heavily used with tablist and scoreboard.

Everything is deserialized into a BaseComponent now for simplicity.

On that note, it might be possible to use lazy loading in certain areas (I haven't explored it at all, just a thought): if a component is never loaded, forward the original payload, etc.

Outfluencer commented 7 months ago

Lazy loading would be implementable for all packets that directly try to parse the payload as a BaseComponent.
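The idea, keep the raw payload and only run the deserialization if a handler actually asks for the component, can be sketched as a minimal generic wrapper. This is a hypothetical shape, not the code from any of the branches discussed below:

```java
import java.util.function.Function;

// Hypothetical lazy wrapper: keeps the raw payload and parses at most once, on demand.
public final class Lazy<R, T> {
    private final R raw;
    private final Function<R, T> parser;
    private T parsed; // stays null until get() is first called

    public Lazy(R raw, Function<R, T> parser) {
        this.raw = raw;
        this.parser = parser;
    }

    /** Raw payload; a packet that is only forwarded never triggers parsing. */
    public R raw() { return raw; }

    /** Parses on first call and reuses the result afterwards. */
    public T get() {
        if (parsed == null) {
            parsed = parser.apply(raw);
        }
        return parsed;
    }

    public boolean isParsed() { return parsed != null; }
}
```

A packet field of this type can be written back out from raw() untouched, so proxied traffic that no plugin inspects skips both the deserialize and the re-serialize step.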

Janmm14 commented 7 months ago

It's a little contradictory that we have the BY_NAME map using upper case while we serialize to lower case. We should examine whether other chat libraries use uppercase or lowercase after serialization, and then switch our serialization or the BY_NAME map; my guess is that it might be faster to lowercase something that is already lowercase. In theory we could also try length and charAt checks with a final equals check against single values instead of a map. We do not expect any more named colors, and a new boolean style option is also not expected. (I checked the maps ChatColor uses and there are no hash collisions, by the way.)

What I noticed more in the benchmark is the huge JsonElement.has usage inside ComponentStyleSerializer, which actually takes most of the deserialization time. This can be slightly optimized by using get and a null check to speed up "hits". For all chat JSONs the total number of modifiers should not be high, so I think iterating over the entry set of the JSON object with a switch inside should be faster, even though Gson uses a LinkedTreeMap.

I have pushed these little optimizations from the second paragraph here: https://github.com/Janmm14/BungeeCord/tree/chat-optimization @bob7l can you compare the performance, maybe?

Lazily deserializing chat might be a valuable optimization as well.
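Both suggested changes follow the same shape. Here is a sketch using a plain Map as a stand-in for Gson's JsonObject so it stays self-contained; the real code would call JsonObject.get and iterate JsonObject.entrySet:

```java
import java.util.Map;

// Sketch of the two parsing styles; a plain Map stands in for Gson's JsonObject.
public final class StyleParsing {
    static final class Style {
        Boolean bold, italic;
    }

    // One get() plus a null check per modifier, instead of has() followed by
    // get(), which performs the map lookup twice for every present key.
    static Style parseWithGetAndNullCheck(Map<String, Object> json) {
        Style style = new Style();
        Object bold = json.get("bold");
        if (bold != null) style.bold = (Boolean) bold;
        Object italic = json.get("italic");
        if (italic != null) style.italic = (Boolean) italic;
        return style;
    }

    // Single pass over the entry set with a switch on the key: each present
    // modifier is touched exactly once, and absent modifiers cost nothing.
    static Style parseByIteration(Map<String, Object> json) {
        Style style = new Style();
        for (Map.Entry<String, Object> entry : json.entrySet()) {
            switch (entry.getKey()) {
                case "bold": style.bold = (Boolean) entry.getValue(); break;
                case "italic": style.italic = (Boolean) entry.getValue(); break;
                default: break; // unknown keys ignored in this sketch
            }
        }
        return style;
    }
}
```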

andreasdc commented 7 months ago

I don't see JsonElement.has in my profilers. [screenshot]

Janmm14 commented 7 months ago

I don't see JsonElement.has in my profilers.

You are here in the serialization part (objects -> JSON objects -> JSON string), while bob7l and I found things in deserialization (JSON string -> JSON objects -> objects).

bob7l commented 7 months ago

I don't see JsonElement.has in my profilers.

Serialization/deserialization differences aside, you're using Spark's sampler. While it's good for getting a ballpark idea of what's taking up time, it's not very accurate and usually won't show quick functions like map.get and so on.

Whereas I'm using instrumentation/tracing, which offers a more precise measurement and is able to pick up finer details. It's also a lot more useful in cases like these, with massive trees from having to iterate over many child components.

You should be able to grab a free trial of YourKit. Incredible profiler.

andreasdc commented 7 months ago

ComponentSerializer.toString uses gson.toJson; there's nothing about JsonElement.has.

Janmm14 commented 7 months ago

ComponentSerializer.toString uses gson.toJson; there's nothing about JsonElement.has.

Yes. That is the "serializing" part: Java objects to JSON string.

Gson is also involved in the "deserializing" part: JSON string to Java objects. This is done with the ComponentSerializer.deserialize method (previously .parse).

Different method, still used a lot in Bungee.

If we move to handling "chat" data from packets only on demand, that could save us both steps in a lot of cases, leaving only Bungee plugins that send chat affected on the Bungee side (as the chat module is also used in Spigot, serialization speed remains somewhat of a concern).

Janmm14 commented 7 months ago

In the branch chat-optimization-v2 I have created an implementation of lazy deserialization in packets, with methods that hopefully retain full bytecode compatibility (Edit: for the individual packets, not for DefinedPacket methods).

Edit 2: It's untested, by the way. Maybe someone has a better way of implementing it? It should be possible to replace the long generics of Deserializable with static types for shorter code, but I'm unsure whether that's desirable over flexibility for similar attempts at other complex deserialization.

andreasdc commented 7 months ago

In the branch chat-optimization-v2 I have created an implementation of lazy deserialization in packets, with methods that hopefully retain full bytecode compatibility (Edit: for the individual packets, not for DefinedPacket methods).

Edit 2: It's untested, by the way. Maybe someone has a better way of implementing it? It should be possible to replace the long generics of Deserializable with static types for shorter code, but I'm unsure whether that's desirable over flexibility for similar attempts at other complex deserialization.

One of the patches, either https://github.com/SpigotMC/BungeeCord/pull/3629/commits/dd338c5fd546f31c7b459871cf73c425b43adc55 or https://github.com/Janmm14/BungeeCord/commit/5b12adb97f0e9db28c123a4d3ef88fd0909ed730, I think broke the bold option for the tablist's entries and header/footer.

andreasdc commented 7 months ago

And it's the same for strikethrough too.

Janmm14 commented 7 months ago

Weird. I will test later, investigate, and find the probably dumb mistake I made.