cheahjs / palworld-save-tools

Tools for converting Palworld .sav files to JSON and back
MIT License
790 stars, 71 forks

Investigation: Use more performant JSON libraries #155

Open cheahjs opened 7 months ago

cheahjs commented 7 months ago

The Python json stdlib is not known for its speed, and JSON dumping accounts for a significant fraction of the time taken to convert a save file to JSON.

Investigate if it is feasible to use some of the higher performance libraries such as orjson or ujson, and how much of an improvement it is.
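A rough way to measure "how much of an improvement" could look like the sketch below. It is only a harness under assumptions: the sample data is synthetic (not the real save-file layout), `make_sample`, `candidate_dumpers` and `benchmark` are made-up helper names, and orjson/ujson are timed only if they happen to be installed.

```python
import json
import timeit


def make_sample(n=2000):
    # Synthetic nested data; a stand-in, NOT the real save-file structure.
    return {
        "items": [
            {"id": i, "name": f"pal-{i}", "stats": {"hp": i * 1.5, "tags": ["a", "b"]}}
            for i in range(n)
        ]
    }


def candidate_dumpers():
    # Always include the stdlib; add third-party libraries only if importable.
    dumpers = {"json": json.dumps}
    try:
        import orjson
        dumpers["orjson"] = orjson.dumps  # note: returns bytes, not str
    except ImportError:
        pass
    try:
        import ujson
        dumpers["ujson"] = ujson.dumps
    except ImportError:
        pass
    return dumpers


def benchmark(obj, repeat=5):
    # Best-of-N wall-clock time for one full dump per library.
    results = {}
    for name, dump in candidate_dumpers().items():
        results[name] = min(timeit.repeat(lambda: dump(obj), number=1, repeat=repeat))
    return results


if __name__ == "__main__":
    timings = benchmark(make_sample())
    for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
        print(f"{name:8s} {seconds * 1000:.2f} ms")
```

Numbers from synthetic data will not match a real save file, so a meaningful comparison would need the dictionary produced from an actual .sav.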

It is not a general solution: most alternative JSON libraries require strict UTF-8 compliance, which is incompatible with Unreal's treatment of UTF-16 as arbitrary 16-bit chars. Currently surrogatepass is used to preserve invalid characters as surrogate code points, but this is not possible in a UTF-8 only environment.
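The incompatibility can be reproduced with the stdlib alone. This is a minimal sketch using hand-made bytes (an assumption, not data from a real save): a lone low surrogate between two ordinary characters, decoded with the `surrogatepass` error handler mentioned above.

```python
import json

# "A", then a lone low surrogate (U+DC80), then "B", as UTF-16 little-endian.
# Hand-made bytes for illustration; not taken from a real save file.
raw_utf16 = b"A\x00\x80\xdcB\x00"

# A strict decode would raise UnicodeDecodeError; surrogatepass keeps the unit.
text = raw_utf16.decode("utf-16-le", errors="surrogatepass")

# The stdlib encoder (ensure_ascii=True by default) escapes it as \udc80 ...
encoded = json.dumps({"name": text})

# ... and the stdlib decoder accepts the escape again, so the value round-trips.
round_tripped = json.loads(encoded)["name"]


def strict_utf8_ok(s):
    # True if the string can be encoded as strict UTF-8.
    try:
        s.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    print(encoded)
    print("round-trips:", round_tripped == text)
    print("strict UTF-8 encodable:", strict_utf8_ok(text))
```

The string survives the stdlib round trip but cannot be encoded as strict UTF-8, which is the property a UTF-8 only library would trip over.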

AntiMoron commented 7 months ago

There are still some things we can do to optimize the JSON output. From what I have observed, the following would compress it further:

  1. Even with '--minify-json' on, there are still spaces in the output. Remove them.

  2. The JSON follows these patterns:

    • Values are wrapped in 'values', 'value', 'RawData', 'object'. Remove those meaningless wrappers.
    • Many UUIDs are '00000000-0000-0000-0000-000000000000'. Change those to null or, better, don't output the field at all.
    • Don't output any fields whose value is null.
    • Field name compression: provide a key map and replace the original keys, e.g. type -> t, value -> v, values -> vs.
    • Stream the output. It is not currently written as a stream, which surely uses a lot of RAM. Since we are generating JSON, we can do that by concatenating strings or by using some library (I'm not good at Python, so I don't know of any).
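Some of the suggestions above (no whitespace, dropping null and all-zero UUID fields, shortening keys) can be sketched as a post-processing pass over the dictionary. This is illustrative only and is not how the tool works: `compact`, `dumps_compact` and the contents of `KEY_MAP` are hypothetical, the wrapper-removal idea is left out because it depends on the real structure, and the transform is lossy (a reader can no longer tell "missing" from "null").

```python
import json

NIL_UUID = "00000000-0000-0000-0000-000000000000"

# Hypothetical key map, taken from the example in the comment above.
KEY_MAP = {"type": "t", "value": "v", "values": "vs"}


def compact(node):
    """Return a copy of `node` without null / nil-UUID fields and with short keys."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            value = compact(value)
            if value is None or value == NIL_UUID:
                continue  # drop the field entirely
            out[KEY_MAP.get(key, key)] = value
        return out
    if isinstance(node, list):
        return [compact(item) for item in node]
    return node


def dumps_compact(obj):
    # separators=(",", ":") removes the spaces json.dumps emits by default.
    return json.dumps(compact(obj), separators=(",", ":"))


if __name__ == "__main__":
    sample = {"type": "Struct", "value": {"id": NIL_UUID, "name": "x", "extra": None}}
    print(dumps_compact(sample))
```

Any consumer of the output would need the same key map to read it back, so this changes the format for downstream users.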

After these changes it should be much faster, and the output file should be smaller, on the order of hundreds of MB.
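For the streaming suggestion, the stdlib already offers `json.JSONEncoder.iterencode`, which yields the output in small pieces. The sketch below buffers those pieces and writes them in chunks; `write_json_streaming` and `chunk_chars` are made-up names. It only avoids building one giant output string: the input dictionary still has to be fully in memory.

```python
import io
import json


def write_json_streaming(obj, fp, chunk_chars=1 << 16):
    """Write `obj` as compact JSON to `fp` without building the full string."""
    encoder = json.JSONEncoder(separators=(",", ":"))
    buffer, size = [], 0
    for piece in encoder.iterencode(obj):
        buffer.append(piece)
        size += len(piece)
        if size >= chunk_chars:
            # Flush a batch of pieces to keep the number of write calls low.
            fp.write("".join(buffer))
            buffer, size = [], 0
    if buffer:
        fp.write("".join(buffer))


if __name__ == "__main__":
    out = io.StringIO()
    write_json_streaming({"a": [1, 2, {"b": None}]}, out, chunk_chars=4)
    print(out.getvalue())
```

As far as I know, CPython's incremental encoder runs in pure Python, so this trades speed for lower peak memory and would not help with the CPU cost discussed in this issue.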

AntiMoron commented 7 months ago

Also, thanks a lot for the good work on analysing and parsing. Resource usage and parsing speed currently matter: on dedicated servers, I have to set --cpus=0.5 in Docker to make sure the server doesn't go down while processing the save file. So please consider doing this in a faster language like C++ or Rust.

cheahjs commented 7 months ago

I will point you to a previous comment I made about building this in a different language: https://github.com/cheahjs/palworld-save-tools/issues/83#issuecomment-1915837671

I will not make any changes to the JSON output at this time, as a long enough time has passed that there will be significant downstream impact on existing users.

If you need faster performance:

  1. Don't output to JSON and operate on the Python dictionary directly
  2. Use https://github.com/magicbear/palworld-server-toolkit which has implemented various optimisations on top of this library
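As an illustration of the first option, working on the dictionary directly can be as simple as a recursive walk. How the dictionary is obtained from a .sav file is not shown in this thread, so the sketch starts from an already-loaded object; `iter_key` and the `NickName` key in the usage example are hypothetical.

```python
def iter_key(node, wanted):
    """Yield every value stored under the key `wanted` in nested dicts/lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == wanted:
                yield value
            # Keep descending: the same key may also appear deeper down.
            yield from iter_key(value, wanted)
    elif isinstance(node, list):
        for item in node:
            yield from iter_key(item, wanted)


if __name__ == "__main__":
    # Made-up data shaped like a nested save dictionary.
    data = {"a": {"NickName": {"value": "Bob"}}, "b": [{"NickName": {"value": "Al"}}]}
    for hit in iter_key(data, "NickName"):
        print(hit["value"])
```

This skips JSON serialization entirely, so the cost discussed in this issue does not apply.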