godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine
https://godotengine.org
MIT License
91.3k stars 21.24k forks source link

Huge performance regression using Tr() when moving project from 3.5.3 to 3.6 (.net) #98228

Open deventide opened 1 month ago

deventide commented 1 month ago

Tested versions

System information

Windows 11 - Godot 3.5.3 .net - GLES 3 - RTX4080 - 14900k

Issue description

For my existing project I noticed a huge performance regression when switching from 3.5.3 .net to 3.6 .net

When updating multiple tooltips (~20-30) at once that have multiple text fragments (~20) that are all translated using the Tr() translation server I notice huge frame drops from 60 fps down to 15 fps. We have 6 translation files that together have ~6000 lines in 13 languages. The exact same project ran at stable 60 frames in 3.5.3.

Steps to reproduce

Run the MRP in 3.5.3 .net and 3.6 .net. Check cpu load and fps. Update multiple labels with multiple longer translation tags utilising 6 translation files with ~6000 lines in total with 13 languages.

Minimal reproduction project (MRP)

Run this project in 3.5.3 .net and in 3.6 .net and check cpu load and fps. Godot3.6PerformanceIssue.zip

timothyqiu commented 1 month ago

Changed the MRP to GDScript version for easier testing: performance.zip

On my machine, it's shows about 370 fps on 3.5.3.stable and about 26 fps on 3.6.stable.

deventide commented 1 month ago

Changed the MRP to GDScript version for easier testing: performance.zip

On my machine, it's shows about 370 fps on 3.5.3.stable and about 26 fps on 3.6.stable.

Thanks, good to know this is reproducible on different hardware as well.

It seems to be related on how the lookup for longer translation tags and strings are handled. The performance regression became very noticeable in the MRP when adding some of the longer translation tags of our project to the translation files.

timothyqiu commented 1 month ago

I've also found before that a lot of repetitive string operations in TranslationServer are cacheable. I'm trying to build a modified version, and it's about six times faster for dev builds, which should help with 4.x as well.

lawnjelly commented 1 month ago

Should be fixed in 3.x with #98234 , will change milestone to 4.x.

deventide commented 1 month ago

Will https://github.com/godotengine/godot/pull/98234 fix the entire performance regression introduced by compare_locales()? If I understood it correctly 3.7 will still half the fps for the MRP compared to 3.5.3.

lawnjelly commented 1 month ago

Will https://github.com/godotengine/godot/pull/98234 fix the entire performance regression introduced by compare_locales()? If I understood it correctly 3.7 will still half the fps for the MRP compared to 3.5.3.

Should make it a lot faster according to @timothyqiu 's figures.

However from the look of the MRP, ideally you shouldn't actually be calling tr() continuously either in user code.

It would be better to call it once, when text changes, and cache the result in user code. The tr() system can't do it as efficiently as if you cache yourself per label (as it will essentially always have to lookup from hash table or similar).

timothyqiu commented 1 month ago

Will #98234 fix the entire performance regression introduced by compare_locales()?

Mostly positive if we limit the scope to compare_locales().

TranslationServer in 3.6 introduced more features/complexity, so it is destined to be slower than it was in 3.5.x.

This is a matter of degree of optimization. Practically speaking, eliminating the impact of compare_locales() brings the speed up to an acceptable level. For example, the original MRP's target FPS was 60, which is now achieved.

If there are other usages that still result in an unacceptably low FPS, we can try to optimize it again.

deventide commented 1 month ago

However from the look of the MRP, ideally you shouldn't actually be calling tr() continuously either in user code.

True, the continuous call was added to the MRP for easier performance testing and is not a real use case scenario. For my project it is called once and updates all labels, which has lead to the described fps drops. Good hint that caching results is the way to go for labels that remain unchanged after the update.

TranslationServer in 3.6 introduced more features/complexity, so it is destined to be slower than it was in 3.5.x.

Thanks, I think our project will then be fine if we as well implement some caching to compensate for the slower TranslationServer in 3.6 upwards.