Scirra / Construct-bugs

Public bug report submissions for Construct 3 and Construct Animate. Please read the guidelines then click the 'Issues' tab to get started.
https://www.construct.net

Possible post r389 and r409 performance regression #8269

Open MikalDev opened 1 week ago

MikalDev commented 1 week ago

Problem description

Post r389, there seems to be a performance regression, as seen in #8259.

A change was made in r409 to reduce the regression, but there still seems to be a significant performance delta between r389 and r409.

Attach a .c3p

PerfTest5.zip

This is the same test used in #8259

Steps to reproduce

Run test on r389 and r409, note CPU and FPS.

Observed result

Testing on r389 and r409 and noting CPU and FPS, our testing in the community shows what looks to be more than a 10% difference.

Expected result

Similar performance.

More details

From some Chrome DevTools performance traces, it seems that GetX and GetY have the biggest perf difference between r389 and r409. As an experiment, I exported r409 to HTML and edited the commonACEs.js file to revert only GetX() and GetY() to their prior definitions in r389. This was only an experiment to see if a possible cause could be found.

Testing again, the modified r409 version did get a nice performance boost. Here is a screen grab from the community showing the three together:

[image: r389]

I have 3 versions available on the web: the first is vanilla r409, the second is r409 with GetX() and GetY() modified, and the third also has the remaining functions similar to GetX and GetY modified. The last one has the best performance.

https://kindeyegames.com/perfOrig/
https://kindeyegames.com/perfMod/ (GetX, GetY changed)
https://kindeyegames.com/perfModMore/ (remaining similar functions changed)

Example of the changes done:

return GetWorldInfo(this).GetX()

(r409) back to

return this.GetWorldInfo().GetX()

(r389)
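
The two call shapes being compared can be sketched in isolation as below. This is only an illustration: the `WorldInfo` and `Instance` classes, the WeakMap-backed free function, and the `timeCalls` helper are all assumptions made up for the example. The report does not show how `GetWorldInfo` is actually implemented in either release, so this should not be read as Construct's runtime code.

```javascript
// Hypothetical stand-ins for illustration; not Construct's real classes.
class WorldInfo {
  constructor(x, y) { this._x = x; this._y = y; }
  GetX() { return this._x; }
  GetY() { return this._y; }
}

// Assumed implementation of a free-function lookup: per-instance state
// kept in a WeakMap keyed by the instance.
const worldInfoMap = new WeakMap();
function GetWorldInfo(inst) {
  return worldInfoMap.get(inst);
}

class Instance {
  constructor(x, y) {
    this._worldInfo = new WorldInfo(x, y);
    worldInfoMap.set(this, this._worldInfo);
  }
  // Method-call shape, as in the r389 snippet.
  GetWorldInfo() { return this._worldInfo; }
  GetXViaMethod() { return this.GetWorldInfo().GetX(); }
  // Free-function shape, as in the r409 snippet.
  GetXViaFunction() { return GetWorldInfo(this).GetX(); }
}

// Call fn `iterations` times; return elapsed milliseconds plus the summed
// results (the sum keeps the calls from being optimized away).
function timeCalls(fn, iterations) {
  let sink = 0;
  const start = performance.now();
  for (let i = 0; i < iterations; i++) sink += fn();
  return { elapsed: performance.now() - start, sink };
}

const inst = new Instance(12, 34);
const viaMethod = timeCalls(() => inst.GetXViaMethod(), 1e6);
const viaFunction = timeCalls(() => inst.GetXViaFunction(), 1e6);
console.log(`method: ${viaMethod.elapsed.toFixed(2)} ms, function: ${viaFunction.elapsed.toFixed(2)} ms`);
```

Micro-benchmarks like this are noisy and very sensitive to JIT behavior, so the timings are only indicative. A DevTools trace of the real project remains the better evidence.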

Affected browsers/platforms:

First affected release: r390 (r409 contains a partial fix)

So, this is mainly an observation. It would obviously be nice to not have performance regressions, but C3 is being updated for SDK V2 and other features, so I understand it is a question of priorities. If the changes above appear to be the cause, perhaps there is another way to get the same 'feature' result without as much perf impact?

System details

JeFawk commented 1 week ago

Unsure if this is related, but I'm working on 2 projects, one on the LTS version, another one on the latest stable.

The latest stable's editor has weird performance problems: sometimes copy-pasting lags, like a lot: it creates the block smaller, and waits for a few seconds, then pastes the content in it. Other times when pasting, it pastes double.

I've noticed random lag in other areas too, but off the top of my head I cannot remember where.

AshleyScirra commented 1 week ago

In general, it is not feasible to continue developing software with no performance regressions ever. In the past we've had cases where adding a single JavaScript object property increases the memory bandwidth in some benchmark, causing a measurable reduction of a few percent in the benchmark score. Or perhaps somewhere we have to insert an if statement that introduces branching into some intensive benchmark and so slightly reduces the speed, which is what I think happened here. But it's impractical to continue developing Construct if we decide it's not acceptable to add new object properties or if statements whenever they produce any kind of measurable performance regression on any kind of benchmark. When you dig into the details of how CPUs work, such changes often have literally zero impact on real-world performance: in many cases, outside of a benchmark, such a change means the CPU goes from being idle waiting for a memory read to performing branching while waiting for the same memory read, taking exactly the same time overall. So the real question is: does it affect real-world projects? And it's better not to obsess about artificial benchmarks.

I will investigate this anyway in case there is anything feasible that can be done, as I would agree ideally Construct updates should not regress performance where it is possible to avoid it. But really it's a sign of how insanely fast JavaScript and the Construct engine already are that tiny changes can produce measurable differences, and I do think this kind of difference is extremely unlikely to actually affect any real-world projects.

dop2000 commented 1 week ago

So the real question is: does it affect real-world projects?

This is our game running on my desktop: [image]

The 2-3 fps deficit is pretty consistent. Obviously at 90 fps it's not a problem, but many people play on very low-end devices like Chromebooks or Steam Deck with FPS ~30-40. And for such players every frame counts. I was hoping that the fix for #8320 would improve the performance in the latest releases, but it's actually slightly worse than in pre-390 versions.

F3der1co commented 1 week ago

I guess Moonstone Island has a lot of systems running in the background. Also, idk how powerful the desktop is in this case. And in that example the difference might be larger in reality, as there were optimizations for hierarchy and tween after 389.

Still, regressions like these should definitely be avoided, especially if there is no major tangible upside that requires that change in the codebase.

dop2000 commented 1 week ago

@Jase0000 Normal values on my PC are 144 fps 20-30% cpu. But some players are going crazy with farming and decorating, building huge farms and houses, placing thousands of objects. The screenshot above is one of such saves, basically a stress test scenario. And yes, I know that Steam Deck can handle AAA games, but for some reason our C3 game runs not so great on it. For the most part the performance is good, 50-60 fps. But there are micro-lags and the framerate goes down when there are too many objects/particles on the screen. I blame NWjs.

F3der1co commented 1 week ago

I mean also AAA games tend to use precompiled lower level languages for the performance critical systems, with engineers focusing on optimizing to extremes like SIMD and using much more multi threading. Compared to js with a fairly rigid event system built on top, mostly running on a single thread.

dop2000 commented 1 week ago

@Jase0000 I don't want to discuss the inner workings of our project. It has 11K events, we've been developing it for over 3 years and went through many rounds of optimization. Like I said, that screenshot is a stress test scenario, and actually both previews were running simultaneously, because I wanted to push CPU to 100%. (per Ashley's recommendation) Not all players will have so many objects in the game, but also not all players have powerful gaming computers, many are playing on crappy laptops.

dop2000 commented 1 week ago

@Jase0000

just a stern reply.

Sorry, it wasn't meant to be stern! I appreciate the advice, I just meant that this is probably not a good place to discuss our project, it steers the discussion away from the actual topic.

of "does this affect real world projects?" which you later revealed is a unique uncommon situation that only some players will do

According to Ashley, "it is only worthwhile to benchmark performance when running at 100% CPU". I can't do this on my PC without pushing the game to the limit. It's an artificial scenario, but still possible for some players.

how does anyone know if you had different logic depending on time of day

Both previews are running the same project, same save file, same everything. And I repeated the test several times to be sure.

I wanted to demonstrate that there is in fact a drop in performance in a real world project. It is small, but it's still quantifiable. Maybe our game is more susceptible to this issue. I am actually curious to see similar tests for other people's projects.

DinoSystem commented 1 week ago

I have experienced this from 389 onward. I didn't manage to reliably show it in a new project, but in my game, which is huge, the difference was palpable.

dop2000 commented 1 week ago

when fully moved to sdkv2, would that mean every aspect of the runtime gets a small performance loss, totalling a huge loss overall in cpu performance far worse than a 10% reduction?

Exactly! I think everyone here is worried about this.

@DinoSystem Could you test 389 -> 409?

DinoSystem commented 1 week ago

Yes. In my game, in the exact same situation (objects etc.), 388.2/389 is consistently (though not considerably) smoother than any newer version, including 409.

I reported this to Mikal and Federico months ago, but didn't manage to reproduce it to file a report. I'm glad I was not hallucinating, apparently.

F3der1co commented 1 week ago

Yea I remember, it had like 1-3 fps difference when I tested your example project, but I chalked it up to margin of error or some other cause at that time.

dop2000 commented 1 week ago

Here is a new test, as real world as it can get. Started a new game, everything is by default. Recorded fps and cpu values every 0.1s for two minutes. Hardware - laptop with i5-7200U, 8GB RAM.

Character standing idle:
r368: average FPS 59.63, average CPU 39.56%
r409: average FPS 59.41, average CPU 47.57%

Character running around the home island in circles:
r368: average FPS 56.58, average CPU 78.54%
r409: average FPS 55.10, average CPU 78.21%

Performance drop is small, but it does exist. It would probably be more significant if it wasn't balanced out by that "pick children" change. (we are using hierarchies a lot)
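
A sampling run like the one described (a reading every 0.1 s for two minutes, then averaged) could be reduced to numbers along these lines. This is a minimal sketch: the function names are hypothetical, and the demo readings are made up purely to exercise the code, so they are not measurements from any project.

```javascript
// Number of samples for a run of `durationSeconds` sampled every `intervalSeconds`.
function sampleCount(durationSeconds, intervalSeconds) {
  return Math.round(durationSeconds / intervalSeconds);
}

// Average an array of { fps, cpu } readings.
function averageSamples(samples) {
  if (samples.length === 0) throw new Error("no samples to average");
  let fpsSum = 0;
  let cpuSum = 0;
  for (const s of samples) {
    fpsSum += s.fps;
    cpuSum += s.cpu;
  }
  return { fps: fpsSum / samples.length, cpu: cpuSum / samples.length };
}

// Difference between two averaged runs (candidate minus baseline).
function compareRuns(baseline, candidate) {
  return { fps: candidate.fps - baseline.fps, cpu: candidate.cpu - baseline.cpu };
}

// Made-up readings purely to exercise the functions; not real measurements.
const demoBaseline = averageSamples([{ fps: 60, cpu: 40 }, { fps: 58, cpu: 44 }]);
const demoCandidate = averageSamples([{ fps: 59, cpu: 45 }, { fps: 57, cpu: 47 }]);
console.log(sampleCount(120, 0.1), compareRuns(demoBaseline, demoCandidate));
```

Two minutes at 0.1 s intervals gives 1200 readings per run, which is enough to smooth out frame-to-frame jitter but not differences in what the game was doing during each run.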

Jase0000 commented 1 week ago

I couldn't downgrade to r389 for my main project (I also can't revert to LTS with this project, it fails to open), but I opened another project with far less going on and can indeed observe a 4% CPU difference:

R389 19% cpu

R409 23% cpu

It's only a slight change and thankfully not a 10% difference. If I had it my way there would be no performance losses, especially on CPU, as I've been optimising for CPU usage more than anything over the last few years. But I'm worried that as SDK v2 is implemented more, this is going to increase.

EDIT: I misunderstood the numbers. A difference like this is more like a 15% or so relative increase (19% going up to 23%). Thank you Federico!
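
The distinction between percentage points and relative change can be worked through directly on the two readings above. The helper below is a hypothetical sketch; only the 19 and 23 inputs come from the thread.

```javascript
// Express a change in CPU utilisation both as percentage points and as a
// relative change against each of the two possible baselines.
function cpuChange(before, after) {
  const points = after - before;
  return {
    points,
    relativeToBefore: (points / before) * 100,
    relativeToAfter: (points / after) * 100,
  };
}

const change = cpuChange(19, 23); // r389 reading, r409 reading
console.log(
  `${change.points} points, ` +
  `${change.relativeToBefore.toFixed(1)}% relative to r389, ` +
  `${change.relativeToAfter.toFixed(1)}% relative to r409`
);
```

Measured against the r389 reading the increase comes out a little over 20%, and measured against the r409 reading a bit over 17%, so the exact figure depends on which release is taken as the baseline.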

Chadori commented 6 days ago

I can also confirm this problem. To add, mobile has it worse.

AshleyScirra commented 1 day ago

I've done more optimization work for the next release to try to further mitigate this. I believe it should run as fast as older releases in most cases again. It is quite a complex change so it is not something we would consider merging to LTS in this case. Marking to check next release to see how it benchmarks in real-world projects.

AshleyScirra commented 22 hours ago

r410 is now out with the change, please try that out - hopefully it restores performance to the original level.

Jase0000 commented 16 hours ago

Yep this got my cpu back down to 19%, same as before these changes.

Thank you so much! This cured my concerns of what the future held.

MikalDev commented 13 hours ago

I think it gives a nice lift over r409. It doesn't quite seem to hit r389, but it's a nice improvement, and I appreciate the work done.

Here are some results from the temple demo, with uncapped frame rate. Yes, this is not a game, this is just for benchmarking to see the difference. I will again say that I understand Scirra has to trade off priorities between perf and the requirements of SDK V2, so at some point the performance difference will be considered acceptable given those requirements. It is also good to benchmark, so the tradeoffs can be better understood.

Unless someone wants to push this further, I am going to close this issue.

Uncapped FPS temple:
R389: 109 FPS https://kindeyegames.com/perf389-u
R410: 98 FPS https://kindeyegames.com/perf410-u
R409: 87 FPS https://kindeyegames.com/perf409-u
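
Uncapped FPS figures are often easier to compare as per-frame time. The sketch below converts the three numbers above; the function names are hypothetical and only the FPS values are taken from the results.

```javascript
// Uncapped FPS results as reported for the temple demo.
const uncapped = [
  { release: "r389", fps: 109 },
  { release: "r410", fps: 98 },
  { release: "r409", fps: 87 },
];

// Average time spent per frame, in milliseconds.
function frameTimeMs(fps) {
  return 1000 / fps;
}

// Frame time and FPS drop for each release, relative to a chosen baseline.
function summarize(results, baselineRelease) {
  const baseline = results.find((r) => r.release === baselineRelease);
  if (!baseline) throw new Error(`unknown baseline: ${baselineRelease}`);
  return results.map((r) => ({
    release: r.release,
    frameTimeMs: frameTimeMs(r.fps),
    fpsDropPercent: ((baseline.fps - r.fps) / baseline.fps) * 100,
  }));
}

const summary = summarize(uncapped, "r389");
for (const row of summary) {
  console.log(
    `${row.release}: ${row.frameTimeMs.toFixed(2)} ms/frame, ` +
    `${row.fpsDropPercent.toFixed(1)}% below r389`
  );
}
```

On these numbers r410 sits roughly 10% below r389 (about 1 ms more per frame), while r409 sits roughly 20% below (a little over 2 ms more per frame).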
