NagyD / SDLPoP

An open-source port of Prince of Persia, based on the disassembly of the DOS version.
GNU General Public License v3.0

Timer / screen update improvements #146

Closed. Falcury closed this pull request 6 years ago.

Falcury commented 6 years ago

This changes the way screen updates are displayed, and how timers work.

Instead of relying on SDL_Timer delays, frame timings are now handled on the main thread. Basically, frame updates occur automatically wherever idle() is called.

If VSync is enabled, we can use that as a predictable 'pacemaker'. This removes the need for manually adding delays: we can just update the screen and trust that this will provide a suitable delay. For keeping track of elapsed time, I used `SDL_GetPerformanceCounter()`, which has good accuracy for sub-millisecond times.
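As a rough sketch of how the elapsed time can be measured (variable names here are illustrative, not necessarily the ones used in the PR): the counter frequency is queried once, and elapsed game ticks (one tick = 1/60 s) are derived from the difference between two counter readings.

Uint64 perf_frequency = SDL_GetPerformanceFrequency();
Uint64 perf_counters_per_tick = perf_frequency / 60; // one game tick lasts 1/60 s
Uint64 start_counter = SDL_GetPerformanceCounter();
// ... later ...
Uint64 current_counter = SDL_GetPerformanceCounter();
int ticks_elapsed = (int)((current_counter - start_counter) / perf_counters_per_tick);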

The frame rates and timer accuracy should be more stable than before. (To examine the timings, uncomment the printf() statement in has_timer_stopped().)

The procedure request_screen_update() and the variable screen_updates_suspended (which were used to control when screen updates were triggered) are entirely removed. Also, it is now possible to display user interface elements as an overlay that updates at a higher frequency than the normal game loop (for example, an in-game menu overlay).

Timers using the USE_COMPAT_TIMER procedures still work.

Misc other changes:

NagyD commented 6 years ago

I tried this patch. It seems to work correctly; however, I have some observations and questions:

Falcury commented 6 years ago

Thank you for your observations! I can see that this is not ready to be merged, so I'm adding a work-in-progress tag.

CPU usage rose from 2% to 40%. (With the default window size.) Why? If I take out the SDL_BlitSurface() calls from update_screen() and directly use onscreensurface in SDL_UpdateTexture(), then it falls back to 10%. I guess the problem is that the screen is updated at 60 fps (for the overlays).

I expected the CPU usage to rise, but from 2% to 40% is quite extreme... On my system, too, it looks like the SDL_BlitSurface() calls are the expensive operations, although for me the usage rises only to ~7% (on a 120 Hz screen). However, if I duplicate those SDL_BlitSurface() calls a few times, I can easily max out the CPU. These operations look expensive enough that they should really be done on the GPU; maybe SDL_RenderCopy() would be more appropriate. I should do some experimenting with that. (By the way, the window size should not matter for the CPU usage, because the upscaling from 320x200 usually happens on the GPU.)
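As a rough sketch of what GPU compositing could look like (overlay_surface, game_texture and renderer are assumed names here, not the existing SDLPoP variables): the overlay would be uploaded to a texture once, and the renderer would composite it instead of blitting on the CPU.

SDL_Texture* overlay_texture = SDL_CreateTextureFromSurface(renderer, overlay_surface);
SDL_SetTextureBlendMode(overlay_texture, SDL_BLENDMODE_BLEND); // keep per-pixel transparency
// Every displayed frame: let the GPU compose the game image and the overlay on top of it.
SDL_RenderCopy(renderer, game_texture, NULL, NULL);
SDL_RenderCopy(renderer, overlay_texture, NULL, NULL);
SDL_RenderPresent(renderer);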

Or, we could decide that we don't want overlays for now, and leave that part out.

Still peculiar that CPU would rise from 2% to 10%, then. Maybe because input is being processed more frequently. Or perhaps SDL_UpdateTexture() is expensive as well.

The animated left-to-right transitions in the intro became quite slow.

Yes, because every position of the transition is guaranteed to be displayed. At 120 Hz, the speed is acceptable, but it probably shouldn't be any slower... On the other hand, there should now also be less stuttering in the fade in / fade out transitions. And because fading works with timers, the speed of fading remains the same on 120 Hz and 60 Hz.

What happens if the refresh rate is not a multiple of 60 Hz? For example my screen supports 60, 70, 72 and 75 Hz refresh rates. (SNES9x also shows 56 Hz and 59 Hz in the video mode list, but only for certain resolutions.) The game seems to work correctly with 75 Hz, but the reported framerate is closer to 12.5 fps than to 12 fps.

Good point, I didn't take that into account. I wonder how this can be remedied. The game itself works with ticks of 1/60 seconds duration. I wonder how the various ports of Prince of Persia solved this problem? (Although from what I remember, even the original Apple II and DOS versions did not run at exactly the right speed...)

Why are fast refresh rates a problem? https://github.com/NagyD/SDLPoP/pull/146/files#diff-fbdb148a428972ef0f74676e1372e5ebR3322 Namely, why do we have to process everything within the time of a single frame (as defined by the screen's refresh rate)?

Part of the idea was that we could delay processing the user input and drawing the next frame until the last possible moment, because this could decrease input latency by some number of milliseconds. (Processing input and drawing the frame should only take a few milliseconds anyway, and it would be a 'waste' if most of the frame time were spent simply waiting for the next screen update.)

The problem is that I set up VSync as the 'pacemaker'. So, we can never 'miss' a screen update, or we'll simply drop a frame (meaning, on a 60 Hz monitor, 16.7 ms will vanish into thin air). Maybe putting so much trust in VSync wasn't the best idea; I don't know. Maybe we should just make something similar to SDL_Timer: an implementation that actually works with sub-millisecond intervals and acts like a predictable pacemaker.
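One possible shape for such a pacemaker (just a sketch under assumed names, not something that exists in the code yet): sleep in whole milliseconds for most of the interval, then spin on the performance counter for the sub-millisecond remainder.

void wait_until(Uint64 target_counter) {
    Uint64 freq = SDL_GetPerformanceFrequency();
    for (;;) {
        Uint64 now = SDL_GetPerformanceCounter();
        if (now >= target_counter) break;
        Uint64 remaining_ms = (target_counter - now) * 1000 / freq;
        if (remaining_ms > 1) {
            SDL_Delay((Uint32)(remaining_ms - 1)); // yield the CPU for the coarse part of the wait
        }
        // the last millisecond is spent spinning for sub-millisecond accuracy
    }
}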

Note that I can't find any way to even disable VSync. There is SDL_GL_SetSwapInterval(0), but that does not seem to actually disable VSync (and we're not even using OpenGL at all...)

Is the new code visibly better than the old? I can't see any difference...

I was able to delete more code than I added. See for example play_level_2(). Even after considering that I pulled the debug timer overlay code out of there, the code got cleaned up quite a bit. E.g., screen_updates_suspended is no longer necessary; I deleted it everywhere I saw it.

NagyD commented 6 years ago

I expected the CPU usage to rise, but from 2% to 40% is quite extreme...

I guess my system is ancient. :) For example, running PoP in the browser (Firefox) with an HTML5 DOSBox eats 90% CPU, and the animation is quite uneven. Most HTML5 canvas games behave the same way if they update at 60 Hz.

Running PoP in the regular, offline DOSBox never eats more than 3% CPU with cycles=3000, and everything is smooth. I wonder why its CPU usage is not even 10%, even though it probably updates the screen at 60 Hz as well. Maybe because it uses SDL 1.x?

Note that I can't find any way to even disable VSync.

I suppose you mean after the renderer was created. Otherwise you could just remove the VSYNC flag here. :)

Is the new code visibly better than the old? I can't see any difference...

I was able to delete more code than I added.

Sorry for the confusion; I meant that when playing the game, is there any noticeable difference?

Part of the idea was that we could delay processing the user input and drawing the next frame until the last possible moment,

I guess for that you need to know, in advance, how long the processing will take?

because this could decrease input latency by some number of milliseconds.

Is that even noticeable?

Falcury commented 6 years ago

CPU usage rose from 2% to 40%.

This is hopefully (partially) fixed in the new commit.

I suppose you mean after the renderer was created. Otherwise you could just remove the VSYNC flag here. :)

Yes, I should probably have undone that... I also found this: https://wiki.libsdl.org/SDL_HINT_RENDER_VSYNC. So it looks like setting the SDL_HINT_RENDER_VSYNC hint is the essential step. That also explains why I thought earlier that the SDL_RENDERER_PRESENTVSYNC flag seemed to have no effect.
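If I read the documentation correctly, the setup would look roughly like this (the window and renderer variable names are assumed); the hint has to be set before the renderer is created:

SDL_SetHint(SDL_HINT_RENDER_VSYNC, "1");
renderer = SDL_CreateRenderer(window, -1, SDL_RENDERER_ACCELERATED | SDL_RENDERER_PRESENTVSYNC);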

So in the new commit, I set that hint and then let the timing in has_timer_stopped() be based on this:

// How many whole game ticks (1/60 s each) have passed since this timer was last reset?
Uint64 current_counter = SDL_GetPerformanceCounter();
int ticks_elapsed = (int)((current_counter / perf_counters_per_tick) - (timer_last_counter[timer_index] / perf_counters_per_tick));
if (ticks_elapsed >= wait_time[timer_index]) {
    // The requested number of ticks has elapsed: reset the timer and report it as stopped.
    timer_last_counter[timer_index] = current_counter;
    return true;
} else {
    return false;
}

I created a test for checking the accuracy of the in-game timer on this branch: https://github.com/Falcury/SDLPoP/tree/check_timing. It prints out this when I start the game:

Seconds left = 3599.916748
rem_min: 60   game elapsed (s): 1.00    actual elapsed (s): 1.00     delta: 0.00
rem_min: 60   game elapsed (s): 2.00    actual elapsed (s): 2.00     delta: -0.00
rem_min: 60   game elapsed (s): 3.00    actual elapsed (s): 3.00     delta: 0.00
rem_min: 60   game elapsed (s): 4.00    actual elapsed (s): 4.00     delta: -0.00
rem_min: 60   game elapsed (s): 5.00    actual elapsed (s): 5.00     delta: 0.00
rem_min: 60   game elapsed (s): 6.00    actual elapsed (s): 6.00     delta: 0.00
rem_min: 60   game elapsed (s): 7.00    actual elapsed (s): 7.00     delta: 0.00
rem_min: 60   game elapsed (s): 8.00    actual elapsed (s): 8.00     delta: -0.00
rem_min: 60   game elapsed (s): 9.00    actual elapsed (s): 9.00     delta: 0.00
rem_min: 60   game elapsed (s): 10.00    actual elapsed (s): 10.00     delta: -0.00

(The timings quickly become incorrect as soon as I pause the game, move the window, or enter combat.)

For comparison, the same check on the master branch:

Seconds left = 3599.916748
rem_min: 60   game elapsed (s): 1.00    actual elapsed (s): 1.02     delta: 0.02
rem_min: 60   game elapsed (s): 2.00    actual elapsed (s): 2.02     delta: 0.02
rem_min: 60   game elapsed (s): 3.00    actual elapsed (s): 3.04     delta: 0.04
rem_min: 60   game elapsed (s): 4.00    actual elapsed (s): 4.08     delta: 0.08
rem_min: 60   game elapsed (s): 5.00    actual elapsed (s): 5.09     delta: 0.09
rem_min: 60   game elapsed (s): 6.00    actual elapsed (s): 6.10     delta: 0.10
rem_min: 60   game elapsed (s): 7.00    actual elapsed (s): 7.11     delta: 0.11
rem_min: 60   game elapsed (s): 8.00    actual elapsed (s): 8.12     delta: 0.12
rem_min: 60   game elapsed (s): 9.00    actual elapsed (s): 9.14     delta: 0.14
rem_min: 60   game elapsed (s): 10.00    actual elapsed (s): 10.16     delta: 0.16

NagyD commented 6 years ago

CPU usage rose from 2% to 40%.

This is hopefully (partially) fixed in the new commit.

Now it's only 14%.

Falcury commented 6 years ago

Now it's only 14%.

I think I finally managed to fix this in the last commit. Instead of adding a small delay to update_screen(), I added delays to do_wait(), do_simple_wait() and various cutscenes where delays were not manually invoked.

The problem was that has_timer_stopped() then started falling behind; fortunately, this could be fixed by subtracting the delay's overshoot from the new value of timer_last_counter[timer_index]. This basically 'backdates' the timer to when it was originally supposed to end, allowing the timer to catch up so the next frame can be drawn on time.

Uint64 current_counter = SDL_GetPerformanceCounter();
int ticks_elapsed = (int)((current_counter / perf_counters_per_tick) - (timer_last_counter[timer_index] / perf_counters_per_tick));
// How far past the requested wait time did we end up?
int overshoot = ticks_elapsed - wait_time[timer_index];
if (overshoot >= 0) {
    if (overshoot > 0 && overshoot <= 3) {
        // 'Backdate' the timer by the overshoot, so the next wait starts
        // from when this one was originally supposed to end.
        current_counter -= overshoot * perf_counters_per_tick;
    }
    timer_last_counter[timer_index] = current_counter;
    return true;
} else {
    return false;
}
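For example, if wait_time is 5 ticks but 6 ticks have elapsed, the overshoot is 1 tick; timer_last_counter is then set one tick into the past, so the next wait fires after only 4 more ticks and the timer stays on schedule on average.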

I expect that CPU usage should be back to 2% for you. The wipe transition is now too fast, unfortunately. Maybe we should set a fixed speed for that?

NagyD commented 6 years ago

CPU usage is indeed back below 2%. It even reached 0%.

However, it seems that in the first part of the intro (after the fade-in), events (like keypresses) are processed only when the screen is updated. I could fix this in the following way: Copy back the old do_wait() that was removed in 1ac3d53782c1616d114489c434e9bee0204626ac, and remove the do_timer_delay() call from it.

Falcury commented 6 years ago

Ah, thank you for catching that. Unfortunately, CPU usage rises to 100% on the title screen, if I remove the do_timer_delay() call (because then the CPU never yields). However, if I add SDL_Delay(1), the CPU usage drops back to zero for me. I added a commit with the proposed fix.
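The resulting wait loop looks roughly like this (a sketch of the idea; the exact shape of the old do_wait() is assumed here): events keep being processed via idle() while waiting, and SDL_Delay(1) yields the CPU so the loop does not spin.

void do_wait(int timer_index) {
    while (!has_timer_stopped(timer_index)) {
        idle();        // process input (and screen updates) while waiting
        SDL_Delay(1);  // yield, so the wait loop does not consume a full core
    }
}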

Falcury commented 6 years ago

The animated left-to-right transitions in the intro became quite slow.

That's probably because the game now waits for VSync after every step in the transition.

The wipe transition is now too fast, unfortunately. Maybe we should set a fixed speed for that?

I slowed down the left-to-right animation in this commit: 695a3e6. For now, I set it to 180 fps because that seemed to be a comfortable speed. But is there such a thing as the 'correct' speed for this animation?

NagyD commented 6 years ago

I added a commit with the proposed fix.

This seems to work well for me. Keypresses now work in the intro.

But is there such a thing as the 'correct' speed for this animation?

The original game uses zero wait time, which means the transition is as fast as the speed of the CPU (or DOSBox) allows.

Falcury commented 6 years ago

The original game uses zero wait time, which means the transition is as fast as the speed of the CPU (or DOSBox) allows.

What I'm wondering is: maybe Jordan Mechner was satisfied with the animation speed on the Apple II, and the code was left mostly the same in the DOS port, even though the CPU speed was not necessarily the same. In that case, maybe the Apple II speed should be regarded as the 'correct' speed?

I found this YouTube video of the intro movie on the Apple IIe: https://www.youtube.com/watch?v=7m7j2VuWhQ0. Hm, looking at the speed of the transition and comparing it with SDLPoP, the speed is probably closer to 120 fps than to 180 fps.

Falcury commented 6 years ago

Because of the number of commits in this branch and because of merging issues, I squashed the changes into a single commit and made a new pull request: #153.