amethyst / bracket-lib

The Roguelike Toolkit (RLTK), implemented for Rust.
MIT License
1.53k stars 111 forks source link

High CPU usage on Mac #228

Closed singalen closed 3 years ago

singalen commented 3 years ago

Looks like on Mac my game is running the main loop as fast as it can. What can I do about it? How can I help to diagnose it? Ideally, it should only redraw in response to keyboard and, optionally, mouse events.

singalen commented 3 years ago

I've used this code from BTerm examples:

let mut input = INPUT.lock();
if let Some(ev) = input.pop() ...

but thing is, I don't want to consume the event. I want to let BTerm keep track of the keys, like shift and other state like mouse_pos. Currently, there is no way to check if an event is available. pop() and for_each_message() consume events.

Can we please have peek() -> Option<BEvent> that would return the clone of the current event? Or for_each_message_ref(&BEvent)?

Thank you!

singalen commented 3 years ago

Which is probably even more important – I don't see a way to put the process to sleep while waiting for the input event in input_hander.rs. Looks like it's bound to consume as much CPU as it can?

BiosElement commented 3 years ago

Hey @singalen, while we wait for @thebracket to take a look at this, I tried fix which, while not polished, isn't that awful. https://github.com/BiosElement/bracket-lib/commit/e8cd34112ade911bd6d1d460a209301b8e660e67 Currently the fps cap is hard-coded and the exit event doesn't fire correctly, but I think is just a matter of handling it properly in my game.

Basically just using winit's waituntil instead of polling, which means the sleep which adds input latency is removed. CPU usage and input latency both seem reasonable now.

thebracket commented 3 years ago

The approach in BiosElement's file is really good. I didn't know winit had a WaitUntil function! (I wonder if that was added after I wrote the initial loop?). I'm currently testing a fix based on this approach. I'll push a branch for this issue when I've finished testing locally; I don't have easy access to a Mac to check it with.

thebracket commented 3 years ago

On Windows, I'm getting some odd results - WaitUntil waits properly, unless an input event fires - and then the program zooms at full speed. So you can chug along at a limited FPS, and speed up by wiggling the mouse. Will keep debugging, but it may be a bit longer before the branch arrives. Seems to be related to https://github.com/rust-windowing/winit/issues/1610

thebracket commented 3 years ago

There's a potential fix in the branch https://github.com/amethyst/bracket-lib/tree/issue-228-high-cpu I ended up combining a few approaches:

I could use some testing on this, if you don't mind.

BiosElement commented 3 years ago

I'll be looking at this later today, but in response to the 'zooming' @thebracket , this is by design I believe. WaitUntil 'sleeps' until a certain time OR until an event triggers. So this could be considered 'good' in terms of responsiveness since I'd want the game to be responsive, but within that event trigger there should probably be tickrate limiting.

I did notice it capped at my Desktop's vsync, so this isn't the worst behavior, however things like animations would suffer from this rapidly changing fps.

thebracket commented 3 years ago

That's why I went with capping the render/update FPS and WaitUntil - it should still respond to messages rapidly, but without suddenly accelerating your game. It seems to work ok on a few tests I ran, but I imagine it'll need some tweaking.

BiosElement commented 3 years ago

There's a potential fix in the branch https://github.com/amethyst/bracket-lib/tree/issue-228-high-cpu I ended up combining a few approaches:

I could use some testing on this, if you don't mind.

I think your solution is a bit more elegant than what I came up with, however I am getting the following: thread 'main' panicked at 'attempt to subtract with overflow', C:\Users\William\Documents\Projects\bracket-lib-228\bracket-terminal\src\hal\native\mainloop.rs:136:129

Adding some debug prints, it's trying to subtract wait_time: 33 from next_tick: 83 and failing in spectacular fashion. Since there is already a check for next_tick >= wait_time, swapping the order resolves this panic.

From a few minutes of testing, the FPS is remarkably stable, however it misses almost every keyboard input using the keyboard example and my own project.

That's why I went with capping the render/update FPS and WaitUntil - it should still respond to messages rapidly, but without suddenly accelerating your game. It seems to work ok on a few tests I ran, but I imagine it'll need some tweaking.

Indeed that makes sense, apologies if I was stating the obvious! :)

BiosElement commented 3 years ago

Looking more closely, it looks like TICK_TYPE is still being set to Poll which only gets overridden with WaitUntil on the 'requested' Poll timeframe. This explains why the input is so delayed. I removed the control flow declaration on like line 181 entirely and that solved the input latency, however it looks like the next_tick variable is never getting reset meaning it gets slower and slower until a few frames in, nothing ticks without input.

singalen commented 3 years ago

Trying the latest issue-228-high-cpu. So far, CPU is still over 100%. ControlFlow::WaitUntil does get assigned, according to my debugger. If I add .with_fps_cap(30.0) to main console builder, then only a few first keystrokes gets processed after the game start, and then it freezes, with a high CPU. I'm trying to debug the change, but failing so far. I'll write a few questions on the commit: https://github.com/amethyst/bracket-lib/commit/60a127bda6c6fc8f4088f96415c86f25617369e1

BiosElement commented 3 years ago

After a bit of additional research, it also looks like Winit expects OnDrawRequested to be used for static GUI's while MainEventsCleared is intended for games that are redrawing regularly. There has been some discussion with Bevy and that's the path they are leaning towards anyway.

thebracket commented 3 years ago

I just pushed a couple of commits that work really well on my Windows system (I'll get to Linux later today - have publisher meetings). There are two major changes:

On Windows at least, I see CPU usage in the task manager remain constant while I zoom the window around or resize it like crazy.

singalen commented 3 years ago

On Mac, hello_terminal from the branch still runs at 93% CPU 8(

thebracket commented 3 years ago

I just pushed a commit that significantly changes the winit event loop. Testing is very positive on my Windows setup, and input lag seems improved.

If that doesn't do the trick, I'm going to have to find a Mac and a profiler.

singalen commented 3 years ago

Thank you so much for your help! Here's a profile from hello_terminal, if it helps. I personally cannot see why the loop runs so frequently. I added .with_fps_cap(30.0) to it.

Screen Shot 2021-10-21 at 11 14 27

  46  22399.0  hello_terminal (34275) :0
  45  22399.0  Main Thread  0x3fe788 :0
  44 libdyld.dylib 22399.0  start
  43 hello_terminal 22399.0  main
  42 hello_terminal 22399.0  std::rt::lang_start::h7d29aacc602424af /Users/vsergiienko/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:65
  41 hello_terminal 22399.0  std::rt::lang_start_internal::h0c37a46739a0311d /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/rt.rs:51
  40 hello_terminal 22399.0  std::panic::catch_unwind::h191bc002afc126a7 /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panic.rs:431
  39 hello_terminal 22399.0  std::panicking::try::ha45bc5eab1f103eb /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:343
  38 hello_terminal 22399.0  std::panicking::try::do_call::h51a4853c94b1bdea /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/std/src/panicking.rs:379
  37 hello_terminal 22399.0  core::ops::function::impls::_$LT$impl$u20$core..ops..function..FnOnce$LT$A$GT$$u20$for$u20$$RF$F$GT$::call_once::h3b22ce68aa2879c2 /rustc/9bc8c42bb2f19e745a63f3445f1ac248fb015e53/library/core/src/ops/function.rs:259
  36 hello_terminal 22399.0  std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h5fc701173379e5e7 /Users/vsergiienko/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/rt.rs:66
  35 hello_terminal 22399.0  std::sys_common::backtrace::__rust_begin_short_backtrace::hc6b436e8e451fedf /Users/vsergiienko/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/std/src/sys_common/backtrace.rs:125
  34 hello_terminal 22399.0  core::ops::function::FnOnce::call_once::ha3beb17d36cecbeb /Users/vsergiienko/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:227
  33 hello_terminal 22399.0  hello_terminal::main::h52d6ae211d0f4e91 /Users/vsergiienko/src/amusement/bracket-lib/bracket-terminal/examples/hello_terminal.rs:106
  32 hello_terminal 22399.0  bracket_terminal::bterm::main_loop::hb257a2e5ed030ddc /Users/vsergiienko/src/amusement/bracket-lib/bracket-terminal/src/bterm.rs:1062
  31 hello_terminal 22399.0  bracket_terminal::hal::native::mainloop::main_loop::he0f41eabe10c855c /Users/vsergiienko/src/amusement/bracket-lib/bracket-terminal/src/hal/native/mainloop.rs:106
  30 hello_terminal 22399.0  winit::event_loop::EventLoop$LT$T$GT$::run::h29676a0a4cb1dd58 /Users/vsergiienko/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.24.0/src/event_loop.rs:154
  29 hello_terminal 22399.0  winit::platform_impl::platform::event_loop::EventLoop$LT$T$GT$::run::h280f4ebe2ab3700e /Users/vsergiienko/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.24.0/src/platform_impl/macos/event_loop.rs:93
  28 hello_terminal 22399.0  winit::platform_impl::platform::event_loop::EventLoop$LT$T$GT$::run_return::h65e0ecbf093392f1 /Users/vsergiienko/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.24.0/src/platform_impl/macos/event_loop.rs:106
  27 hello_terminal 22399.0  objc::message::send_message::h4d3d535e8f483d3d /Users/vsergiienko/.cargo/registry/src/github.com-1ecc6299db9ec823/objc-0.2.7/src/message/mod.rs:178
  26 hello_terminal 22399.0  objc::message::platform::send_unverified::h32b6afbb3ec1de3f /Users/vsergiienko/.cargo/registry/src/github.com-1ecc6299db9ec823/objc-0.2.7/src/message/apple/mod.rs:27
  25 hello_terminal 22399.0  _$LT$$LP$$RP$$u20$as$u20$objc..message..MessageArguments$GT$::invoke::he09246ba4dbeac19 /Users/vsergiienko/.cargo/registry/src/github.com-1ecc6299db9ec823/objc-0.2.7/src/message/mod.rs:128
  24 AppKit 22399.0  -[NSApplication run]
  23 AppKit 22399.0  -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:]
  22 AppKit 22399.0  _DPSNextEvent
  21 HIToolbox 22399.0  _BlockUntilNextEventMatchingListInModeWithFilter
  20 HIToolbox 22399.0  ReceiveNextEventCommon
  19 HIToolbox 22399.0  RunCurrentEventLoopInMode
  18 CoreFoundation 22399.0  CFRunLoopRunSpecific
  17 CoreFoundation 22367.0  __CFRunLoopRun
  16 CoreFoundation 13098.0  __CFRunLoopDoObservers
  15 CoreFoundation 12562.0  __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__
  14 hello_terminal 11622.0  winit::platform_impl::platform::observer::control_flow_end_handler::h33263b830f63805c /Users/vsergiienko/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.24.0/src/platform_impl/macos/observer.rs:135
  13 hello_terminal 11596.0  winit::platform_impl::platform::app_state::AppState::cleared::he5339b1983da3475 /Users/vsergiienko/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.24.0/src/platform_impl/macos/app_state.rs:331
  12 hello_terminal 8783.0  winit::platform_impl::platform::app_state::Handler::handle_nonuser_event::h34f85d3c91df5522 /Users/vsergiienko/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.24.0/src/platform_impl/macos/app_state.rs:173
  11 hello_terminal 8347.0  _$LT$winit..platform_impl..platform..app_state..EventLoopHandler$LT$T$GT$$u20$as$u20$winit..platform_impl..platform..app_state..EventHandler$GT$::handle_nonuser_event::hdc699a8d8b2f16f3 /Users/vsergiienko/.cargo/registry/src/github.com-1ecc6299db9ec823/winit-0.24.0/src/platform_impl/macos/app_state.rs:71
  10 hello_terminal 8183.0  _$LT$alloc..boxed..Box$LT$F$C$A$GT$$u20$as$u20$core..ops..function..FnMut$LT$Args$GT$$GT$::call_mut::h56b585e7e8707bb0 /Users/vsergiienko/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/boxed.rs:1553
   9 hello_terminal 8142.0  bracket_terminal::hal::native::mainloop::main_loop::_$u7b$$u7b$closure$u7d$$u7d$::h3968b3924eccdd39 /Users/vsergiienko/src/amusement/bracket-lib/bracket-terminal/src/hal/native/mainloop.rs:134
   8 hello_terminal 7539.0  bracket_terminal::hal::native::mainloop::tock::he668db854cec452c /Users/vsergiienko/src/amusement/bracket-lib/bracket-terminal/src/hal/native/mainloop.rs:295
   7 hello_terminal 7132.0  bracket_terminal::hal::gl_common::backing::shared_main_loop::rebuild_consoles::h3c083486b0360b5b /Users/vsergiienko/src/amusement/bracket-lib/bracket-terminal/src/hal/gl_common/backing/shared_main_loop.rs:70
   6 hello_terminal 7113.0  bracket_terminal::hal::gl_common::backing::simple_console_backing::SimpleConsoleBackend::rebuild_vertices::h085fd2d051e8643b /Users/vsergiienko/src/amusement/bracket-lib/bracket-terminal/src/hal/gl_common/backing/simple_console_backing.rs:136
   5 hello_terminal 5998.0  bracket_terminal::hal::gl_common::backing::simple_console_backing::SimpleConsoleBackend::push_point::h563443ab4fded306 /Users/vsergiienko/src/amusement/bracket-lib/bracket-terminal/src/hal/gl_common/backing/simple_console_backing.rs:70
   4 hello_terminal 5148.0  _$LT$alloc..vec..Vec$LT$T$C$A$GT$$u20$as$u20$core..ops..index..IndexMut$LT$I$GT$$GT$::index_mut::hef6550cf91188e38 /Users/vsergiienko/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:2393
   3 hello_terminal 3826.0  _$LT$alloc..vec..Vec$LT$T$C$A$GT$$u20$as$u20$core..ops..deref..DerefMut$GT$::deref_mut::h68ef25ce6e1a2c3c /Users/vsergiienko/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:2327
   2 hello_terminal 2198.0  alloc::vec::Vec$LT$T$C$A$GT$::as_mut_ptr::hb478dcefa0cc760e /Users/vsergiienko/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:1135
   1 hello_terminal 1223.0  core::ptr::mut_ptr::_$LT$impl$u20$$BP$mut$u20$T$GT$::is_null::h5fee71d7c4a78de8 /Users/vsergiienko/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/library/core/src/ptr/mut_ptr.rs:39
thebracket commented 3 years ago

I'm beginning to think that winit timers are just plain broken. As a test, I set control flow to Now, and changed the event loop to only run on Event::NewEvents(StartCause::ResumeTimeReached{..}). I think printed "tick" to the console when it runs, and "tock" when sufficient time has elapsed to run the frame. It then never changes control_flow.

The result is kinda alarming:

Tick
Tick
Tick
Tick
Tick
Tick
Tick
Tick
Tick
Tick
Tick
Tock

In other words, even though I only set the timer once - it ran over and over again as fast as it could! Peeking at requested_resume on the event shows the current time, even though I never requested a resume. That seems to go against what the winit docs have to say, which guarantee that it will be later than the requested time.

thebracket commented 3 years ago

Even stranger, moving the control flow command to the top and using RedrawEventsCleared seems to honor the display. I just went with a short delay, but it dramatically cut CPU usage on my Windows box.

singalen commented 3 years ago

With the latest change, hello_terminal is down to 25% CPU, which is a great progress for Mac :) It should be possible to write a model example that tests winit timers alone? I'm available to run a test on Mac.

thebracket commented 3 years ago

It definitely sounds like a good idea to make a simpler testbed, but that'll have to wait a bit - I have limited time. What FPS does hello_terminal example show you? On Windows, with the delay at the full wait_time I'm seeing it fluctuate between 23 and 26 (target is 30), so it's sleeping longer than it should.

singalen commented 3 years ago

When I cap it to 30, it's similar, 21-25. Unrestricted, it's 49-60FPS, and CPU is around 70%, which is still better than it was.

thebracket commented 3 years ago

Thank you. So it's behaving similarly - that means I can do the hard part of trying to make the numbers right. The uncapped CPU is going to be higher than it should, the library itself is quite inefficient at repeatedly redrawing the whole console. That part is on my "for the future" list.

BiosElement commented 3 years ago

When capped at 30, I'm seeing it around 20-21 FPS, however when rapid keystrokes or mouse movements occur, it locks at 30, sometimes 31. So there's certainly progress here... I'm also trying to tear about Winit to see if something is broken there as well.

thebracket commented 3 years ago

The latest commit takes a different approach.

My testing showed that WaitUntil with a 1ms delay was almost instant, and a 2ms delay could be as long as 30ms on Windows 10! I did some reading, and this article pointed me to the underlying issue: the Windows default timer really isn't very accurate at all. With a resolution of 16ms, it's not going to give me the kind of control required for a steady frame rate. Testing on Linux showed that it tended to be really accurate on timings, so I left the WaitUntil logic in place - I may have to conditionally compile one path on Windows and one on everything else.

So I grabbed the spin_sleep crate (it's tiny, so not too worried about dependency size) and it's dead on the money on Windows. CPU usage without sleeping is around 10% for hello_terminal (way too high), around 1.5% with a spinning sleep in the loop - and 29-30FPS for both. I tested most examples, and walking and walking_with_is_pressed are both better - the latter still being the better way to achieve low latency inputs.

I then went and set the default FPS cap to 30, and adjusted the benchmark examples to request 120 instead - so they still go really fast.

I'm hoping this will work as well on other platforms. The timer precision explains why my original code (which had a thread sleep) was so flakey.

thebracket commented 3 years ago

Happy to report that this version is also running very well in a virtual Ubuntu (what I have handy). CPU usage is way down, and frame-rate is about right (not perfect, but my VirtualBox setup on this PC is also mind-numbingly slow). My virtual hackintosh for testing is refusing to boot at all, so untested there. I really need to pickup a Mac Mini or something for testing.

BiosElement commented 3 years ago

Apologies I've been so quiet, work has been hectic but I've been following along and doing research/testing fix ideas on my own as well with little success.

I can say the latest 228 looks to be locked at 30 for hello_terminal, but drops 1-2 fps during mouse movement/input, so nearly perfect. Also, the input_harness is locked at about 30, but also handling keyboard inputs without noticeable latency.

Finally, and almost as importantly as the input latency, CPU usage is around 2%, down from 10% previously, so great work and thank you so much for helping with this mind-bending issue @thebracket @singalen ! 👍

thebracket commented 3 years ago

Thank you! (And don't worry about being quiet; I have long periods of work keeping me busy). I'm not sure what I can about the small drop from input (downside of sleeping, the whole input thread is waiting - so it doesn't know to wake up for events and processes them all at once when tick-time arrives) - so I'll call this good enough for now and start getting it merged into the mainline. I have quite a few other issues I want to look at before rolling out the next crate version.

Thanks for all your help, glad we've got the CPU under control!

thebracket commented 3 years ago

Ok, this is merged and passing all the tests I threw at it. Most importantly, the Hands-on Rust and roguelike tutorials work with it. In particular, the final dungeon crawler example from Hands-on Rust drops CPU usage from 12% (I have 8 cores, so basically pegging one core) to 1.5%.

I'll go ahead and call this a win and close the issue. :-)