emoon / rust_minifb

Cross platfrom window and framebuffer crate for Rust
MIT License
1.01k stars 97 forks source link

MacOS performance and multithreading #231

Closed xixixao closed 3 years ago

xixixao commented 3 years ago

Hi,

I'm trying to use multiple threads to render different parts of the window.

Right now I'm rendering into a shared buffer for a 300x300 pixels window, then updating the window once with that buffer.

On MacOS with a 2019 Macbook Pro 2.4 GHz 8-Core Intel Core i9 I'm seeing the update

window
        .update_with_buffer(buf, window_width, window_height)
        .unwrap();

taking initially 100-200 micro seconds. But as I fill up the buffer (replacing 0s with actual colors) the update takes up to 32ms - I can't even maintain 60fps.

I suspect I'm doing something (or many things) wrong. When the update starts taking so long I am not even updating the buffer with anything. I wonder:

  1. Should I only update the window with the smallest buffer and pixel range that is actually changing for better performance?
  2. Should I share reference to the window itself between my threads, and call update from each thread? Would that be a perf boost?

Would be great to have a bit more info about performance and mutli-threading on this repo.

Thank you for the awesome crate!

emoon commented 3 years ago

Hi and thanks! :)

Just making sure when testing the performance do you run with cargo --release ? default is debug mode which is much slower.

minifb is only designed to be called from the main thread and doing update once, so if you have threads generating new data it's best they do it in such way only one buffer needs to be sent to update_with_buffer

xixixao commented 3 years ago

Ok release is incredible faster, thank you for that tip :)

But I'm still running into the same issue. Update takes 33ms - 34ms when I get to that situation.

Perhaps it's update_with_buffer sleeping? But I do have window.limit_update_rate(None); (as I'm maintaining framerate by sleeping on my own). I checked the code and this should indeed skip the sleeping update function. So I'm still bewildered why it would take so long :/

emoon commented 3 years ago

Just to double check, can you do something like this:

use std::time::{Duration, Instant};

fn main() {
    ... 
    while window.is_open() ... {
       let start = Instant::now();
       window.update_with_buffer(...);
       let duration = start.elapsed();
       println!("Time for update_with_buffer is: {:?}", duration);
    }
}

And report what time you are seeing here, if it's really big (i.e several ms) it may be a good thing to try to use a profiler (such as Instruments that is included with XCode) to see where the time is being spent.

xixixao commented 3 years ago

That's exactly how I'm measuring the time. I'll try to get you a minimal repro.

xixixao commented 3 years ago

Well, I'm not having much luck simplifying this, but I have kinda found a work around.

In my loop, I also have a sleep() call to let the other threads besides the main one get a chance to do their work and acquire the lock on the buffer. The sleep() was set to 0 if the main thread took longer than my expected FPS (basically every 1/60 seconds I wanted to make sure the main thread wakes up, regardless of how long the main thread takes to do its job). When I did this the main thread eventually got to the place where the update_with_buffer takes 33 or 16ms, leading to 0 sleep, and this repeats. The fix is to always force the main thread to sleep. Then update_with_buffer stays fast.

🤷

emoon commented 3 years ago

This sound strange to me, the way I would design this loop is something like this

  start_work_on_threads(temp_buffer);
  while window.open() .. {
    buffer = sync_thread_work_to_buffer();
    start_work_on_threads(temp_buffer);
    window.update_with_buffer(buffer);
  }

Now the threads will work in the background while update_with_buffer is running, update_buffer will also sleep (if setup as in the examples) here to sync to 60 Hz, if sync_thread_work_to_buffer() ends up being slow (i.e the thread workers takes long time) the sleep time should be reduced and in worse case it will drop frame-rate (no sleeping)

xixixao commented 3 years ago

The thing is, I need to acquire a lock on the buffer. This is what I think happens:

1) The main thread "hiccups" - for whatever reason the update takes mss instead of µss. 2) This starves the worker threads, as they can't acquire the buffer lock 3) When the main thread gives up control, the worker threads acquire the lock one by one 4) But this makes the main thread take longer to acquire the lock for itself 5) The longer the main thread takes (counting both acquiring the lock and updating the screen), the less it sleeps, and the more starved the buffer lock becomes

Eventually the main thread doesn't sleep at all, and essentially deadlocks the buffer.

I originally designed my loop exactly the same way you have it in UpdateRate. The code is identical, so both exhibit this.

Always sleeping manually works. As you're suggesting sync_thread_work_to_buffer would also work, if I knew how to do that :) also works (in my case using a channel to notify the main thread about whether one of the worker threads updated the buffer, and if not I only update the window).

xixixao commented 3 years ago

The hiccups are noticeable even when running noise example for example, btw.