colby-swandale / waterfoul

Gameboy emulator written in Ruby-lang

Speed improvements #4

Closed: eregon closed this 7 years ago

eregon commented 7 years ago

This PR improves the speed of the emulator on Tetris from 8 FPS to 30 FPS (still some way to go for 60).

I used stackprof to identify the hotspots. The sampling profiler is now enabled by passing --stackprof to exe/waterfoul start.

The emulator now also displays the FPS in the window title.
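
For readers curious how an FPS readout like this can be computed, here is a minimal sketch. It is not the code from this PR: the `FpsCounter` class, its `interval` and `clock` parameters, and the title format are all hypothetical names chosen for illustration.

```ruby
# Hypothetical helper (not taken from waterfoul): counts rendered frames and
# reports frames per second once per elapsed interval.
class FpsCounter
  # interval: minimum number of seconds between two FPS reports.
  # clock:    callable returning the current time in seconds; injectable so
  #           the counter can be driven by a fake clock in tests.
  def initialize(interval: 1.0,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @interval = interval
    @clock = clock
    @frames = 0
    @last_report = clock.call
  end

  # Call once per rendered frame. Returns the measured FPS (a Float) when at
  # least `interval` seconds have passed since the last report, otherwise nil.
  def frame
    @frames += 1
    now = @clock.call
    elapsed = now - @last_report
    return nil if elapsed < @interval

    fps = @frames / elapsed
    @frames = 0
    @last_report = now
    fps
  end
end
```

In a render loop, something like `fps = counter.frame` followed by building a string such as `"waterfoul - #{fps.round} FPS"` whenever `fps` is not nil would give the text for the window title; how the title is actually set depends on the windowing layer in use.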

The initial stackprof report was:

==================================
  Mode: cpu(1000)
  Samples: 119330 (0.01% miss rate)
  GC: 3109 (2.61%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
     90270  (75.6%)       84166  (70.5%)     Waterfoul::MMU#[]
    129160 (108.2%)        7744   (6.5%)     Waterfoul::PPU#render_bg
      3912   (3.3%)        3912   (3.3%)     Waterfoul::Cartridge#[]
      2911   (2.4%)        2911   (2.4%)     Waterfoul::PPU#rgb
      1736   (1.5%)        1736   (1.5%)     Waterfoul::MBC::ROM#[]
      1850   (1.6%)        1584   (1.3%)     Waterfoul::MMU#[]=
     14277  (12.0%)        1474   (1.2%)     Waterfoul::Interrupt.pending_interrupt
      1177   (1.0%)        1177   (1.0%)     Waterfoul::CPU#reset_tick

So MMU#[] was clearly the bottleneck.

After these commits it looks like:

==================================
  Mode: cpu(1000)
  Samples: 35996 (0.01% miss rate)
  GC: 386 (1.07%)
==================================
     TOTAL    (pct)     SAMPLES    (pct)     FRAME
      5177  (14.4%)        5177  (14.4%)     Waterfoul::MMU#read_memory_byte
      7575  (21.0%)        4854  (13.5%)     Waterfoul::MMU#[]
      9465  (26.3%)        4307  (12.0%)     Waterfoul::PPU#render_bg
      3172   (8.8%)        3172   (8.8%)     Waterfoul::PPU#rgb
      2627   (7.3%)        2627   (7.3%)     Waterfoul::MBC::ROM#[]
      3166   (8.8%)        1770   (4.9%)     Waterfoul::Interrupt.pending_interrupt
      1830   (5.1%)        1598   (4.4%)     Waterfoul::MMU#[]=
      1121   (3.1%)        1121   (3.1%)     Waterfoul::CPU#reset_tick
     15416  (42.8%)        1112   (3.1%)     Waterfoul::PPU#step
       838   (2.3%)         838   (2.3%)     Waterfoul::CPU#instruction_cycle_time
      2379   (6.6%)         823   (2.3%)     Waterfoul::Timer#tick
      8400  (23.3%)         821   (2.3%)     Waterfoul::CPU#perform_instruction
       743   (2.1%)         743   (2.1%)     Waterfoul::Instructions::Registers#set_z_flag
       732   (2.0%)         732   (2.0%)     Waterfoul::Instructions::Registers#reset_flags

The bottleneck is now less clear, but PPU#render_bg still seems fairly slow compared to the rest. Some higher-level optimization might help here, such as pre-computing a recurring pattern once, or detecting a uniformly colored background, which might be easy to do.
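
As a rough illustration of the "pre-compute a recurring pattern once" idea, the sketch below memoizes decoded tile rows. It is an assumption about one possible approach, not something implemented in this PR: `TileRowCache`, `decode` and `uniform?` are hypothetical names and do not exist in waterfoul. It relies only on the Game Boy tile format, where each row of 8 pixels is stored as two bytes.

```ruby
# Hypothetical sketch (not waterfoul's code): caches decoded tile rows so a
# row pattern that recurs across the background is only decoded once.
class TileRowCache
  def initialize
    @rows = {}
  end

  # A Game Boy tile row is two bytes: `low` holds bit 0 of each pixel's
  # colour index and `high` holds bit 1. Bit 7 is the leftmost pixel.
  # Returns a frozen array of 8 colour indices in the range 0..3.
  def decode(low, high)
    key = (high << 8) | low
    @rows[key] ||= decode_row(low, high)
  end

  # True when all 8 pixels of the row share one colour index, which is the
  # case whenever each of the two bytes is either 0x00 or 0xFF.
  def uniform?(low, high)
    [0x00, 0xFF].include?(low) && [0x00, 0xFF].include?(high)
  end

  private

  def decode_row(low, high)
    row = Array.new(8) do |x|
      bit = 7 - x
      (((high >> bit) & 1) << 1) | ((low >> bit) & 1)
    end
    row.freeze
  end
end
```

The cache holds at most 65,536 entries (one per byte pair), so memory use is bounded. Whether this actually helps would need to be confirmed with another stackprof run, since the lookup itself has a cost.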

colby-swandale commented 7 years ago

Holy wow, this is amazing, I can actually play a reasonable game of Tetris. This really speeds up Mario and Pokemon as well.

I'm going to be looking at this over the next few days and probably adding a few things to it as well.

Thanks again!!! 🎉

colby-swandale commented 7 years ago

Sorry for the delay, I'll probably spend this weekend going through this PR. 😄

colby-swandale commented 7 years ago

Sorry for the delay again, I'm pretty happy with this merge. Thanks again!

eregon commented 7 years ago

You're welcome, thanks for merging!