DaveTCode / GBADotnet

A C#/.NET Core GBA emulator
MIT License
Atrocious performance caused by Timer/DMA controllers #48

Closed DaveTCode closed 2 years ago

DaveTCode commented 2 years ago

Whilst there are lots of micro-optimisations to be made across the application, the fundamental architecture is slow.

[image]

The image above shows the breakdown of CPU time spent during a frame. Notably, it's not the ppu or cpu that is causing <60fps on my device. It's the timer controller and dma controller, which step every cycle and cause the slowdown.

Many emulators use a scheduler to get around this issue: instead of ticking components like the timer each cycle, they specify when an event such as a reload will occur and simply skip ahead to that point (with the caveat that cpu reads in the meantime still need to be handled). That's certainly one option here, although I'd like to see if I can improve performance in other ways first.
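As a rough illustration of the scheduler idea, here is a minimal sketch in Python (chosen for brevity; the emulator itself is C#). The `Scheduler` class, its `schedule`/`run_until` methods and the 1000-cycle period are all hypothetical and not taken from GBADotnet.

```python
import heapq
from typing import Callable, List, Tuple


class Scheduler:
    """Min-heap of (cycle, sequence, callback) events keyed by absolute cycle."""

    def __init__(self) -> None:
        self.now = 0      # current absolute cycle count
        self._seq = 0     # tie-breaker so events on the same cycle fire in FIFO order
        self._events: List[Tuple[int, int, Callable[[int], None]]] = []

    def schedule(self, delay: int, callback: Callable[[int], None]) -> None:
        """Queue `callback` to fire `delay` cycles from the current time."""
        heapq.heappush(self._events, (self.now + delay, self._seq, callback))
        self._seq += 1

    def run_until(self, target: int) -> None:
        """Advance time to `target`, firing only the events that fall due."""
        while self._events and self._events[0][0] <= target:
            cycle, _, callback = heapq.heappop(self._events)
            self.now = cycle
            callback(cycle)
        self.now = target


fired = []
sched = Scheduler()


def on_overflow(cycle: int) -> None:
    fired.append(cycle)
    sched.schedule(1000, on_overflow)  # hypothetical timer period of 1000 cycles


sched.schedule(1000, on_overflow)
sched.run_until(3500)
print(fired)
```

The point is that nothing runs between events: advancing 3500 cycles costs three callbacks rather than 3500 timer ticks.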

DaveTCode commented 2 years ago

image

Minor fix for timer controller to remove unnecessary allocations of a bool array each time it's ticked
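The actual change isn't shown in this thread. One common way to avoid a per-tick allocation is to create the flag array once and overwrite it in place, sketched below in Python (the emulator is C#). `TimerController`, its fields and `tick` are hypothetical names, and prescalers and cascading are deliberately ignored.

```python
class TimerController:
    NUM_TIMERS = 4  # the GBA has four hardware timers

    def __init__(self) -> None:
        self.counters = [0] * self.NUM_TIMERS
        self.reloads = [0] * self.NUM_TIMERS
        # Allocated once here and reused on every tick,
        # instead of building a fresh list per call.
        self._overflowed = [False] * self.NUM_TIMERS

    def tick(self) -> list:
        """Advance every timer by one and report which ones overflowed."""
        flags = self._overflowed
        for i in range(self.NUM_TIMERS):
            self.counters[i] += 1
            flags[i] = self.counters[i] > 0xFFFF  # counters are 16-bit
            if flags[i]:
                self.counters[i] = self.reloads[i]
        return flags


timers = TimerController()
timers.counters[0] = 0xFFFF
first = timers.tick()
print(first[0], first[1])
```

The trade-off is that the returned list is only valid until the next `tick`, so callers must not hold on to it.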

DaveTCode commented 2 years ago

Minor fix for the DMA controller so that it doesn't need to check all channels on a write cycle. This brings the perf cost down to <15% of the total.

[image]
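The thread doesn't show what the fix looked like. One possible approach, sketched here in Python under that assumption, is to maintain a bitmask of enabled channels when control registers are written, so that the per-cycle step can bail out with a single comparison. `DmaController`, `write_control`, `step` and the `channel_checks` counter are all hypothetical.

```python
class DmaController:
    NUM_CHANNELS = 4  # the GBA has four DMA channels; DMA0 has highest priority

    def __init__(self) -> None:
        self.enabled = [False] * self.NUM_CHANNELS
        self.active_mask = 0       # bit n is set exactly when channel n is enabled
        self.channel_checks = 0    # instrumentation for this example only

    def write_control(self, channel: int, enable: bool) -> None:
        """Called on a control register write; keeps the mask in sync."""
        self.enabled[channel] = enable
        if enable:
            self.active_mask |= 1 << channel
        else:
            self.active_mask &= ~(1 << channel)

    def step(self) -> int:
        """Return the highest-priority enabled channel, or -1 if none."""
        if self.active_mask == 0:
            return -1  # fast path: no per-channel work at all
        for ch in range(self.NUM_CHANNELS):
            self.channel_checks += 1
            if self.active_mask & (1 << ch):
                return ch
        return -1


dma = DmaController()
for _ in range(1000):
    dma.step()                 # idle: never touches individual channels
dma.write_control(2, True)
print(dma.step(), dma.channel_checks)
```

The bookkeeping moves to register writes, which are rare, and off the per-cycle path, which is hot.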

DaveTCode commented 2 years ago

Starting to track adding a scheduler in a new branch https://github.com/DaveTCode/GBADotnet/tree/scheduler

DaveTCode commented 2 years ago

https://github.com/DaveTCode/GBADotnet/tree/scheduler/compatibility outlines performance on my laptop (Surface Book 3) before adding a scheduler.

Generally it just about hits 60fps on startup screens, but in quite a few cases the average is lower than 60 and it's rare that it gets much higher.

Target is >100fps on all games, I think.

DaveTCode commented 2 years ago

image

After moving the ppu and timers off the main clock function, I get much higher performance (10fps average increase from the ppu, 100fps increase from the timers).
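Once a timer is no longer ticked every cycle, a cpu read of its counter has to be answered some other way. A minimal sketch of one option, in Python rather than the emulator's C#, is to derive the value from the cycle at which the timer was last (re)loaded. `LazyTimer` and its members are hypothetical, and hardware details such as start-up delay and cascading are ignored.

```python
class LazyTimer:
    """16-bit up-counter whose value is computed on demand instead of ticked."""

    def __init__(self, reload: int, prescaler: int, start_cycle: int) -> None:
        self.reload = reload            # value loaded on start and on each overflow
        self.prescaler = prescaler      # cpu cycles per increment (1, 64, 256 or 1024 on the GBA)
        self.start_cycle = start_cycle  # cycle at which the counter equalled `reload`

    def cycles_until_overflow(self) -> int:
        """Cycles from a (re)load to the next overflow; the delay a scheduler would be given."""
        return (0x10000 - self.reload) * self.prescaler

    def read_counter(self, now: int) -> int:
        """Value a cpu read at absolute cycle `now` should observe."""
        period = self.cycles_until_overflow()
        elapsed = (now - self.start_cycle) % period  # position within the current period
        return self.reload + elapsed // self.prescaler


timer = LazyTimer(reload=0xFF00, prescaler=64, start_cycle=0)
print(hex(timer.read_counter(640)))
```

Reads cost a little arithmetic, and the only scheduled work left is the overflow itself.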

The remaining big-ticket items for performance are the dma controller and offloading the ppu rendering routines to another thread. It'd be interesting to try that and see if it's quick enough. It's just a question of how fast we can shuffle the bytes across between threads. Definitely don't want to mutex lock on vram/palette etc. access!
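One way to avoid locking shared memory is to hand the render thread an immutable snapshot per scanline through a queue. The Python sketch below only illustrates that hand-off and is not a plan for the C# code: `render_scanline`, `render_worker` and the palette-only snapshot are hypothetical, and a real ppu would also need vram and OAM state, which is where the cost of copying comes in.

```python
import queue
import threading

WIDTH = 240   # GBA screen is 240x160 pixels
HEIGHT = 160


def render_scanline(line: int, palette_snapshot: bytes) -> list:
    # Placeholder renderer: fills the line with the first palette byte.
    return [palette_snapshot[0]] * WIDTH


def render_worker(jobs: queue.Queue, framebuffer: list) -> None:
    """Consume (line, snapshot) jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            break
        line, palette_snapshot = job
        framebuffer[line] = render_scanline(line, palette_snapshot)


palette = bytearray(1024)          # GBA palette RAM is 1 KiB
framebuffer = [None] * HEIGHT
jobs = queue.Queue()
worker = threading.Thread(target=render_worker, args=(jobs, framebuffer))
worker.start()

for line in range(HEIGHT):
    palette[0] = line                  # emulated cpu changes the palette mid-frame
    jobs.put((line, bytes(palette)))   # immutable copy: the worker never reads live memory
jobs.put(None)                         # sentinel: no more scanlines
worker.join()
print(framebuffer[0][0], framebuffer[159][0])
```

Each scanline is rendered from the state as it was when the job was queued, so mid-frame writes stay correct without a mutex on the live arrays. The price is one copy per line.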

DaveTCode commented 2 years ago

Scheduler branch merged into main. The timer is no longer a concern from a performance point of view, so I'll close this ticket down and raise separate tickets for other performance enhancements.