deltabeard / Peanut-GB

A Game Boy (DMG) emulator single header library written in C99. Performance is prioritised over accuracy.
https://projects.deltabeard.com/peanutgb/

oam: improve DMA speed #59

Closed by deltabeard 1 year ago

deltabeard commented 2 years ago

The source range of an OAM DMA transfer always lies within a single memory region: it is a 160-byte block starting at a 256-byte-aligned address (0xXX00). So if the first address is in WRAM, the last address is too. We can use this to our advantage by obtaining a pointer to the initial address in WRAM and copying the data incrementally, instead of calling gb_read and gb_write for each byte.

We can assume that the OAM DMA will most likely copy from a shadow OAM area in WRAM to OAM. Since WRAM is managed by Peanut-GB, we can obtain a pointer and copy the data without using gb_read. If the data comes from cart ROM or cart RAM (which is possible), then we cannot use a pointer without an API change.
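A minimal sketch of that idea, assuming a simplified emulator state. The names `struct emu`, `emu_read`, `cart_read` and `oam_dma` are hypothetical and are not Peanut-GB's actual API; the sketch only shows the pointer fast path for WRAM sources and the per-byte fallback for everything else.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WRAM_START 0xC000u
#define WRAM_SIZE  0x2000u /* 8 KiB of work RAM on DMG */
#define OAM_SIZE   0x00A0u /* 160 bytes */

/* Hypothetical emulator state, for illustration only. */
struct emu {
    uint8_t wram[WRAM_SIZE];
    uint8_t oam[OAM_SIZE];
    /* Hypothetical callback standing in for cart ROM/RAM reads. */
    uint8_t (*cart_read)(uint16_t addr);
};

/* Generic per-byte read: the slow path. */
static uint8_t emu_read(struct emu *e, uint16_t addr)
{
    if (addr >= WRAM_START && addr < WRAM_START + WRAM_SIZE)
        return e->wram[addr - WRAM_START];
    return e->cart_read ? e->cart_read(addr) : 0xFF;
}

/* Handle a write of `page` to the DMA register: copy 0xXX00..0xXX9F to OAM. */
static void oam_dma(struct emu *e, uint8_t page)
{
    uint16_t src = (uint16_t)(page << 8);

    if (src >= WRAM_START && src < WRAM_START + WRAM_SIZE) {
        /* The source is page-aligned, so all 160 bytes stay inside WRAM. */
        assert((size_t)(src - WRAM_START) + OAM_SIZE <= WRAM_SIZE);
        memcpy(e->oam, &e->wram[src - WRAM_START], OAM_SIZE);
        return;
    }

    /* Cart ROM/RAM or anything else: fall back to one read per byte. */
    for (uint16_t i = 0; i < OAM_SIZE; i++)
        e->oam[i] = emu_read(e, (uint16_t)(src + i));
}
```

Only the WRAM case avoids the read function here. Giving cart sources the same treatment would need the frontend to expose a pointer, which is the API change mentioned above.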

Since the OAM size is 0xA0 (160) bytes, we can perform the copy using 32-bit transfers instead of 8-bit ones, if the running platform is little-endian. This results in only 40 transfers as opposed to 160.
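As a rough illustration of the 160 / 4 = 40 arithmetic, the sketch below copies a buffer one 32-bit word at a time and counts the transfers. `copy_words` is a hypothetical helper, not code from the oam-optimisation branch. It loads and stores each word through `memcpy` so that it does not rely on the buffers being 4-byte aligned; optimising compilers typically lower these fixed-size calls to single load/store instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OAM_SIZE 0xA0u /* 160 bytes */

/*
 * Copy `len` bytes from src to dst in 32-bit words and return the number
 * of word transfers performed. `len` must be a multiple of 4.
 */
static size_t copy_words(uint8_t *dst, const uint8_t *src, size_t len)
{
    size_t transfers = 0;

    assert(len % sizeof(uint32_t) == 0);

    for (size_t i = 0; i < len; i += sizeof(uint32_t)) {
        uint32_t word;
        memcpy(&word, src + i, sizeof word); /* alignment-safe 32-bit load */
        memcpy(dst + i, &word, sizeof word); /* alignment-safe 32-bit store */
        transfers++;
    }
    return transfers;
}
```

Calling `copy_words(oam, shadow, OAM_SIZE)` performs 40 transfers. Because each word is written back exactly as it was loaded, this particular sketch preserves byte order regardless of the host's endianness. Whether it beats a plain `memcpy` of the whole block depends on the compiler and target, so it needs benchmarking.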

deltabeard commented 2 years ago

Partially completed in 428e616c000483a3acb3edaf87bdcd845b153e4c. This reduced the number of instructions when compiled for ARM Cortex-M0+.

deltabeard commented 1 year ago

Work done on the oam-optimisation branch. Need to benchmark whether this improves performance.

deltabeard commented 1 year ago

There is no noticeable improvement in performance. The benchmark suggests this branch is slightly slower. https://github.com/deltabeard/Peanut-GB/commits/oam-optimisation

deltabeard commented 1 year ago

Because this resulted in no noticeable performance improvement, this will be closed.