deltabeard / Peanut-GB

A Game Boy (DMG) emulator single header library written in C99. Performance is prioritised over accuracy.
https://projects.deltabeard.com/peanutgb/

gb: add delay and memcpy detection #95

Open deltabeard opened 9 months ago

deltabeard commented 9 months ago

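Add detection of busy loops and of loops that wait for a status. Such loops may include delay (busy-wait) loops and memcpy, memset, and strcpy routines; examples are collected in the comments below.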

The use case is increasing emulation speed, since all ROMs perform at least one of the tasks above at some point. Instead of executing each task instruction by instruction, a look-ahead should be used to determine whether the task can be performed with high-level emulation instead (e.g. by calling memcpy() in Peanut-GB).

deltabeard commented 9 months ago

Busy loops

.wait
    dec a
    jr nz, .wait

from https://github.com/pret/pokered/blob/30c244ae4f1acc6f018499ceaa9b138367d7bedf/engine/gfx/oam_dma.asm#L23C1-L25C14
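
A rough sketch of how the delay loop above could be collapsed into a single step. The struct and function names are hypothetical and do not match Peanut-GB's internals, and `mem` is assumed to be a flat 64 KiB view of the address space, which a real implementation would replace with its bus read function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical minimal CPU state; Peanut-GB's real struct differs. */
struct cpu_sketch {
    uint8_t a, f;   /* flags: Z=0x80, N=0x40, H=0x20, C=0x10 */
    uint16_t pc;
};

/* True if mem[pc..pc+2] encodes: dec a; jr nz, <back to pc>. */
static bool is_dec_a_delay_loop(const uint8_t *mem, uint16_t pc)
{
    return mem[pc] == 0x3D                     /* dec a      */
        && mem[(uint16_t)(pc + 1)] == 0x20     /* jr nz, e8  */
        && mem[(uint16_t)(pc + 2)] == 0xFD;    /* e8 = -3    */
}

/* Collapses the whole loop; returns the T-cycles it would have taken. */
static uint32_t skip_dec_a_delay_loop(struct cpu_sketch *cpu)
{
    /* A == 0 wraps to 0xFF on the first dec, giving 256 iterations. */
    uint32_t iterations = cpu->a ? cpu->a : 256;

    cpu->a = 0;
    /* The final dec a leaves Z and N set, H clear, and C untouched. */
    cpu->f = (uint8_t)(0x80 | 0x40 | (cpu->f & 0x10));
    cpu->pc = (uint16_t)(cpu->pc + 3);

    /* dec a = 4 cycles; jr nz = 12 when taken, 8 on the final pass. */
    return iterations * 16 - 4;
}
```

The returned cycle count would still have to be fed to the timer, LCD and audio logic, and probably capped at the next pending interrupt, so that skipping the loop does not change what the game observes.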

memcpy routines

CopyData::
; Copy bc bytes from hl to de.
    ld a, [hli]
    ld [de], a
    inc de
    dec bc
    ld a, c
    or b
    jr nz, CopyData
    ret

from https://github.com/pret/pokered/blob/30c244ae4f1acc6f018499ceaa9b138367d7bedf/home/copy.asm#L15
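
A minimal sketch of matching the CopyData loop by its byte encoding and running it in one go. Everything named here (`regs_sketch`, `is_copydata`, `hle_copydata`) is hypothetical, and the flat `mem` array is an assumption for illustration: real code would go through the bus read/write functions so that MBC registers, I/O and VRAM behave correctly.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register file; names are illustrative, not Peanut-GB's. */
struct regs_sketch {
    uint8_t a, f;
    uint16_t bc, de, hl, pc;
};

/* Byte encoding of the CopyData loop, up to and including jr nz. */
static const uint8_t copydata_sig[] = {
    0x2A,        /* ld a, [hli] */
    0x12,        /* ld [de], a  */
    0x13,        /* inc de      */
    0x0B,        /* dec bc      */
    0x79,        /* ld a, c     */
    0xB0,        /* or b        */
    0x20, 0xF8   /* jr nz, -8   */
};

static bool is_copydata(const uint8_t *mem, uint16_t pc)
{
    /* Assumes the routine does not straddle the end of the address space. */
    return pc <= 0x10000 - sizeof copydata_sig
        && memcmp(&mem[pc], copydata_sig, sizeof copydata_sig) == 0;
}

/* Runs the whole loop at once; returns the T-cycles it would have taken. */
static uint32_t hle_copydata(uint8_t *mem, struct regs_sketch *r)
{
    /* BC == 0 wraps on the first dec, giving 65536 iterations. */
    uint32_t n = r->bc ? r->bc : 0x10000;

    /* Copy forwards one byte at a time so that overlapping ranges behave
     * as they would when interpreted (memmove would not). */
    for (uint32_t i = 0; i < n; i++)
        mem[(uint16_t)(r->de + i)] = mem[(uint16_t)(r->hl + i)];

    r->hl = (uint16_t)(r->hl + n);
    r->de = (uint16_t)(r->de + n);
    r->bc = 0;
    r->a = 0;      /* the last "ld a, c / or b" computes 0 */
    r->f = 0x80;   /* or sets Z and clears N, H and C */
    r->pc = (uint16_t)(r->pc + sizeof copydata_sig);  /* now at ret */

    /* 52 cycles per iteration, minus 4 for the final untaken jr. */
    return n * 52 - 4;
}
```

When both ranges are known to sit in plain RAM and do not overlap, the inner loop could be swapped for a real memcpy() call, which is where most of the speed-up would come from.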

Also:

; Copy bytes from one area to another.
; @param de: Source
; @param hl: Destination
; @param bc: Length
Memcopy:
    ld a, [de]
    ld [hli], a
    inc de
    dec bc
    ld a, b
    or a, c
    jp nz, Memcopy
    ret

from https://gbdev.io/gb-asm-tutorial/part2/functions.html

Another example, which uses an 8-bit counter and so can copy at most 256 bytes (a count of 0 wraps around to 256).

.loop
    ld a, [hli]
    ld [de], a
    inc de
    dec c
    jr nz, .loop
    ret

from https://github.com/pret/pokered/blob/30c244ae4f1acc6f018499ceaa9b138367d7bedf/home/vcopy.asm#L439

Also:

.loop_17:
    ld a, [de]
    ld [hl], a
    inc l
    inc e
    dec b
    jr nz, .loop_17

from https://github.com/alexsteb/tetris_disassembly/blob/b4bbceb3cc086121ab4fe9bf4dad6752fe956ec0/main.asm#L6260C1-L6266C17

Also:

COPY_TILES::
    ldi a, [hl]
    ld [de], a
    inc de
    dec bc
    ld a, b
    or c
    jr nz, COPY_TILES

from https://github.com/alexsteb/tetris_disassembly/blob/b4bbceb3cc086121ab4fe9bf4dad6752fe956ec0/main.asm#L6713C1-L6720C19

memset routines

.loop
    ld [hli], a
    dec c
    jr nz, .loop
    dec b
    jr nz, .loop
    jp Delay3

from https://github.com/pret/pokered/blob/30c244ae4f1acc6f018499ceaa9b138367d7bedf/home/copy2.asm#L180

Also:

.x
    ld [hli], a
    dec c
    jr nz, .x

from https://github.com/pret/pokered/blob/b302e93674f376f2881cbd931a698345ad27bec3/home/copy2.asm#L169C1-L172C11

.x
    ld [hli], a
    dec b
    jr nz, .x

strcpy routines

; copies a string from de to hl
CopyString::
    ld a, [de]
    inc de
    ld [hli], a
    cp "@"
    jr nz, CopyString
    ret

from https://github.com/pret/pokered/blob/30c244ae4f1acc6f018499ceaa9b138367d7bedf/home/copy_string.asm#L7
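
A sketch of the same idea for CopyString. The terminator is read from the operand of the `cp` instruction, so the sketch does not need to know which byte the ROM's charmap assigns to "@". The names and the iteration limit are hypothetical, and `mem` is assumed to be a flat 64 KiB array.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bail-out so a missing terminator cannot hang the emulator. */
#define HLE_STRING_LIMIT 0x10000u

/* Hypothetical register file for this sketch only. */
struct str_regs_sketch {
    uint8_t a, f;
    uint16_t de, hl, pc;
};

/* True if mem[pc..] encodes the CopyString loop with any terminator byte. */
static bool is_copystring(const uint8_t *mem, uint16_t pc)
{
    return mem[pc] == 0x1A                     /* ld a, [de]  */
        && mem[(uint16_t)(pc + 1)] == 0x13     /* inc de      */
        && mem[(uint16_t)(pc + 2)] == 0x22     /* ld [hli], a */
        && mem[(uint16_t)(pc + 3)] == 0xFE     /* cp n        */
        && mem[(uint16_t)(pc + 5)] == 0x20     /* jr nz, e8   */
        && mem[(uint16_t)(pc + 6)] == 0xF9;    /* e8 = -7     */
}

/* Copies up to and including the terminator; returns T-cycles consumed. */
static uint32_t hle_copystring(uint8_t *mem, struct str_regs_sketch *r)
{
    uint8_t terminator = mem[(uint16_t)(r->pc + 4)];   /* operand of cp */
    uint32_t n = 0;

    do {
        r->a = mem[r->de++];
        mem[r->hl++] = r->a;
        n++;
    } while (r->a != terminator && n < HLE_STRING_LIMIT);

    if (r->a == terminator) {
        r->f = 0xC0;                       /* cp matched: Z and N set */
        r->pc = (uint16_t)(r->pc + 7);     /* now at ret */
        /* 44 cycles per iteration, minus 4 for the final untaken jr. */
        return n * 44 - 4;
    }

    /* Limit reached: PC stays on the loop head so execution simply
     * continues from the updated DE and HL. Flags are left as they were;
     * the next cp overwrites them before anything reads them. */
    return n * 44;
}
```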

deltabeard commented 9 months ago

Possible elimination of register checks in loops?

.x:
    ld a, [$FF00+$41]
    and $03
    jr nz, .x
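
One conservative way to handle the STAT poll above, sketched under assumptions: the emulator can tell how many T-cycles remain until the LCD enters mode 0 (the hypothetical `cycles_until_mode0` argument), and `mem` is a flat 64 KiB array. Only whole loop iterations are skipped, so the interpreter still executes the final, passing iteration itself and no register state has to be reconstructed.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* True if mem[pc..] encodes: ldh a, [$41]; and $03; jr nz, <back to pc>. */
static bool is_stat_mode0_wait(const uint8_t *mem, uint16_t pc)
{
    static const uint8_t sig[] = {
        0xF0, 0x41,   /* ld a, [$FF00+$41] */
        0xE6, 0x03,   /* and $03           */
        0x20, 0xFA    /* jr nz, -6         */
    };

    /* Assumes the loop does not straddle the end of the address space. */
    return pc <= 0x10000 - sizeof sig
        && memcmp(&mem[pc], sig, sizeof sig) == 0;
}

/* Returns how many T-cycles can be skipped while staying on the loop head.
 * One failing iteration costs 12 (ldh) + 8 (and) + 12 (taken jr) = 32. */
static uint32_t stat_wait_skip_cycles(uint32_t cycles_until_mode0)
{
    return cycles_until_mode0 / 32 * 32;
}
```

The caller would probably also want to cap the skip at the next pending interrupt, since an interrupt handler may run while the game is polling.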

deltabeard commented 8 months ago

If the routine is in WRAM or HRAM, then the first instruction in the routine could be changed to an invalid instruction, which could then be hijacked to act as a memcpy or memset instruction. The first bank of the ROM could be cached internally to allow Peanut-GB to replace such instructions there as well, since that is likely where such common routines would be placed in a ROM. It might be best to perform this check only in the first bank of the ROM, and to check for memcpy only in HRAM (OAM copy), to reduce the impact of failed routine detection on each JR instruction.
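
A sketch of the patching step for a cached copy of ROM bank 0, assuming the cache is used for instruction fetches only. The marker opcode 0xD3 is one of the opcodes the Game Boy CPU leaves unused; picking it as the HLE marker is a hypothetical choice, as are the function and macro names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unused opcode repurposed as "run HLE memcpy here" (hypothetical choice). */
#define HLE_MEMCPY_OPCODE 0xD3

/* ld a,[hli]; ld [de],a; inc de; dec bc; ld a,c; or b; jr nz,-8 */
static const uint8_t copy_sig[] = {
    0x2A, 0x12, 0x13, 0x0B, 0x79, 0xB0, 0x20, 0xF8
};

/* Scans a private copy of ROM bank 0 and overwrites the first byte of
 * every matching copy loop with the marker. Returns the number patched. */
static size_t patch_bank0_copy_loops(uint8_t *bank0_cache, size_t len)
{
    size_t patched = 0;

    for (size_t i = 0; i + sizeof copy_sig <= len; i++) {
        if (memcmp(&bank0_cache[i], copy_sig, sizeof copy_sig) == 0) {
            bank0_cache[i] = HLE_MEMCPY_OPCODE;
            patched++;
            i += sizeof copy_sig - 1;   /* skip the rest of this match */
        }
    }
    return patched;
}
```

With the scan done once at load time, the dispatcher only pays for HLE when it actually fetches the marker opcode, instead of testing for a routine on every JR. The same byte sequence could also occur in data, so data reads should keep going to the unpatched ROM.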