EvilJagaGenius / jagoombacolor

Jaga's Goomba Color fork

HDMA discussion #3

Open ghost opened 2 years ago

ghost commented 2 years ago

Checked out the latest code branch.

  1. I was misinformed by a document. HDMA always transfers 16 bytes per line (checked in BGB). It just takes "1/2 the time".

  2. I think the branch is backwards.

    r0 = 0xff ==> tst r0, #0x80 ==> not zero (hdma)
    r0 = 0x00 ==> tst r0, #0x80 ==> zero (normal dma)

==>

FF55_W: @HDMA5
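    tst r0,#0x80        @ bit 7: 1 = HBlank DMA, 0 = general-purpose DMA
@   bxne lr
    bic r0,r0,#0x80     @ keep the length bits; bic does not change the flags
    beq general_dma     @ bit 7 was clear

    @ HDMA code goes below
    mov r1,#0xFF
    ldr r2,=_doing_hdma
    strb r1,[r2]

    add r1,r0,#1        @ blocks = (length bits) + 1, 16 bytes each
    ldr r2,=_dma_blocks_remaining
    strb r1,[r2]

    @ If we're doing HDMA, I don't think we want to fall through to general_dma here
    bx lr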

general_dma:

Shantae sprites are still partly bugged on my end though.

EDIT: I could still be wrong; still investigating.
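For reference, a small C sketch of how a write to HDMA5 (FF55) decodes, following Pan Docs' description: bit 7 selects HBlank DMA (set) or general-purpose DMA (clear), and bits 0-6 hold the length as (blocks - 1), with 16 bytes per block. `decode_hdma5` and `hdma5_write_t` are hypothetical names for illustration, not Goomba code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded form of a byte written to HDMA5 (FF55). */
typedef struct {
    bool hblank;     /* bit 7 set: HBlank DMA (16 bytes per HBlank) */
    unsigned blocks; /* bits 0-6 hold the block count minus one */
    unsigned bytes;  /* total length; each block is 16 bytes */
} hdma5_write_t;

static hdma5_write_t decode_hdma5(uint8_t value) {
    hdma5_write_t w;
    w.hblank = (value & 0x80) != 0;           /* same test as "tst r0,#0x80" */
    w.blocks = (unsigned)(value & 0x7F) + 1u; /* "bic r0,r0,#0x80" then "add r1,r0,#1" */
    w.bytes = w.blocks << 4;                  /* 16 bytes per block */
    return w;
}
```

So 0x00 starts a single 16-byte general-purpose copy, while 0xFF starts the longest HBlank transfer, 128 blocks (2048 bytes).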

EvilJagaGenius commented 2 years ago

Donkey Kong Country might also be a good test for this. When I make this change, the green-wireframe Rare intro animates correctly, though it displays garbage on the left side of the screen and other screens get corrupted. It also becomes very sluggish at intervals... maybe that's the overhead of calling DoDma() every scanline.

Also worth noting: in Pokemon Crystal, the screen doesn't corrupt during the intro with Prof. Oak, but it does when you open a textbox in the overworld.

EvilJagaGenius commented 2 years ago

Basic idea for a new HDMA system:

  1. io.s, FF55_W: HDMA call comes in. Set _dma_blocks_remaining and _dma_blocks_total (replacing _doing_hdma).
  2. lcd.s, entermode0: Decrement _dma_blocks_remaining. If _dma_blocks_remaining == 0, DoDma(_dma_blocks_total << 4).
  3. Back in io.s, FF55_W: Cancel HDMA transfer. Call DoDma((_dma_blocks_total - _dma_blocks_remaining) << 4).

This would mean far fewer DoDma() calls, while still responding correctly to games like Crystal that cancel transfers.
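The plan above, sketched in C under assumptions: `do_dma_stub` stands in for DoDma(byteCount), and the two globals mirror _dma_blocks_total and _dma_blocks_remaining as bytes. This is a hypothetical model of the bookkeeping only, not the actual io.s/lcd.s code.

```c
#include <stdint.h>

/* Stand-in for DoDma(byteCount): only records what would be copied. */
static unsigned dma_calls = 0, dma_bytes = 0;
static void do_dma_stub(unsigned byte_count) { dma_calls++; dma_bytes += byte_count; }

static uint8_t dma_blocks_total = 0;
static uint8_t dma_blocks_remaining = 0;

/* FF55_W with bit 7 set: start an HBlank transfer, copy nothing yet. */
static void hdma_start(uint8_t hdma5) {
    dma_blocks_total = (uint8_t)((hdma5 & 0x7F) + 1);
    dma_blocks_remaining = dma_blocks_total;
}

/* entermode0 (HBlank): count one 16-byte block; copy everything on the last one. */
static void hdma_hblank(void) {
    if (dma_blocks_remaining == 0)
        return;                                        /* no HBlank transfer in progress */
    if (--dma_blocks_remaining == 0) {
        do_dma_stub((unsigned)dma_blocks_total << 4); /* one copy for the whole transfer */
        dma_blocks_total = 0;
    }
}

/* FF55_W with bit 7 clear during a transfer: flush only the blocks already due. */
static void hdma_cancel(void) {
    unsigned bytes = (unsigned)(dma_blocks_total - dma_blocks_remaining) << 4;
    if (bytes != 0)
        do_dma_stub(bytes); /* same idea as lsls + blxne_long in cancel_hdma */
    dma_blocks_total = 0;
    dma_blocks_remaining = 0;
}
```

The trade-off of the one-shot approach: the destination is only written when the last block is due (or on cancel), so a game that inspects VRAM or depends on the copy happening line by line in the middle of a transfer would see stale data.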

ghost commented 2 years ago

EDIT: Saw new commit. Rebasing.


Also interesting: entermode0 only seems to run once or twice per frame in total (checked with the no$gba debugger), so I had to modify timeout.s next:

default_scanlinehook:
checkScanlineIRQ:
default_scanlinehook_nohblank:
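    mov r0,#16                @ DoDma byte count: one 16-byte HDMA block
    ldr r1,=_doing_hdma
    ldrb r1,[r1]              @ _doing_hdma is a byte (written with strb)
    cmp r1,#0xFF

    stmfd sp!,{r3,r8-r12,lr}  @ lr is r14, so list it only once
    blxeq_long DoDma  @ Call DoDma if we're doing HDMA (stm/ldm leave the flags alone)
    ldmfd sp!,{r3,r8-r12,lr}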

_checkScanlineIRQ:
    tst cycles,#CYC_LCD_ENABLED
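Why the flag check in the hook wants a byte load: `_doing_hdma` is written with `strb` in FF55_W, so it is a single byte, but a 32-bit `ldr` from its address also pulls in the three bytes after it. This C sketch simulates both loads with ARM's little-endian layout; the nonzero byte following the flag is a hypothetical neighboring variable.

```c
#include <stdint.h>

/* Simulates ARM's little-endian 32-bit load (ldr) from a byte buffer. */
static uint32_t load_word_le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Equivalent of ldrb: reads only the flag byte itself. */
static uint32_t load_byte(const uint8_t *p) { return p[0]; }

/* Hypothetical layout: doing_hdma = 0xFF, followed by an unrelated nonzero byte. */
static uint8_t vars[4] = {0xFF, 0x12, 0x00, 0x00};

/* "ldr + cmp #0xFF": sees 0x000012FF, so HDMA looks inactive. */
static int word_check(void) { return load_word_le(vars) == 0xFF; }

/* "ldrb + cmp #0xFF": sees 0xFF, as intended. */
static int byte_check(void) { return load_byte(vars) == 0xFF; }
```

With a word load the comparison only works by luck, when the three following bytes happen to be zero.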

Speed-wise, I suppose HDMA creates too many small GBA transfers and overloads the time window, causing lag? HDMA is more annoying than I expected.


The Rare wireframe intro is drawn using plain DMA transfers during VBlank; it uses HDMA to clear the screen beforehand. So I'm really curious why it was broken before.


Beginning to wonder what "entermode0" really does.


I think DKC soft-locks because HDMA is taking too long to finish, so there must be something more to it than I know. I'll have to check the LYC counter.

ghost commented 2 years ago

DKC Rare logo garbage: the GBC tilemap looks okay, so I think those 4 right-side tiles are not being marked dirty and updated on the GBA tilemap side (0000 instead of 0058).

ghost commented 2 years ago
FF55_R: @HDMA5
    ldrb_ r0,dma_blocks_remaining
    ldrb_ r1,doing_hdma
    cmp r1,#0xFF
    subeq r0,r0,#1  @ If hdma, subtract 1
    mov pc,lr

DKC responds better to this fix.
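A C model of what a game should see when reading HDMA5, following Pan Docs' description (worth double-checking against hardware tests): while an HBlank transfer runs, bits 0-6 give the remaining blocks minus one with bit 7 clear; once it finishes the register reads 0xFF, and after a cancel bit 7 reads as 1 above the remaining count. The FF55_R fix above covers the active case; `hdma5_read` is a hypothetical helper that also models bit 7.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value returned by a read of HDMA5 (FF55). */
static uint8_t hdma5_read(uint8_t blocks_remaining, bool hdma_active) {
    /* Remaining blocks minus one; 0 - 1 wraps to 0x7F in the low 7 bits. */
    uint8_t low = (uint8_t)(((unsigned)blocks_remaining - 1u) & 0x7Fu);
    /* Bit 7 clear while running; set when finished (reads 0xFF) or cancelled. */
    return hdma_active ? low : (uint8_t)(0x80u | low);
}
```

The wrap-around is what makes a finished transfer (0 blocks left) read back as 0xFF.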


Shantae has a problem with per-line HDMA via default_scanlinehook, but DKC likes it. Mmmmmm....

ghost commented 2 years ago
io.s

@r0 = dest, r1 = src, r2 = byteCount, r3 = dirtyMapBits
    global_func copy_map_and_compare
copy_map_and_compare:
    stmfd sp!,{r4,r5,r6,r7,r8,r9,r10,r11}

cmc_loop1_left:
    mov r12,#0
    tst r0,#0x10
    bne cmc_loop1_right

    ldmia r0!,{r4,r5,r6,r7}
    ldmia r1!,{r8,r9,r10,r11}
    eors r4,r4,r8
    strne r8,[r0,#-16]
    orrne r12,r12,#0x01
    eors r5,r5,r9
    strne r9,[r0,#-12]
    orrne r12,r12,#0x02
    eors r6,r6,r10
    strne r10,[r0,#-8]
    orrne r12,r12,#0x04
    eors r7,r7,r11
    strne r11,[r0,#-4]
    orrne r12,r12,#0x08

    subs r2,r2,#16
    beq cmc_loop1_exit

cmc_loop1_right:
    ldmia r0!,{r4,r5,r6,r7}
    ldmia r1!,{r8,r9,r10,r11}
    eors r4,r4,r8
    strne r8,[r0,#-16]
    orrne r12,r12,#0x10
    eors r5,r5,r9
    strne r9,[r0,#-12]
    orrne r12,r12,#0x20
    eors r6,r6,r10
    strne r10,[r0,#-8]
    orrne r12,r12,#0x40
    eors r7,r7,r11
    strne r11,[r0,#-4]
    orrne r12,r12,#0x80

cmc_loop1_exit:
    ldrb r4,[r3]
    orr r12,r12,r4
    strb r12,[r3],#1

    subs r2,r2,#16
    bmi cmc_part2
    bne cmc_loop1_left

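cmc_part2:
    @ Return here: HDMA always copies whole 16-byte blocks, so the tail
    @ handling below (byte counts that aren't a multiple of 16) is never reached.
    ldmfd sp!,{r4,r5,r6,r7,r8,r9,r10,r11}
    bx lr

    @ Unreachable: old tail path, kept for reference
    adds r2,r2,#16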
    ldmlefd sp!,{r4,r5,r6,r7,r8,r9,r10,r11}
    bxle lr
    b_long _cmc_part2_
    .pushsection .text
_cmc_part2_:
    ble cmc_done
    mov r6,#1
cmc_loop2:
    ldr r4,[r0],#4
    ldr r5,[r1],#4
    eors r4,r4,r5
    strne r5,[r0,#-4]
    orrne r12,r12,r6
    mov r6,r6,lsl#1
    subs r2,r2,#4
    bgt cmc_loop2
    ldrb r4,[r3]
    orr r12,r12,r4
    strb r12,[r3],#1
cmc_done:
    ldmfd sp!,{r4,r5,r6,r7,r8,r9,r10,r11}
    bx lr
    .popsection

This updates the dirty tilemap bits per 16 bytes (one HDMA block). DKC now looks much better, but not perfect.
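The dirty-bit bookkeeping in copy_map_and_compare, restated in C for readability: each 32-bit word that differs is copied and sets one bit, bits 0-3 for the left 16 bytes and bits 4-7 for the right 16 bytes of each 32-byte chunk, one dirty byte per chunk. `copy_map_and_compare_c` is a hypothetical name; unlike the assembly, it indexes the dirty bytes from the start of the map instead of taking a running pointer, and it assumes a 16-byte-aligned destination and a byte count that is a multiple of 16.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy src into map at dst_offset, 4 bytes at a time, setting one dirty bit
 * per changed word. dirty[n] covers map bytes [32n, 32n + 32). */
static void copy_map_and_compare_c(uint8_t *map, size_t dst_offset,
                                   const uint8_t *src, size_t byte_count,
                                   uint8_t *dirty) {
    for (size_t i = 0; i < byte_count; i += 4) {
        size_t off = dst_offset + i;
        if (memcmp(map + off, src + i, 4) != 0) {   /* like eors ... strne */
            memcpy(map + off, src + i, 4);
            /* word 0-3 = left half (bits 0-3), word 4-7 = right half (bits 4-7) */
            dirty[off / 32] |= (uint8_t)(1u << ((off % 32) / 4));
        }
    }
}
```

Because the granularity is 16 bytes, a single HDMA block landing in the right half of a chunk only touches bits 4-7, which is exactly the case the old 32-byte-only loop could miss.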

EvilJagaGenius commented 2 years ago

Could you take a look at the hdma2 branch and let me know whether I'm on the right track there? DKC works, Crystal's busted, and sprites in Shantae are flickery but recognizable.

ghost commented 2 years ago

It looks promising (!) and easier to follow, but I'll have to do some debugging to check the internals; Goomba never behaves the way I'd expect it to.


For DKL Color, I think we need to call encodePC inside _FF70W after the memmap switch (because of the D000 bank). I haven't gotten it to work yet.

ghost commented 2 years ago

I understand your hdma 1-shot optimization now. Clever!

cancel_hdma:
    stmfd sp!,{r0-r4,lr}
    ldrb_ r0,dma_blocks_total
    ldrb_ r1,dma_blocks_remaining
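    sub r0,r0,r1        @ blocks that should already have been copied
    lsls r0,r0,#4       @ bytes = blocks * 16, and set the Z flag
    blxne_long DoDma    @ skip the call when there is nothing to copy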
    ldmfd sp!,{r0-r4,lr}

We need lsls (the flag-setting form) so that blxne_long skips DoDma when the byte count is zero; then Crystal works.

ghost commented 2 years ago

DKL New Colors - menu crash fix

@----------------------------------------------------------------------------
_FF70W:@        SVBK - CGB Mode Only - WRAM Bank
@----------------------------------------------------------------------------
...

    ldr r1,=wram_W
    str_ r1,writemem_tbl+52

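wram_remap_pc:
    ldr_ r1,lastbank
    sub gb_pc,gb_pc,r1      @ turn gb_pc back into a plain GB address
    stmfd sp!,{r0}          @ preserve r0 across encodePC
    encodePC                @ re-point gb_pc using the new memory map (D000 bank)
    ldmfd sp!,{r0}

    mov pc,lr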

select_gbc_ram:
...

    ldr r1,=wram_W_2
    str_ r1,writemem_tbl+52

    b wram_remap_pc
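What wram_remap_pc is for, as a hypothetical C model: gb_pc caches a host pointer into whatever buffer is mapped at the PC's 4 KiB region, so when SVBK swaps the D000-DFFF bank under code running from WRAM, that pointer still targets the old bank until it is turned back into a GB address and re-encoded. The struct and function names are assumptions, and the sketch ignores the write-table swap (wram_W / wram_W_2).

```c
#include <stdint.h>

/* Hypothetical model of a cached PC (like gb_pc / lastbank). */
typedef struct {
    uint8_t *region_base[16];  /* host base for each 4 KiB region of GB memory */
    uint8_t *pc;               /* host pointer to the next opcode */
    uint8_t *pc_base;          /* region base that pc was encoded against */
    uint16_t pc_region_start;  /* GB address where that region starts */
} cpu_t;

/* Like "sub gb_pc,gb_pc,lastbank": recover the GB address. */
static uint16_t decode_pc(const cpu_t *c) {
    return (uint16_t)(c->pc_region_start + (uint16_t)(c->pc - c->pc_base));
}

/* Like encodePC: map a GB address to a host pointer with the current map. */
static void encode_pc(cpu_t *c, uint16_t addr) {
    unsigned region = addr >> 12;
    c->pc_base = c->region_base[region];
    c->pc_region_start = (uint16_t)(region << 12);
    c->pc = c->pc_base + (addr & 0x0FFF);
}

/* SVBK (FF70) write: remap D000-DFFF, then re-encode the PC. */
static void write_svbk(cpu_t *c, uint8_t *wram, uint8_t value) {
    unsigned bank = value & 0x07;
    if (bank == 0)
        bank = 1;                              /* bank 0 selects bank 1 */
    uint16_t addr = decode_pc(c);              /* decode against the old mapping */
    c->region_base[0xD] = wram + bank * 0x1000u;
    encode_pc(c, addr);                        /* now points into the new bank */
}
```

Skipping the re-encode leaves `pc` inside the previous bank, which would explain a crash only when the menu code runs from D000-DFFF.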

DKC title colors are wrong because of a Rare trick:

Goomba would have to keep applying the updates per scanline on the GBA hardware for a proper fix.

EvilJagaGenius commented 2 years ago

Hm. The DKC title colors seem like low priority to me, it's still readable and everything else looks fine. I'll push the Crystal and DKL fixes soon and get a new release out.

ghost commented 2 years ago

Low priority = yes, definitely.

I have an idea for DKC, but I'm not familiar with GBA hardware:

Then monitor the GBA scanlines as it renders the frame and update the GBA palettes from a cached palette list.
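One possible shape for that idea, as a C sketch under stated assumptions: snapshot the GBC BG palette per emulated line, then have a GBA HBlank handler copy the cached entries for the next line into BG palette RAM (0x05000000 on hardware, a plain array here). It assumes a 1:1 mapping between GBC and GBA lines and a direct BGR555 copy; Goomba's real palette path (palette slots, brightness/gamma, scaling) is not modeled, and all names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define GB_LINES   144 /* visible GBC scanlines */
#define PAL_COLORS 32  /* 8 BG palettes x 4 colors, BGR555 like the GBA */

/* Hypothetical cache: the BG palette as it was when each line was drawn. */
static uint16_t line_palette[GB_LINES][PAL_COLORS];

/* Emulated PPU, start of each GBC line: snapshot the palette, including
 * any mid-frame writes the game made. */
static void cache_line_palette(unsigned line, const uint16_t *gbc_bg_palette) {
    memcpy(line_palette[line], gbc_bg_palette, sizeof line_palette[line]);
}

/* GBA HBlank handler, before `next_line` is drawn: load that line's colors. */
static void apply_line_palette(unsigned next_line, volatile uint16_t *gba_bg_pal_ram) {
    if (next_line >= GB_LINES)
        return; /* outside the cached range: leave palette RAM unchanged */
    for (unsigned i = 0; i < PAL_COLORS; i++)
        gba_bg_pal_ram[i] = line_palette[next_line][i];
}
```

A CPU loop of 32 halfword writes per HBlank may be too slow on hardware; a DMA channel set to HBlank timing is the usual alternative, though whether Goomba has one free is unverified.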


I wonder why Shantae bugs out when hacks are disabled, but that's not important; it could be some ugly timing/race issue.

Otherwise, I guess we can close this ticket and reopen it later if needed.