6502: Self Modifying Instruction, End of Address Space

pjsoberoi commented 11 months ago

I'm in the process of verifying Ghidra's 6502 processor module vs processor unit tests (https://github.com/TomHarte/ProcessorTests/tree/main/6502/v1). For opcodes 0x00-0x40 I have three unit tests failing that I cannot fix easily. I need help deciding what the correct behavior is. Perhaps we need to run it on a real CPU?

1) JSR $1355 (opcode: 20 55 13)

PC is 4949, should be 341

20.json 4046

    { "name": "20 55 13",
      "initial": { "pc": 379, "s": 125, "a": 158, "x": 137, "y": 52, "p": 230,
                   "ram": [ [379, 32], [380, 85], [381, 19], [341, 173]]},
      "final": { "pc": 341, "s": 123, "a": 158, "x": 137, "y": 52, "p": 230,
                 "ram": [ [341, 173], [379, 32], [380, 125], [381, 1]]},
      "cycles": [ [379, 32, "read"], [380, 85, "read"], [381, 19, "read"],
                  [381, 1, "write"], [380, 125, "write"], [381, 1, "read"]] },

The issue here is the instruction is overwriting itself as it's executing. pc is 379 and we are writing bytes 381 and 380, and then reading byte 381. I don't know what the correct behavior is supposed to be.

2) ROL $7D (opcode: 26 7d ff)

P is 228, should be 101

26.json 1251

    { "name": "26 7d ff",
      "initial": { "pc": 65535, "s": 137, "a": 219, "x": 194, "y": 144, "p": 100,
                   "ram": [ [65535, 38], [0, 125], [1, 255], [125, 160]]},
      "final": { "pc": 1, "s": 137, "a": 219, "x": 194, "y": 144, "p": 101,
                 "ram": [ [0, 125], [1, 255], [125, 64], [65535, 38]]},
      "cycles": [ [65535, 38, "read"], [0, 125, "read"], [125, 160, "read"],
                  [125, 160, "write"], [125, 64, "write"]] },

The issue here is that PC is at the end of the address space when starting the instruction. Again I don't know what the correct behavior is supposed to be when accessing past 0xffff. Wrap to zero? Illegal instruction?

3) AND $45EF,X (opcode: 3d ef 45)

A is 0, should be 144
P is 103, should be 229

    { "name": "3d ef 45",
      "initial": { "pc": 65535, "s": 99, "a": 184, "x": 115, "y": 124, "p": 101,
                   "ram": [ [65535, 61], [0, 239], [1, 69], [17762, 235], [18018, 209], [2, 20]]},
      "final": { "pc": 2, "s": 99, "a": 144, "x": 115, "y": 124, "p": 229,
                 "ram": [ [0, 239], [1, 69], [2, 20], [17762, 235], [18018, 209], [65535, 61]]},
      "cycles": [ [65535, 61, "read"], [0, 239, "read"], [1, 69, "read"],
                  [17762, 235, "read"], [18018, 209, "read"]] },

Same issue as 2), PC is at the end of the address space when starting the instruction.

GhidorahRex commented 11 months ago

Without having a real processor to test on, I don't think it's possible to understand exactly how a self-modifying instruction will behave. This is a very distinct edge case.

c64cryptoboy commented 10 months ago

With respect to the 1st test issue:

The json test data shows exactly what happens on my Commodore 64 hardware (it has a 6510 chip, like your NES RP2A03 except for ADC/SBC BCD handling).

Here's an equivalent (to the JSON) test on my Commodore 64 hardware (note: A, X, and Y test values don't matter):

$0155  EE 20 D0  INC $D020 ; loop to change border color to see if we got here
$0158  4C 55 01  JMP $0155
. . .
; start execution here:
$0177  78        SEI       ; protect stack during test
$0178  A2 7D     LDX #$7D  ; set stack pointer
$017A  9A        TXS
$017B  20 55 13  JSR $1355 ; the json test line, it will set PC to $0155

When $017B (20 55 13) JSR $1355 executes, here's what happens each cycle of the instruction:

1) read $20 (JSR), then inc PC to $017C
2) read ADL from $017C as $55, then inc PC to $017D
3) internal stuff
4) overwrite $017D's $13 with PCH $01, then dec S
5) overwrite $017C's $55 with PCL $7D, then dec S
6) read ADH from $017D as $01, set PC with ADH/ADL as $0155

So, half of the address gets overwritten as the instruction is in the process of executing. The json after-test data is correct (the PC, S, and RAM). Pretty interesting (and thanks to Robin Harbron who gave me an important nudge forward on this one).
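
For anyone who wants to check the ordering without hardware, here is a minimal Python sketch of the six cycles listed above. It is only an illustration of the read/write order described in this comment, not Ghidra's pcode and not a full 6502 model; the function name `jsr_cycles` is made up for this example.

```python
def jsr_cycles(mem, pc, s):
    """Run one JSR using the per-cycle order described above.

    mem is a mutable 64 KiB bytearray; returns (new_pc, new_s).
    Hypothetical helper for illustration only.
    """
    assert mem[pc] == 0x20, "expected a JSR opcode"
    pc = (pc + 1) & 0xFFFF            # cycle 1: fetch opcode, inc PC
    adl = mem[pc]                     # cycle 2: fetch low byte of target
    pc = (pc + 1) & 0xFFFF
    # cycle 3: internal operation, nothing visible to model here
    mem[0x0100 | s] = pc >> 8         # cycle 4: push PCH, dec S
    s = (s - 1) & 0xFF
    mem[0x0100 | s] = pc & 0xFF       # cycle 5: push PCL, dec S
    s = (s - 1) & 0xFF
    adh = mem[pc]                     # cycle 6: fetch high byte AFTER the pushes
    return (adh << 8) | adl, s


mem = bytearray(0x10000)
mem[0x017B:0x017E] = bytes([0x20, 0x55, 0x13])   # JSR $1355 at $017B
mem[0x0155] = 0xAD
pc, s = jsr_cycles(mem, 0x017B, 0x7D)
print(hex(pc), hex(s), hex(mem[0x017C]), hex(mem[0x017D]))
```

Because the high byte of the target is read last, it picks up the PCH that was just pushed onto the stack at $017D. A model that reads both operand bytes before pushing would jump to $1355 (4949) instead.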

Test code: [screenshot: weirdJSR1]

Ends up in the infinite border-color-changing loop at $0155: [screenshot: weirdJSR2]

c64cryptoboy commented 10 months ago

Peter Ferrie just tested your 1st example on an Apple II (6502, not a 65C02), and it also matches the json test result, so again, the test is valid.

My opinion is that it's not important that Ghidra emulates this correctly. That said, it should be possible to modify the JSR pcode to accommodate this test case by ordering assignments in a way that matches the 6502 assignment order. If you do, I'd put a comment in for future maintainers. Here are two references on what happens at each JSR cycle:

The MCS6500 Microcomputer Family Programming Manual (January 1976), bookmarked at JSR: https://archive.org/details/6500-50a_mcs6500pgmmanjan76/page/n121/mode/1up?view=theater
Documentation on what the 6502 VICE emulator does at each cycle, bookmarked at JSR: https://sourceforge.net/p/vice-emu/code/HEAD/tree/techdocs/CPU/65xxand85xx.txt#l985

c64cryptoboy commented 10 months ago

On to the 2nd test case: I confirmed that the stated json test behavior is what happens on real hardware (a Commodore 64). Specifically, the two-byte instruction spans $FFFF to $0000 and executes successfully, the memory location holding $A0/%10100000 is ROLed to $40/%01000000 (setting carry to 1), and P goes from 100/$64/%01100100 to 101/$65/%01100101.
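
As a cross-check of the arithmetic, here is a small Python sketch of a zero-page ROL whose operand fetch wraps at 16 bits. It uses the values from the JSON test rather than my C64 setup, and `rol_zero_page` is a hypothetical helper name, not anything from Ghidra or the test suite.

```python
def rol_zero_page(mem, pc, p):
    """Execute ROL zp at pc, wrapping the operand fetch at $FFFF -> $0000.

    Returns (new_pc, new_p). Illustration only; cycle timing is not modeled.
    """
    assert mem[pc] == 0x26, "expected ROL zero page"
    zp = mem[(pc + 1) & 0xFFFF]       # operand lives at $0000 when pc is $FFFF
    old = mem[zp]
    new = ((old << 1) | (p & 0x01)) & 0xFF   # rotate left through carry
    mem[zp] = new
    p &= 0x7C                         # clear N (bit 7), Z (bit 1), C (bit 0)
    p |= old >> 7                     # C takes the bit rotated out
    p |= new & 0x80                   # N from the result
    if new == 0:
        p |= 0x02                     # Z if the result is zero
    return (pc + 2) & 0xFFFF, p


mem = bytearray(0x10000)
mem[0xFFFF] = 0x26
mem[0x0000] = 0x7D
mem[0x0001] = 0xFF
mem[0x007D] = 0xA0
pc, p = rol_zero_page(mem, 0xFFFF, 100)
print(pc, p, hex(mem[0x7D]))
```

The only thing that matters for this test is the `& 0xFFFF` on the operand address and on the final PC.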

Test details:

On a C64, memory locations $0000-$0001 are readable/writable memory-mapped registers. I set memory location $0001 to $35 to bank out the ROM that would otherwise prevent location $FFFF from being changed. Changing the value at $0000 would mess up the ROM banking, so I left it at its default value of $2F instead of assigning it the test value of $7D. This isn't an issue, as $7D was an arbitrarily chosen memory location to apply the ROL to, so the shifted result will show up in location $002F instead of $007D.

Execution begins at $FFFF, the instruction there wrapping to $0000. Flow will continue through memory location $0001, executing an instruction based on the value $35 that I put into location 1 for banking. This forms an instruction that will not affect testing state results: $0001 15 00 ORA $00,X ; pseudo NOP, won't change our flags

To end in an infinite loop (that we can break out of to check value at $2F), I assembled this line: $0003 4C 03 00 JMP $0003

Test code:

$C000  78          SEI
$C001  A9 35       LDA #$35 
$C003  85 01       STA $01 ; bank out ROM so we can change loc $FFFF
$C005  A9 26       LDA #$26
$C007  8D FF FF    STA $FFFF ; create ROL $2F at $FFFF (operand byte is the $2F at $0000)
$C00A  A9 A0       LDA #$A0
$C00C  85 2F       STA $2F   ; init $2F = A0 (using $2F instead of $7D)
$C00E  A2 00       LDX #$00
    ; create an ORA $00,X that leaves flags (%01100101) undisturbed 
$C010  86 02       STX $02   
$C012  A9 64       LDA #$64
$C014  48          PHA
$C015  28          PLP       ; init flags
$C016  4C FF FF    JMP $FFFF

Setup: [screenshot: instWrap1]

Execution wrapped from high to low memory, then after breaking out of infinite loop at $0003, both RAM and status flags match test expectations: [screenshot: instWrap2]

So (without testing) I assume your third test case would be valid as well (since it's just another memory-wrapping instruction example)?
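
Not a hardware test, but the expected values in the third JSON case are at least consistent with simple 16-bit wrapping of the operand fetch. A minimal Python sketch, with the hypothetical helper name `and_absolute_x`, and ignoring the extra read at 17762 that the real CPU performs before fixing up the page-crossed address:

```python
def and_absolute_x(mem, pc, a, x, p):
    """Execute AND abs,X at pc with operand bytes fetched modulo 64 KiB.

    Returns (new_pc, new_a, new_p). Illustration only.
    """
    assert mem[pc] == 0x3D, "expected AND absolute,X"
    lo = mem[(pc + 1) & 0xFFFF]       # $0000 when pc is $FFFF
    hi = mem[(pc + 2) & 0xFFFF]       # $0001 when pc is $FFFF
    addr = (((hi << 8) | lo) + x) & 0xFFFF
    a &= mem[addr]
    p &= 0x7D                         # clear N (bit 7) and Z (bit 1)
    p |= a & 0x80                     # N from the result
    if a == 0:
        p |= 0x02                     # Z if the result is zero
    return (pc + 3) & 0xFFFF, a, p


mem = bytearray(0x10000)
for addr, value in [(65535, 61), (0, 239), (1, 69), (17762, 235), (18018, 209), (2, 20)]:
    mem[addr] = value
pc, a, p = and_absolute_x(mem, 65535, 184, 115, 101)
print(pc, a, p)
```

With wrapping, the effective address is $45EF + $73 = $4662 (18018), and A, P and PC come out as the test expects.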

ryanmkurtz commented 10 months ago

I'm really enjoying these pictures.

GhidorahRex commented 10 months ago

This isn't really a huge deal for us. Self-modifying instructions aren't really something that occurs regularly. That being said, if you can get it working right, you're welcome to submit a pull-request for it and we can evaluate it then.

pjsoberoi commented 10 months ago

I think I have a way forward on the self-modifying instruction. That being said, I'm worried about how many other instructions this might involve. Currently this is the only unit test failing due to self-modification.

c64cryptoboy commented 10 months ago

@pjsoberoi, I agree with GhidorahRex: an instruction that modifies itself is pretty rare in practice. So if you do decide to handle this one test case in SLEIGH, don't feel obligated to provide additional per-cycle instruction refinements.

That said, multiple-instruction self-modifying code in 6502 is common. But your NES tests shouldn't run into this if they're just testing single instructions individually? (And NES code in ROM cartridges can't use self-modifying code).

If you do emulate self-modifying code, you may run into Ghidra issues. In my experience, getReferencesFrom(program_counter) can break when the emulated control flow passes through that modified code (causing a python script to just "SystemExit" without a log error message). If more people are running into this, I can easily whip up an issue that makes this reproducible.

Why is 6502 self-modifying code common? It's frequently used in 16-bit indexing. Here's an example: clear 8K of RAM from $2000 to $3FFF. First, without self-modifying code, this uses zero-page-based indirect addressing for 16-bit indexing:

      LDA #$00
      TAY
      STA $FB
      LDX #$20
      STX $FC
PT1:  STA ($FB),Y
      INY
      BNE PT1
      INC $FC
      DEX
      BNE PT1

In contrast, this self-modifying alternative runs faster (STA $2000,Y takes 5 cycles versus 6 for STA ($FB),Y) and doesn't require zero page:

      LDA #$00
      TAY
      LDX #$20
      STX PT1+2
PT1:  STA $2000,Y
      INY
      BNE PT1
      INC PT1+2
      DEX
      BNE PT1
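
To put a rough number on the difference, here is a Python sketch that adds up cycles for both loops. The per-instruction cycle counts are the standard NMOS 6502 timings; the sketch assumes neither BNE crosses a page boundary and it ignores the few setup instructions before PT1. The function name `loop_cycles` is made up for this example.

```python
# Standard NMOS 6502 cycle counts for the instructions used in the two loops.
STA_IND_Y = 6      # STA ($FB),Y
STA_ABS_Y = 5      # STA $2000,Y
INY = 2
DEX = 2
BNE_TAKEN = 3      # assumes the branch does not cross a page boundary
BNE_NOT_TAKEN = 2
INC_ZP = 5         # INC $FC
INC_ABS = 6        # INC PT1+2


def loop_cycles(store, inc, pages=0x20):
    """Cycles spent from PT1 to loop exit when clearing `pages` 256-byte pages."""
    # Inner loop: 256 stores and INYs, 255 taken branches, 1 fall-through.
    inner = 256 * (store + INY) + 255 * BNE_TAKEN + BNE_NOT_TAKEN
    per_page = inner + inc + DEX
    # Outer BNE is taken after every page except the last.
    return pages * per_page + (pages - 1) * BNE_TAKEN + BNE_NOT_TAKEN


indirect = loop_cycles(STA_IND_Y, INC_ZP)
self_mod = loop_cycles(STA_ABS_Y, INC_ABS)
print(indirect, self_mod, self_mod / indirect)
```

Under those assumptions the saving is about one cycle per byte stored. The larger practical benefit is often that the loop leaves zero page free.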

ghidra007 commented 7 months ago

@pjsoberoi Can you supply a stack trace?

pjsoberoi commented 2 months ago

@ghidra007: I apologize for the delay, I lost track of this issue. Here is an example unit test:

[*] /home/poberoi/ProcessorTests/6502/v1/20.json: Loaded 10000 test cases
[*] Test Range: 0-10000
[*] Done posting tests
Test cases: 836/10000 Fail cases: 0
Test cases: 1652/10000 Fail cases: 0
Test cases: 2549/10000 Fail cases: 0
Test cases: 3449/10000 Fail cases: 0
!! REGISTER ERROR: PC 341 4949
[-] 4044) FAIL
Initial State:
    Registers:
        A: 158
        P: 230
        PC: 379
        S: 125
        X: 137
        Y: 52
    RAM:
        341: 173
        379: 32
        380: 85
        381: 19

Final (Expected) State:
    Registers:
        A: 158
        P: 230
        PC: 341
        S: 123
        X: 137
        Y: 52
    RAM:
        341: 173
        379: 32
        380: 125
        381: 1

Emulator:
    Registers:
        A: 158
        P: 230
        PC: 4949
        S: 123
        X: 137
        Y: 52
    RAM:
        341: 173
        379: 32
        380: 125
        381: 1

The pushed program counter is overwriting the instruction itself.

Can you tell me how I can provide a stack trace? I have written a standalone executable (https://github.com/oberoisecurity/ghidra-processor-module-verifier) that takes in a .sla and runs unit tests with the SLEIGH emulator. This was one of the tests that failed. Thanks again.

GhidorahRex commented 2 months ago

@pjsoberoi I don't think we need a stack trace. I think if we want solid emulation of this behavior we may have to utilize the trick from the Sparc about differentiating emulation vs decompiled output.

NationalSecurityAgency / ghidra

6502: Self Modifying Instruction, End of Address Space #5871