Open pjsoberoi opened 11 months ago
Without having a real processor to test on, I don't think it's possible to understand exactly how a self-modifying instruction will behave. This is a very distinct edge case.
With respect to the 1st test issue:
The json test data shows exactly what happens on my Commodore 64 hardware (it has a 6510 chip, like your NES RP2A03 except for ADC/SBC BCD handling).
Here's an equivalent (to the JSON) test on my Commodore 64 hardware (note: A, X, and Y test values don't matter):
$0155 EE 20 D0 INC $D020 ; loop to change border color to see if we got here
$0158 4C 55 01 JMP $0155
. . .
; start execution here:
$0177 78 SEI ; protect stack during test
$0178 A2 7D LDX #$7D ; set stack pointer
$017A 9A TXS
$017B 20 55 13 JSR $1355 ; the json test line, it will set PC to $0155
When $017B (20 55 13) JSR $1355 executes, here's what happens on each cycle of the instruction:
1) read $20 (JSR), then inc PC to $017C
2) read ADL from $017C as $55, then inc PC to $017D
3) internal stuff
4) overwrite $017D's $13 with PCH $01, then dec S
5) overwrite $017C's $55 with PCL $7D, then dec S
6) read ADH from $017D as $01, set PC with ADH/ADL as $0155
So, half of the address gets overwritten as the instruction is in the process of executing. The json after-test data is correct (the PC, S, and RAM). Pretty interesting (and thanks to Robin Harbron who gave me an important nudge forward on this one).
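The cycle order described above can be replayed in a few lines of Python. This is only a minimal sketch, not Ghidra's pcode or any emulator's actual implementation; the function names `jsr_cycle_order` and `jsr_operands_first` are made up for illustration, and RAM is modeled as a plain dict holding the values from the JSON test.

```python
def jsr_cycle_order(ram, pc, s):
    """Replay JSR in the hardware cycle order: ADH is read AFTER the pushes."""
    opcode = ram[pc]                      # cycle 1: read opcode, inc PC
    pc = (pc + 1) & 0xFFFF
    assert opcode == 0x20, "not a JSR"
    adl = ram[pc]                         # cycle 2: read ADL, inc PC
    pc = (pc + 1) & 0xFFFF
    # cycle 3: internal operation, no visible effect in this model
    ram[0x0100 | s] = pc >> 8             # cycle 4: push PCH, dec S
    s = (s - 1) & 0xFF
    ram[0x0100 | s] = pc & 0xFF           # cycle 5: push PCL, dec S
    s = (s - 1) & 0xFF
    adh = ram[pc]                         # cycle 6: read ADH (possibly overwritten)
    return (adh << 8) | adl, s


def jsr_operands_first(ram, pc, s):
    """Hypothetical 'naive' ordering: fetch the whole operand, then push."""
    target = ram[(pc + 1) & 0xFFFF] | (ram[(pc + 2) & 0xFFFF] << 8)
    ret = (pc + 2) & 0xFFFF
    ram[0x0100 | s] = ret >> 8
    s = (s - 1) & 0xFF
    ram[0x0100 | s] = ret & 0xFF
    s = (s - 1) & 0xFF
    return target, s


initial = {341: 173, 379: 32, 380: 85, 381: 19}   # RAM from the JSON test

ram_hw = dict(initial)
pc_hw, s_hw = jsr_cycle_order(ram_hw, pc=379, s=125)

ram_naive = dict(initial)
pc_naive, s_naive = jsr_operands_first(ram_naive, pc=379, s=125)

print(f"cycle order:    PC=${pc_hw:04X} S=${s_hw:02X}")
print(f"operands first: PC=${pc_naive:04X} S=${s_naive:02X}")
```

Both orderings leave the same bytes in RAM and the same S; only the final PC differs, which is why the ordering of the ADH read relative to the pushes is the one thing that matters here.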
Test code:
Ends up in the infinite border-color-changing loop at $0155:
Peter Ferrie just tested your 1st example on an Apple II (6502, not a 65C02), and it also matches the json test result, so again, the test is valid.
My opinion is that it's not important that Ghidra emulates this correctly. That said, it should be possible to modify the JSR pcode to accommodate this test case by ordering assignments in a way that matches the 6502 assignment order; but if so, I'd put a comment in for future maintainers. Here are two references on what happens at each JSR cycle:
On to the 2nd test case: I confirmed that the stated json test behavior is what happens on real hardware (a Commodore 64); specifically, the two-byte instruction spans $FFFF to $0000 and executes successfully, the memory location holding $A0/%10100000 is ROLed to $40/%01000000 (setting carry to 1), and P goes from 100/$64/%01100100 to 101/$65/%01100101.
Test details:
On a C64, memory locations $0000-$0001 are readable/writable memory-mapped registers. I set memory location $0001 to $35 to bank out the ROM that would prevent location $FFFF from being changed. Also, changing the value at $0000 would mess up the ROM banking, so I'm using its default value of $2F instead of assigning it the test value of $7D. This isn't an issue, as $7D was an arbitrarily chosen memory location to apply the ROL on. So the shifted result will show in location $002F instead of $007D.
Execution begins at $FFFF, the instruction there wrapping to $0000. Flow will continue through memory location $0001, executing an instruction based on the value $35 that I put into location 1 for banking. This forms an instruction that will not affect testing state results:
$0001 15 00 ORA $00,X ; pseudo NOP, won't change our flags
To end in an infinite loop (that we can break out of to check value at $2F), I assembled this line:
$0003 4C 03 00 JMP $0003
Test code:
$C000 78 SEI
$C001 A9 35 LDA #$35
$C003 85 01 STA $01 ; bank out ROM so we can change loc $FFFF
$C005 A9 26 LDA #$26
$C007 8D FF FF STA $FFFF ; create ROL $2F at $FFFF
$C00A A9 A0 LDA #$A0
$C00C 85 2F STA $2F ; init $2F = A0 (using $2F instead of $7D)
$C00E A2 00 LDX #$00
; create an ORA $00,X that leaves flags (%01100101) undisturbed
$C010 86 02 STX $02
$C012 A9 64 LDA #$64
$C014 48 PHA
$C015 28 PLP ; init flags
$C016 4C FF FF JMP $FFFF
Setup:
Execution wrapped from high to low memory, then after breaking out of infinite loop at $0003, both RAM and status flags match test expectations:
So (without testing) I assume your third test case would be valid as well (since it's just another memory-wrapping instruction example)?
I'm really enjoying these pictures.
This isn't really a huge deal for us. Self-modifying instructions aren't really something that occurs regularly. That being said, if you can get it working right, you're welcome to submit a pull-request for it and we can evaluate it then.
I think I have a way forward on the self-modifying instruction. That being said, I'm afraid of how many other instructions this might involve. Currently this is the only unit test failing due to self-modification.
@pjsoberoi, I agree with GhidorahRex: an instruction that modifies itself is pretty rare in practice. So if you do decide to handle this one test case in sleigh, don't feel obligated to provide additional per-cycle instruction refinements.
That said, multiple-instruction self-modifying code in 6502 is common. But your NES tests shouldn't run into this if they're just testing single instructions individually? (And NES code in ROM cartridges can't use self-modifying code).
If you do emulate self-modifying code, you may run into Ghidra issues. In my experience, getReferencesFrom(program_counter) can break when the emulated control flow passes through that modified code (causing a python script to just "SystemExit" without a log error message). If more people are running into this, I can easily whip up an issue that makes this reproducible.
Why is 6502 self-modifying code common? It's frequently used in 16-bit indexing. Here's an example: clear 8K of RAM from $2000 to $3FFF. First, without self-modifying code, this uses zero-page-based indirect addressing for 16-bit indexing:
LDA #$00
TAY
STA $FB
LDX #$20
STX $FC
PT1: STA ($FB),Y
INY
BNE PT1
INC $FC
DEX
BNE PT1
In contrast, this self-modifying alternative runs much faster (and doesn't require zero page):
LDA #$00
TAY
LDX #$20
STX PT1+2
PT1: STA $2000,Y
INY
BNE PT1
INC PT1+2
DEX
BNE PT1
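For a rough sense of the difference, the two loops can be compared by counting cycles. This is a back-of-the-envelope sketch: it uses the commonly documented NMOS 6502 timings (STA (zp),Y = 6, STA abs,Y = 5, INC zp = 5, INC abs = 6, INY/DEX = 2, BNE = 3 taken / 2 not taken), assumes no branch crosses a page boundary, and ignores the few setup instructions before PT1. The helper name `clear_loop_cycles` is made up for illustration.

```python
def clear_loop_cycles(store_cycles, inc_cycles, pages=0x20):
    """Estimate cycles for the nested clear loop over `pages` 256-byte pages."""
    INY = DEX = 2
    BNE_TAKEN, BNE_NOT_TAKEN = 3, 2
    per_page = (
        256 * (store_cycles + INY)        # STA + INY, 256 times per page
        + 255 * BNE_TAKEN + BNE_NOT_TAKEN # inner BNE PT1
        + inc_cycles                      # INC of the high address byte
        + DEX
    )
    # outer BNE PT1: taken after every page except the last
    outer_branches = (pages - 1) * BNE_TAKEN + BNE_NOT_TAKEN
    return pages * per_page + outer_branches


indirect = clear_loop_cycles(store_cycles=6, inc_cycles=5)   # STA ($FB),Y / INC $FC
self_mod = clear_loop_cycles(store_cycles=5, inc_cycles=6)   # STA $2000,Y / INC PT1+2

print("indirect:", indirect, "cycles")
print("self-modifying:", self_mod, "cycles")
print("saved:", indirect - self_mod, "cycles")
```

Under these assumptions the saving comes entirely from the one-cycle-cheaper store in the inner loop, slightly offset by the more expensive INC of an absolute address.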
@pjsoberoi Can you supply a stack trace?
@ghidra007: I apologize for the delay, I lost track of this issue. Here is an example unit test:
[*] /home/poberoi/ProcessorTests/6502/v1/20.json: Loaded 10000 test cases
[*] Test Range: 0-10000
[*] Done posting tests
Test cases: 836/10000 Fail cases: 0
Test cases: 1652/10000 Fail cases: 0
Test cases: 2549/10000 Fail cases: 0
Test cases: 3449/10000 Fail cases: 0
!! REGISTER ERROR: PC 341 4949
[-] 4044) FAIL
Initial State:
Registers:
A: 158
P: 230
PC: 379
S: 125
X: 137
Y: 52
RAM:
341: 173
379: 32
380: 85
381: 19
Final (Expected) State:
Registers:
A: 158
P: 230
PC: 341
S: 123
X: 137
Y: 52
RAM:
341: 173
379: 32
380: 125
381: 1
Emulator:
Registers:
A: 158
P: 230
PC: 4949
S: 123
X: 137
Y: 52
RAM:
341: 173
379: 32
380: 125
381: 1
The program counter is overwriting itself.
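The log above prints everything in decimal, which hides why this particular case is special. A small sketch (the variable names are only for illustration) converts the key values to hex and checks where the first stack push lands:

```python
# Values copied from the failing test's initial state and results.
pc, s = 379, 125
expected_pc, emulator_pc = 341, 4949

stack_addr = 0x0100 + s          # where the first push (PCH) is written
operand_hi_addr = pc + 2         # address of the JSR operand's high byte

print(f"PC          = ${pc:04X}")
print(f"first push  = ${stack_addr:04X}")
print(f"operand hi  = ${operand_hi_addr:04X}")
print(f"expected PC = ${expected_pc:04X}")
print(f"emulator PC = ${emulator_pc:04X}")
```

The first push lands on the same address as the operand's high byte, and the two PC values differ only in that high byte.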
Can you tell me how I can provide a stack trace? I have written a standalone executable (https://github.com/oberoisecurity/ghidra-processor-module-verifier) that takes in a .sla and runs unit tests with the SLEIGH emulator. This was one of the tests that failed. Thanks again.
@pjsoberoi I don't think we need a stack trace. I think if we want solid emulation of this behavior we may have to utilize the trick from the Sparc about differentiating emulation vs decompiled output.
I'm in the process of verifying Ghidra's 6502 processor module vs processor unit tests (https://github.com/TomHarte/ProcessorTests/tree/main/6502/v1). For opcodes 0x00-0x40 I have three unit tests failing that I cannot fix easily. I need help deciding what the correct behavior is. Perhaps we need to run it on a real CPU?
1) JSR $1355 (opcode: 20 55 13) PC is 4949, should be 341 20.json 4046
{ "name": "20 55 13", "initial": { "pc": 379, "s": 125, "a": 158, "x": 137, "y": 52, "p": 230, "ram": [ [379, 32], [380, 85], [381, 19], [341, 173]]}, "final": { "pc": 341, "s": 123, "a": 158, "x": 137, "y": 52, "p": 230, "ram": [ [341, 173], [379, 32], [380, 125], [381, 1]]}, "cycles": [ [379, 32, "read"], [380, 85, "read"], [381, 19, "read"], [381, 1, "write"], [380, 125, "write"], [381, 1, "read"]] },
The issue here is the instruction is overwriting itself as it's executing. pc is 379 and we are writing bytes 381 and 380, and then reading byte 381. I don't know what the correct behavior is supposed to be.
2) ROL $7D (opcode: 26 7d ff) P is 228, should be 101 26.json 1251
{ "name": "26 7d ff", "initial": { "pc": 65535, "s": 137, "a": 219, "x": 194, "y": 144, "p": 100, "ram": [ [65535, 38], [0, 125], [1, 255], [125, 160]]}, "final": { "pc": 1, "s": 137, "a": 219, "x": 194, "y": 144, "p": 101, "ram": [ [0, 125], [1, 255], [125, 64], [65535, 38]]}, "cycles": [ [65535, 38, "read"], [0, 125, "read"], [125, 160, "read"], [125, 160, "write"], [125, 64, "write"]] },
The issue here is that PC is at the end of the address space when starting the instruction. Again I don't know what the correct behavior is supposed to be when accessing past 0xffff. Wrap to zero? Illegal instruction?
3) AND $45EF,X (opcode: 3d ef 45) A is 0, should be 144 P is 103, should be 229
{ "name": "3d ef 45", "initial": { "pc": 65535, "s": 99, "a": 184, "x": 115, "y": 124, "p": 101, "ram": [ [65535, 61], [0, 239], [1, 69], [17762, 235], [18018, 209], [2, 20]]}, "final": { "pc": 2, "s": 99, "a": 144, "x": 115, "y": 124, "p": 229, "ram": [ [0, 239], [1, 69], [2, 20], [17762, 235], [18018, 209], [65535, 61]]}, "cycles": [ [65535, 61, "read"], [0, 239, "read"], [1, 69, "read"], [17762, 235, "read"], [18018, 209, "read"]] },
Same issue as 2), PC is at the end of the address space when starting the instruction.
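If the instruction fetch simply wraps from 0xffff to 0x0000, the expected values in cases 2) and 3) can be reproduced with a small Python model. This is a sketch under that assumption only; the helpers `fetch`, `rol_zp`, and `and_abs_x` are hypothetical names, not part of Ghidra or the test suite, and RAM is a dict filled from the JSON above.

```python
N, Z, C = 0x80, 0x02, 0x01   # status flag bits used below


def fetch(ram, pc):
    """Read one byte at PC and advance PC, wrapping within 16 bits."""
    return ram[pc & 0xFFFF], (pc + 1) & 0xFFFF


def rol_zp(ram, pc, p):
    """ROL zero page (opcode $26): rotate left through carry."""
    opcode, pc = fetch(ram, pc)
    assert opcode == 0x26
    zp, pc = fetch(ram, pc)
    value = ram[zp]
    result = ((value << 1) | (p & C)) & 0xFF
    p = p & 0xFF & ~(N | Z | C)
    p |= (value >> 7) | (result & N) | (Z if result == 0 else 0)
    ram[zp] = result
    return pc, p


def and_abs_x(ram, pc, a, x, p):
    """AND absolute,X (opcode $3D)."""
    opcode, pc = fetch(ram, pc)
    assert opcode == 0x3D
    lo, pc = fetch(ram, pc)
    hi, pc = fetch(ram, pc)
    addr = (((hi << 8) | lo) + x) & 0xFFFF
    a &= ram[addr]
    p = (p & 0xFF & ~(N | Z)) | (a & N) | (Z if a == 0 else 0)
    return pc, a, p


# Case 2: ROL $7D starting at PC = 65535
ram2 = {65535: 38, 0: 125, 1: 255, 125: 160}
pc2, p2 = rol_zp(ram2, pc=65535, p=100)

# Case 3: AND $45EF,X starting at PC = 65535
ram3 = {65535: 61, 0: 239, 1: 69, 2: 20, 17762: 235, 18018: 209}
pc3, a3, p3 = and_abs_x(ram3, pc=65535, a=184, x=115, p=101)

print("case 2:", pc2, p2, ram2[125])
print("case 3:", pc3, a3, p3)
```

With wrap-to-zero fetching, both cases end with the PC, flags, and data values listed in the "final" sections of the JSON.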