IsoFrieze / DiztinGUIsh

A Super NES ROM Disassembler
GNU General Public License v3.0

figure out how to deal with routines that are called with different DP and DB values #37

Open binary1230 opened 3 years ago

binary1230 commented 3 years ago

This is sort of the opposite problem of #34.

(forgive me if I get some of the SNES register guts wrong and addressing modes wrong, still relatively new to asm)

In #34, there are parts of game code which always use the same value in the DP register as an optimization to save some bytes in the ROM. When using a trace logger, the DP value never changes over multiple runs of the game and it's pretty safe to assume it'll never change.


The question here is: how should Diz handle the output for situations (like Absolute Indexed addressing) when it knows the DB register is not constant?

Consider this example. Source bytes: BD 01 00

That's LDA with Absolute Indexed,X addressing, which means (glossing over the M flag):

LDA $0001,X

In one tracelog run, Diz generates this assembly code:

UNREACH_EF0001 = $EF0001

DMA_copy_BYTES_to_RAM: 
LDA.W UNREACH_EF0001,X               ;C3059D|BD0100  |EF0001
STA.L SNES_WMDATA                    ;C305A0|8F802100|002180;  

after importing trace data from another run, it now generates this:

DATA8_EC0001 = $EC0001

DMA_copy_BYTES_to_RAM: 
LDA.W DATA8_EC0001,X                 ;C3059D|BD0100  |EC0001;
STA.L SNES_WMDATA                    ;C305A0|8F802100|002180;  

Each time you run the game with a different capture, there will be a different DB value, since this function (which happens to be a DMA routine) is grabbing data from all over the ROM.

Here is one more look at two other examples in the debugger, walking through the math step by step:

remember, the original code at $C3059D says: LDA $0001,X

image

In the first case, X=#$03AD and DB=#$E7, so LDA $0001,X computes the final memory address like this:

LDA [DB << 16] + X + #$0001
LDA #$E70000 + #$0001 + #$03AD
LDA $E703AE ; final memory address

image

In the second case, X=#$CFF7 and DB=#$C3, so LDA $0001,X computes the final memory address like this:

LDA [DB << 16] + X + #$0001
LDA #$C30000 + #$0001 + #$CFF7
LDA $C3CFF8 ; final memory address
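The arithmetic in the two debugger cases can be checked with a small Python sketch. The function name `absolute_x_address` is made up for illustration, and the sketch assumes a 16-bit X register and a result masked to 24 bits; it is not code from Diz or from any emulator.

```python
def absolute_x_address(db: int, operand: int, x: int) -> int:
    """Effective address for Absolute Indexed,X.

    Assumption: X is 16 bits wide and the sum is kept to 24 bits.
    The name and signature are hypothetical, for illustration only.
    """
    return ((db << 16) + operand + x) & 0xFFFFFF


# The two cases from the debugger walkthrough, both for LDA $0001,X
for db, x in [(0xE7, 0x03AD), (0xC3, 0xCFF7)]:
    addr = absolute_x_address(db, 0x0001, x)
    print(f"DB=${db:02X} X=${x:04X} -> ${addr:06X}")
```

The same instruction bytes resolve to $E703AE in one case and $C3CFF8 in the other, which is why no single label can describe the operand.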


All versions, when assembled by Asar, generate the correct bytes in the final ROM: BD 01 00

I am guessing this works because Asar just chops the top byte off the label and uses the lower 16 bits, so it happens to work out. For example, with the label Diz generates in this last run (0xEC0001), Asar probably just ANDs it with 0xFFFF to get the correct operand of $0001. So whether Diz emits DATA8_EC0001 or UNREACH_EF0001 doesn't matter; the important part is the lower 16 bits, 0x0001.
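To make that guess concrete, here is a minimal Python sketch of the truncation being described. It models the hypothesis only; `encode_lda_abs_x` is a made-up helper and says nothing about how Asar is actually implemented.

```python
LDA_ABS_X = 0xBD  # opcode byte for LDA Absolute Indexed,X (from the example above)


def encode_lda_abs_x(label_value: int) -> bytes:
    """Hypothetical model: keep only the low 16 bits of the label value
    and emit them little-endian after the opcode."""
    operand = label_value & 0xFFFF
    return bytes([LDA_ABS_X]) + operand.to_bytes(2, "little")


# Both generated labels, and the bare offset, give the same three bytes
for label in (0xEF0001, 0xEC0001, 0x0001):
    print(f"${label:06X} -> {encode_lda_abs_x(label).hex(' ').upper()}")
```

Under this assumption the bank byte of the label never reaches the ROM, so every variant assembles to BD 01 00.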

So, right now, Diz is taking the pieces it has (a last value for DB and #$0001) and generating a label for it. It works OK, but it's weird for humans because the label doesn't refer to anything useful. And each time we import new tracelog data, we are swapping around tons of new labels that flap around randomly based on what the last thing the game happened to access was.

(I'm a big proponent of the generated asm code being useful for humans to read so it's possible to better understand what's going on)

**So OK, my question on this issue is, in a situation like this:**

  1. what do we WANT Diz to output, and
  2. do we collect enough information to infer that this is happening?

I think my answer is this, but I'd like some feedback:

Part 1: When we're capturing with tracelogging, right now we can only store ONE value each for the D and DB registers, here: https://github.com/binary1230/DiztinGUIsh/blob/master/Diz.Core/model/ROMByte.cs#L12

RomByteData.dataBank
RomByteData.directPage

Let's consider just dataBank here.

I think we need to modify those fields (or add new ones somewhere) to store information about whether more than 1 dataBank or directPage has ever been seen here. Either we could store an array of every value (like DB) that's ever been seen when executing at this address, or, we could add a new flag to mark "we have seen more than 1 DB come through here".

If that flag is not set, then DB and DP can be interpreted as "this is the only DB or the only DP that has ever come through here", solving #34.

For this issue, if it IS set, then... we can tailor the output to be smarter. In our case above, I think we really do want to print $0001 instead of generating a label. Or perhaps generate a label like OFFSET_0001, or just leave a comment, perhaps showing a typical example of what the X and DB values might be when coming through here.
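As a rough illustration of the "store every DB seen" option, here is a Python sketch (Diz itself is C#, so this is only a model of the idea). `RomByteTrace`, `record`, `db_is_constant`, and `format_operand` are all hypothetical names, and the `DATA8_` prefix simply mirrors the labels shown earlier.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class RomByteTrace:
    """Hypothetical per-address record of every DB value seen in trace logs."""
    data_banks_seen: Set[int] = field(default_factory=set)

    def record(self, db: int) -> None:
        # Called once per traced execution of this instruction
        self.data_banks_seen.add(db & 0xFF)

    @property
    def db_is_constant(self) -> bool:
        # Exactly one DB ever observed: the situation described in #34
        return len(self.data_banks_seen) == 1


def format_operand(trace: RomByteTrace, operand: int) -> str:
    """Emit a label only when a single DB was seen; otherwise the raw offset."""
    if trace.db_is_constant:
        (db,) = trace.data_banks_seen
        return f"DATA8_{(db << 16) | operand:06X}"
    return f"${operand:04X}"


trace = RomByteTrace()
trace.record(0xEC)
print(format_operand(trace, 0x0001))  # one DB seen so far: label form
trace.record(0xEF)
print(format_operand(trace, 0x0001))  # two DBs seen: raw $0001
```

A set costs more memory per byte than a single flag, but it would also let a comment list example DB values; a boolean "multiple seen" flag would be enough for the label-versus-offset decision alone.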


ignore this, here is reference stuff for me when I forget all this in the next 5 minutes.... : )

Absolute,X
http://www.6502.org/tutorials/65c816opcodes.html#5.3
Example: If the DBR is $12, the X register is $000A, and the m flag is 0, then LDA $FFFE,X loads the low byte of the data from address $130008:
$120000 + $FFFE + $000A = $130008

DBR: Data bank register, holds the default bank for memory transfers. (in BSNES, this is 'DB')
D: Direct page register, used for direct page addressing modes. (in BSNES, this is 'D')