Closed baldengineer closed 5 months ago
The "attr" column for $c80e has a ">", which means $c80e (the 3rd byte of the LDX) is a branch target. If you select the line, the References window in the top left will tell you what is referencing that address. Because of the branch, the disassembler sees a path through the code that passes through the end of the opcode.
$c80e is interpreted as a two-byte CPY (the byte there is $C0, the CPY immediate opcode), which eats into the 3-byte LDY at $c80f, creating another mid-instruction opcode. This propagates for a bit, gets a break at the CLD, then starts up again when something branches into $c816.
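To make the overlap concrete, here is a minimal Python sketch (not SourceGen's analyzer) that decodes the six bytes quoted in the report from two entry points: the normal path starting at $c80c, and the branch target at $c80e. The tiny opcode table, the function names, and the stop-at-unknown-opcode behavior are assumptions made for illustration; only the byte values come from the report.

```python
# Illustrative opcode table: just the three opcodes needed for this example.
OPCODES = {
    0xAE: ("LDX", 3),  # LDX absolute
    0xAC: ("LDY", 3),  # LDY absolute
    0xC0: ("CPY", 2),  # CPY immediate
}

def decode_from(mem, base, start):
    """Decode linearly from `start`; return [(address, mnemonic, length)]."""
    out = []
    addr = start
    while addr - base < len(mem):
        op = mem[addr - base]
        if op not in OPCODES:
            break  # the sketch simply stops at opcodes it does not model
        name, length = OPCODES[op]
        if addr - base + length > len(mem):
            break  # instruction would run off the end of the buffer
        out.append((addr, name, length))
        addr += length
    return out

def embedded_starts(primary, secondary):
    """Instruction starts in `secondary` that fall inside a `primary` instruction."""
    interior = set()
    for addr, _, length in primary:
        interior.update(range(addr + 1, addr + length))
    return sorted(a for a, _, _ in secondary if a in interior)

BASE = 0xC80C
MEM = bytes([0xAE, 0x66, 0xC0,   # $c80c: ldx $c066
             0xAC, 0x67, 0xC0])  # $c80f: ldy $c067

main_path = decode_from(MEM, BASE, 0xC80C)    # the intended listing
branch_path = decode_from(MEM, BASE, 0xC80E)  # what a branch to $c80e sees

for addr in embedded_starts(main_path, branch_path):
    print(f"${addr:04x} starts inside an instruction on the main path")
for addr in embedded_starts(branch_path, main_path):
    print(f"${addr:04x} starts inside an instruction on the branch path")
```

The first loop reports $c80e (the last byte of the LDX), and the second reports $c80f (the LDY opcode is swallowed as the operand of the instruction that starts at $c80e), which is the kind of chain the downward arrows mark.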
The downward arrows are there to let you know that there are instructions with opcodes in the middle of them. Some code does this deliberately; there's an example near the bottom of https://6502bench.com/sgtutorial/odds-ends.html .
If you want to send me the project (e-mail or attach here) I may be able to tell you more.
Ah okay, I did not know what the > in attributes meant. (And I completely missed the references panel!)
The issue is there are jmps to Bank 2, which the analyzer is linking together.
I haven't figured out how to handle the 2nd bank yet, but at least I understand the overall behavior better now.
Thanks!
I figured it was something like that. I had a similar issue while fiddling with Metroid (https://6502disassembly.com/nes-metroid/)... 8 banks of ROM, 7 mapped to the same address, code in the first bank referencing entry points in 5 out of the other 7.
Some issues can be resolved by setting the address regions appropriately. Sometimes you have to set the operand symbols explicitly, because there are multiple identical addresses. SourceGen treats the address map as a tree and does a depth-first search, but that doesn't disambiguate all situations.
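As a rough illustration of why a tree search cannot disambiguate every case, here is a small Python sketch. It is not SourceGen's implementation: the `Node` class, the search order (the referencing region first, then widening through the parents), and the three-bank layout are all assumptions chosen to show the idea.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)
class Node:
    name: str
    offset: int = 0             # file offset of the first byte
    length: int = 0
    addr: Optional[int] = None  # None means "container only, no address range"
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None

def add(parent: Node, child: Node) -> Node:
    child.parent = parent
    parent.children.append(child)
    return child

def search_down(node: Node, target: int, skip: Optional[Node] = None) -> Optional[int]:
    """Depth-first search of `node` and its descendants for `target`."""
    if node is skip:
        return None  # this subtree was already searched
    if node.addr is not None and node.addr <= target < node.addr + node.length:
        return node.offset + (target - node.addr)
    for child in node.children:
        hit = search_down(child, target, skip)
        if hit is not None:
            return hit
    return None

def resolve(start: Node, target: int) -> Optional[int]:
    """Search the referencing region first, then widen one parent at a time."""
    node, seen = start, None
    while node is not None:
        hit = search_down(node, target, skip=seen)
        if hit is not None:
            return hit
        seen, node = node, node.parent
    return None

# Hypothetical layout: two switchable banks at $8000 and one fixed bank at $c000.
root = Node("root")
bank0 = add(root, Node("bank0", offset=0x0000, length=0x4000, addr=0x8000))
bank1 = add(root, Node("bank1", offset=0x4000, length=0x4000, addr=0x8000))
fixed = add(root, Node("fixed", offset=0x8000, length=0x4000, addr=0xC000))

print(hex(resolve(bank1, 0x8123)))  # a reference from inside bank1 stays in bank1
print(hex(resolve(fixed, 0x8123)))  # from the fixed bank, the first match wins
```

In this sketch a reference from the fixed bank to $8123 always lands in bank0, even when the code really meant bank1. That is the situation where the operand symbol has to be set explicitly.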
Another example: ProSel's CAT.DOCTOR (https://6502disassembly.com/a2-prosel8/) does a bunch of relocations, so that map got pretty interesting. FWIW, the ASCII-art address map is generated by View > Show Address Map.
Thanks! I'll take a look at how those projects are structured. I know I need to figure this out, but it's not my primary focus for doing this exercise. So, I'll come back and probably have more questions. :)
FWIW, here are the project files I have been working on. The "bank 1" project is my third start of this process. I keep learning the things I did wrong midway through. :)
A few notes that may be helpful...
You don't need to put the .sym65 files in the SourceGen RuntimeData directory. You can just put them next to your project file, and use "add symbol files from project" instead of "add symbol files from runtime". (They also shouldn't be copyright faddenSoft, since you wrote them.) This would remove the manual installation step from the download.
The relevant section of the manual (hit F1 and find the "Platform Symbol Files (.sym65)" section, or open RuntimeData/Help/advanced.html) has some additional details on the .sym65 format that may be helpful. For example, addresses and constants are specified differently, so the address resolver doesn't try to use the constants, and you can specify different symbols for read vs. write operations on memory-mapped I/O locations.
I couldn't really play with the project because I didn't find a ROM binary. The html output looks like you're making good progress.
The stretch at $c780 caught my eye because of the 24-bit math:
C780: 8D 28 C0 swrti ADR ROMBANK+$BF6865 ;RTI to the other bank
Looks like that got turned into data rather than code (those are alternating STA/JMP). A couple of them are referenced with JSRs; the others might need code start tags on the $8Ds.
Mapping the chunk at $c000-c0ff to a different address (or no address at all) might be necessary if you want the address resolver to find the project/platform symbols for the I/O addresses. SourceGen prioritizes in-file addresses over external addresses.
Thanks for the follow-up. Regarding the license, oops! I intended to clean those files up before sharing them (and then forgot.)
I'll re-review the manual section on symbol files. Your instructions make more sense now that I've created and used them a bit.
The issue around $C780 is because I flattened the code in that area to be inline data. It is all stuff that jumps to the second bank. So, by effectively ignoring it, the rest of the project is easier to read. (Side note, for the immediate goal I am trying to accomplish, I just need to see when code jumps to that block. It's just a jump table anyway, so I only care when I see things going to that address range. Those are the jumps I need to patch around.)
I'll keep working to understand how to do address mapping. Thanks again!
Ah...
C760: 4C 0E C8 ADR fixlc+$C7463E
That's what caused the problem in the initial report.
I've been thinking about the issue of references to overlapping banks. The problem at hand is that there is a reference to an address (such as $c80e) that exists in more than one place. The difficulty is that the disassembler's code analyzer wants to map that address to a file offset. There are three basic scenarios:
Ideally it would be possible to add something to the operand that told it which of the various addresses were the correct ones, so that the code analyzer could automatically visit all of them. The operand editor would need to have a list of checkboxes, one per potential target offset. In practice this is probably more confusing and more work than just adding a code start tag at those offsets, and would be difficult to maintain if the address map was updated. The one clear advantage it has is that the References list would be correct.
In theory we could use a symbol specified for the operand as a signal. If the operand is given a symbol that is defined in a different part of the address map, we could start the offset resolution process in that region instead of the instruction's region. This doesn't help with multiple targets though, and I'm not sure how this would affect existing behavior. (Also, we don't normally apply labels until after the code analyzer runs.)
A simpler approach would be to add a "do not follow" checkbox for absolute branch instructions (JMP/JSR). If set, the code analyzer simply doesn't follow the trail. For this project, the box could be checked on the various JMP instructions to eliminate the mid-instruction execution seen in the initial problem report. This isn't ideal, but it's fairly straightforward, and eliminates the annoying multi-path code issue.
I've added a TO DO list item for this.
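A minimal Python sketch of the "do not follow" idea, for illustration only. The four-entry opcode table, the `trace` function, the `no_follow` parameter, and the 7-byte test program are all hypothetical and are not taken from SourceGen or from the ROM.

```python
# Toy code tracer: follows control flow from a set of entry points and
# records where instructions start. "no_follow" lists JMP/JSR addresses
# whose targets should not be visited.
OPS = {
    0x4C: ("JMP", 3, "jump"),  # absolute jump: continue only at the target
    0x20: ("JSR", 3, "call"),  # call: visit the target and the next instruction
    0x60: ("RTS", 1, "stop"),  # return: path ends here
    0xEA: ("NOP", 1, "next"),  # falls through to the next instruction
}

def trace(mem, base, entries, no_follow=frozenset()):
    """Return the set of instruction start addresses reachable from `entries`."""
    visited = set()
    work = list(entries)
    while work:
        addr = work.pop()
        off = addr - base
        if addr in visited or not (0 <= off < len(mem)):
            continue
        op = mem[off]
        if op not in OPS:
            continue
        _, length, kind = OPS[op]
        if off + length > len(mem):
            continue
        visited.add(addr)
        if kind in ("jump", "call"):
            target = mem[off + 1] | (mem[off + 2] << 8)
            if addr not in no_follow:
                work.append(target)
        if kind in ("call", "next"):
            work.append(addr + length)
    return visited

# Hypothetical program at $1000. The JMP at $1000 is "really" aimed at another
# bank, but its operand ($1004) also falls in the middle of the JSR at $1003.
BASE = 0x1000
MEM = bytes([0x4C, 0x04, 0x10,   # $1000: jmp $1004
             0x20, 0xEA, 0x60,   # $1003: jsr $60ea
             0x60])              # $1006: rts

followed = trace(MEM, BASE, [0x1000, 0x1003])
pruned = trace(MEM, BASE, [0x1000, 0x1003], no_follow={0x1000})
print(sorted(hex(a) for a in followed))
print(sorted(hex(a) for a in pruned))
```

With the jump followed, the tracer also finds instruction starts at $1004 and $1005, inside the JSR. With the JMP marked "do not follow", only $1000, $1003, and $1006 remain.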
This might actually work with the "isolated region" concept from issue #139. $c780-c7ff in each bank would be marked as isolated so that it didn't try to resolve symbols in the current bank.
I think there's still value in a "do not follow" checkbox for fixing up individual items, but considering multi-bank NES games like Metroid, I think when it comes to ROM banking there are segments that "reach out" and segments that expect to be reached into.
Here's a quick project using the new address space isolation features to put the entire ROM in a single project. I did a rough setup with the regions on the 32KB ROM file:
Use Navigate > View Address Map to see an overview of the region structure...
Address region map for "03-342-0445-A.bin"
+000000 +- start 'BANK0' [!in] [!out]
+000000 | +- start 'BANK0'
| | -NA- length=256 ($0100)
+0000ff | +- end
|
| $c100 - $c6ff length=1536 ($0600)
+000700 | +- start [!out]
| | $c700 - $c7ff length=256 ($0100)
+0007ff | +- end
|
| $c800 - $cfff length=2048 ($0800)
+001000 | +- start [!in]
| | $d000 - $f7ff length=10240 ($2800)
+0037ff | +- end
|
| $f800 - $ffff length=2048 ($0800)
+003fff +- end
+004000 +- start 'BANK1' [!in] [!out]
+004000 | +- start 'BANK1'
| | -NA- length=256 ($0100)
+0040ff | +- end
|
| $c100 - $c77f length=1664 ($0680)
+004780 | +- start 'bank_swp_table' [!out]
| | $c780 - $c7ff length=128 ($0080)
+0047ff | +- end
|
| $c800 - $dbff length=5120 ($1400)
+005c00 | +- start 'StartTest' [!in]
| | $2000 - $23ff length=1024 ($0400)
+005fff | +- end
|
+006000 | +- start [!in]
| | $e000 - $ffff length=8192 ($2000)
+007fff | +- end
|
+007fff +- end
The isolation feature prevents the 16KB banks from being aware of each other, and prevents the $c7xx code from trying to create symbols in the current bank. I put the page at $c000 in non-addressable space, since it's actually memory-mapped I/O. I also wrapped the Applesoft area just so I could slap a "junk bytes" on the entire thing.
Note that SourceGen commands like Goto (Ctrl+G), when given an address, will jump to the matching address closest to the selected line. So jumping to "c700" will go to either the first or second bank depending on where you start from.
I think this addresses the problems you were having.
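The Goto behavior described above can be sketched in a few lines of Python. The region table is transcribed from the address map listing (file offset, start address, length), with the two non-addressable pages left out. Measuring "closest" as distance in file offset is an assumption made for this sketch and may not match how SourceGen actually decides; the function names are hypothetical.

```python
# (file offset, start address, length), taken from the address region map.
REGIONS = [
    (0x0100, 0xC100, 0x0600),  # bank 0
    (0x0700, 0xC700, 0x0100),
    (0x0800, 0xC800, 0x0800),
    (0x1000, 0xD000, 0x2800),
    (0x3800, 0xF800, 0x0800),
    (0x4100, 0xC100, 0x0680),  # bank 1
    (0x4780, 0xC780, 0x0080),
    (0x4800, 0xC800, 0x1400),
    (0x5C00, 0x2000, 0x0400),
    (0x6000, 0xE000, 0x2000),
]

def offsets_for(addr):
    """Every file offset that maps to `addr`, in file order."""
    return [off + (addr - start)
            for off, start, length in REGIONS
            if start <= addr < start + length]

def goto(addr, selected_offset):
    """Pick the matching offset nearest to the currently selected offset."""
    candidates = offsets_for(addr)
    if not candidates:
        return None
    return min(candidates, key=lambda off: abs(off - selected_offset))

print([hex(o) for o in offsets_for(0xC80E)])  # one hit per bank
print(hex(goto(0xC700, 0x0810)))              # cursor in bank 0
print(hex(goto(0xC700, 0x4810)))              # cursor in bank 1
```

The $c80e address from the original report shows up once in each bank, which is why a plain address-to-offset lookup cannot pick the right target on its own.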
I am disassembling the Apple IIc ROM Rev3 (345-0445-A), bank 1. Starting at $C800, I am getting a bunch of instructions decoded with a "▼", which leads to a confusing listing.
At $C80C, the instruction should be AE 66 C0, which would decode to ldx $C066. I created a label for MOUXL. And $C80F should decode to ldy $C067. And so on.
I'm confused why this is happening. I've removed analyzer tags, formatting, and even tried starting as "inline data" before setting the code start point.
I must be doing something wrong, but I cannot figure out what.