Shrink assembler/disassembler

patricksurry commented 3 months ago

Uses the assembler word names for disassembly rather than having a separate lookup table duplicating the strings. This removes about 1.7K of opcode tables and code when including the assembler/disassembler.

The main trick is to generate a table of 256 jsr asm_op_common instructions (each 3 bytes).
This replaces the list of 177 hand-crafted lda #op / bra|jmp asm_op_common words which are 4 or 5 bytes each.

For a given opcode $op, we pick the entry point in the table whose jsr generates a return address of op/hh, i.e. with op in the LSB. Then we can pop the return address to get the opcode for assembly. It also means the xt for each assembler word has an LSB that's two less than the opcode.
Then for disassembly we can just search the assembler wordlist to match the opcode and get the name of the matching word. Along with a small routine, op_length, to calculate operand length, this lets us get rid of the lookup tables.
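
The addressing arithmetic behind this can be modelled in a few lines of Python. This is only a sketch, not the code in the PR: it assumes one contiguous table of 256 three-byte jsr entries, and the names entry_for_opcode, find_name and the base address $C000 are made up for illustration. It relies on the fact that a 6502 jsr pushes the address of its own last byte (entry + 2), and that 3 is invertible modulo 256, so every possible LSB is produced by exactly one entry.

```python
JSR_LEN = 3          # a 6502 jsr instruction occupies three bytes
TABLE_ENTRIES = 256  # one jsr per possible opcode value
INV3_MOD_256 = 171   # 3 * 171 = 513 = 1 (mod 256)


def entry_for_opcode(table_base, op):
    """Address of the table entry whose jsr pushes a return address with LSB == op."""
    # jsr pushes (entry + 2), so we need (table_base + 3*index + 2) % 256 == op.
    index = ((op - 2 - table_base) * INV3_MOD_256) % 256
    return table_base + JSR_LEN * index


def opcode_from_return_address(ret_addr):
    """What the common routine recovers after popping the return address."""
    return ret_addr & 0xFF


def find_name(wordlist, table_base, op):
    """Disassembly: search (name, xt) pairs for the word that assembles op."""
    table_end = table_base + JSR_LEN * TABLE_ENTRIES
    for name, xt in wordlist:
        if table_base <= xt < table_end and (xt + 2) & 0xFF == op:
            return name
    return None  # no assembler word for this opcode


# Hypothetical base address and a three-word excerpt of the wordlist.
TABLE_BASE = 0xC000
WORDLIST = [
    ("adc", entry_for_opcode(TABLE_BASE, 0x6D)),
    ("cmp.x", entry_for_opcode(TABLE_BASE, 0xDD)),
    ("nop", entry_for_opcode(TABLE_BASE, 0xEA)),
]
```

In this model the xt of each word is simply the address of its table entry, which is why its LSB is always the opcode minus two (mod 256), and an opcode with no matching word falls through to None.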

There was one missing opcode in the assembler which I added with a test.

The disassembler (sort of) supported the extended rmbN, smbN, bbrN, and bbsN opcodes although they weren't tested and a couple had typos. The assembler didn't support these. Currently I've left those out but they could be added if needed.

If we wanted to slim this down further, I think we'd need to avoid the one-to-one correspondence between opcodes and Forth words. For example, you could split the address mode out as a separate qualifying word and write 1000 cmp .x instead of 1000 cmp.x. This would mean a lot fewer words, plus some assembler-specific parsing to recognize the address modes. (But it would need some alternative for the normal 16-bit mode, i.e. adc has no explicit suffix.) Or have a single 'assemble-opcode' word that uses a lighter-weight lookup table, e.g. 1000 $ cmp.x, where $ looks up the following string in a lookup table of string => opcode.
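
To make the last alternative concrete, here is a rough Python model of a single lookup-driven word. It is purely illustrative: the table layout, the assemble function and the four sample entries are assumptions, not anything in this PR. The opcode values themselves are the standard 65C02 encodings.

```python
# Hypothetical string => (opcode, operand byte count) table; a real one
# would cover the whole instruction set.
OPCODES = {
    "adc": (0x6D, 2),    # absolute
    "cmp.x": (0xDD, 2),  # absolute,X
    "lda.#": (0xA9, 1),  # immediate
    "nop": (0xEA, 0),    # implied
}


def assemble(name, operand=0):
    """Model of `operand $ name`: emit the opcode, then the operand little-endian."""
    opcode, operand_bytes = OPCODES[name]
    return bytes([opcode]) + operand.to_bytes(operand_bytes, "little")


# 1000 $ cmp.x  ->  DD E8 03  (1000 decimal is $03E8)
encoded = assemble("cmp.x", 1000)
```

The trade-off is the one described above: one word and one compact table instead of a dictionary header per opcode, at the cost of a custom parsing step.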

SamCoVT commented 3 months ago

This looks good. Scot and I had discussed using the dictionary to look up the words, but hadn't thought of the trick of using the return address LSB as the opcode to shrink the opcode table. That greatly simplifies things.

I can see why you want to use a macro to generate the headers here. I normally shy away from macros as they can make things more difficult to port, but I don't see a good non-messy way to do it without macros here. I suppose that means you can probably talk me into macroizing the other headers with a more generic macro. Someone wanting to port to a different assembler would just need to rewrite the macros in their favorite assembler syntax.

The comments are pretty good at explaining what is going on for anyone that wants to look under the hood, and your behavior for unknown opcodes looks good. In regards to the bbxx and xmbx instructions, I think the goal was to keep it to just the common 65C02 set rather than to include any of the manufacturer-specific opcodes. We can add them if you think there is a use-case for them, but they normally have the bit number in the name. If we add them, I'll recommend we take that bit number as an argument, and we'll just need special checking for these instructions (their opcodes all end in hex digit 7 or F, i.e. $x7 or $xF).
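
As a sketch of what that special checking could look like (modelled in Python, with hypothetical helper names, and not something this PR implements): on the Rockwell/WDC parts the bit number sits in bits 4-6 of the opcode, rmb/smb use low nibble 7, bbr/bbs use low nibble F, and bit 7 selects the "set" variant.

```python
# Opcode for bit 0 of each family; bit n adds n << 4.
BIT_OP_BASES = {"rmb": 0x07, "smb": 0x87, "bbr": 0x0F, "bbs": 0x8F}


def bit_opcode(mnemonic, bit):
    """Assemble side: take the bit number as an argument instead of part of the name."""
    if not 0 <= bit <= 7:
        raise ValueError("bit number must be 0..7")
    return BIT_OP_BASES[mnemonic] | (bit << 4)


def decode_bit_opcode(op):
    """Disassemble side: return (mnemonic, bit), or None if op is not a bit instruction."""
    if op & 0x0F not in (0x07, 0x0F):
        return None
    for mnemonic, base in BIT_OP_BASES.items():
        if op & 0x8F == base:  # keep bit 7 and the low nibble, drop the bit number
            return mnemonic, (op >> 4) & 0x07
    return None
```

With this, four words (rmb, smb, bbr, bbs) would cover all 32 opcodes, e.g. bit_opcode("smb", 3) gives $B7.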

I don't think I'm interested in separating the addressing mode from the opcode, even though that could offer more savings. What you have so far offers a good amount of savings while keeping the syntax exactly the same.

SamCoVT commented 3 months ago

Is this in a good state for merging? Don't worry about resolving any conflicts in the test results, as I'll be rerunning the test suite after merging anyway.

patricksurry commented 3 months ago

Yup, this is good to go.

I headed off on another tangent trying to write a minimal traditional (non-SAN) disassembler :-). But that's just something I might use in a minimal image without an assembler.

btw, I did get a flexible header branch working, but it'll need a bunch of merge updates before I can post it here.

SamCoVT / TaliForth2

Shrink assembler/disassembler #127