Add support for another cross-assembler

fadden commented 6 years ago

The question is: which one? Nearly all of my experience is in the Apple II world, so I don't have a sense for what's standard on other platforms.

The cc65 assembler can generate code for just about anything, but is light on target-specific constructs.

Merlin 32 is aimed at the Apple II, and has built-in support for things like high ASCII strings and OMF generation.

http://6502.org/tools/asm/ lists 23 assemblers and doesn't mention some of those that I'm familiar with (e.g. 64tass and Merlin 32), so there's a lot out there. The question is, which ones are actively used? Is there a "standard" cross-assembler for C64, Atari 2600, etc.?

qwertymodo commented 6 years ago

What would be really nice is if there could be an easy-to-use interface for adding new assemblers. Or (since I know that's a pretty tall order), at least some documentation and guidance on how to go about it within the code. Once banking support gets added, I could see this tool being incredibly useful in my SNES ROM hacking workflow, and it would be really nice to be able to cross-assemble to xkas or bass. Toward that end, I'd be more than happy to take a shot at adding support for them myself, but I have no idea where to even begin.

fadden commented 6 years ago

My original doc outline had a "how to add cross-assembler support" section, but I didn't get there. Partly that's because the interface evolved significantly when it went from Merlin-only to Merlin+cc65, and I'm guessing it'll evolve some more with a new set of assumptions. The short version is "clone GenCc65.cs and AsmCc65.cs and beat it until the regression tests pass", but there are some enums and combo boxes that need to be updated, the "quick set" buttons in the App Settings dialog probably need to turn into combo boxes, and the code needs to get reshuffled. There's a bunch of stuff that needs to be made more generic / automatic within the code, and I figured I'd do that work as part of adding the third cross-assembler.
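
One common way to make "add an assembler" more automatic is a registry: each generator registers itself under an ID, and the UI builds its combo box from the registry instead of a hard-coded enum. The Python sketch below only illustrates that pattern. SourceGen's generators are C# files (GenCc65.cs, AsmCc65.cs), and Generator, register, assembler_choices and create_generator are hypothetical names, not SourceGen's API. The byte directives used for the two demo targets (.byte for cc65, dfb for Merlin) are assumptions for the example.

```python
from typing import Callable, Dict, List, Type


class Generator:
    """Hypothetical base class: one subclass per target assembler."""
    byte_directive = ".byte"

    def data_line(self, values: List[int]) -> str:
        # Emit a single data line, e.g. ".byte $01,$FF".
        return self.byte_directive + " " + ",".join(f"${v:02X}" for v in values)


# Registry keyed by assembler ID; the UI would list these instead of an enum.
_REGISTRY: Dict[str, Type[Generator]] = {}


def register(asm_id: str) -> Callable[[Type[Generator]], Type[Generator]]:
    def decorator(cls: Type[Generator]) -> Type[Generator]:
        if asm_id in _REGISTRY:
            raise ValueError(f"duplicate assembler id: {asm_id}")
        _REGISTRY[asm_id] = cls
        return cls
    return decorator


@register("cc65")
class GenCc65Sketch(Generator):
    byte_directive = ".byte"   # assumed cc65 byte directive


@register("merlin32")
class GenMerlin32Sketch(Generator):
    byte_directive = "dfb"     # assumed Merlin byte directive


def assembler_choices() -> List[str]:
    """Items for an assembler combo box, derived from whatever is registered."""
    return sorted(_REGISTRY)


def create_generator(asm_id: str) -> Generator:
    return _REGISTRY[asm_id]()
```

With this shape, a third assembler is one new decorated subclass. The combo box and the factory pick it up without any edits to shared enums.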

Somebody in another issue was attempting a simpler alternative: configure the pseudo-ops and other settings until the output looks about right, then generate that (or just copy & paste it from the code list). But that's always going to require some post-generation editing. There are a bunch of bug workarounds in the current generators, and some things (like length-prefixed or "high ASCII" text) need to get spat out as bytes on many assemblers.
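
To make the last point concrete: an assembler with no high-ASCII string directive can still reproduce the data if every character is written as a raw byte with bit 7 set. The sketch below uses a hypothetical helper, high_ascii_bytes. It assumes 7-bit ASCII input and a generic .byte directive, and for length-prefixed strings it writes the length as a plain byte (no high bit). That last choice is an assumption of the sketch.

```python
from typing import List


def high_ascii_bytes(text: str, length_prefix: bool = False,
                     per_line: int = 8, directive: str = ".byte") -> List[str]:
    """Emit 'high ASCII' text as generic byte directives (hypothetical helper)."""
    if any(ord(ch) > 0x7F for ch in text):
        raise ValueError("expected 7-bit ASCII text")
    data = [ord(ch) | 0x80 for ch in text]   # set bit 7 on every character
    if length_prefix:
        if len(text) > 255:
            raise ValueError("length prefix must fit in one byte")
        data.insert(0, len(text))            # assumed: length byte has no high bit
    lines = []
    for i in range(0, len(data), per_line):  # wrap to keep lines readable
        chunk = data[i:i + per_line]
        lines.append(directive + " " + ",".join(f"${b:02X}" for b in chunk))
    return lines
```

For example, "HI" comes out as .byte $C8,$C9, and with length_prefix=True it becomes .byte $02,$C8,$C9.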

On a related note, if you have strong opinions about how to deal with the 65816 data bank register during disassembly, I'd love to hear them. The Flaming Bird disasm for the IIgs generates DBR pseudo-ops inline, which seems like a reasonable way to represent it. (Maybe this discussion should be a different entry in the issue tracker though.)

oziphantom commented 6 years ago

There are no standards. However, 64tass is pretty close. It's the new version of Turbo Macro Pro and has a TMP emulation mode. And it compiles on everything, and I mean everything, down to a 68K Amiga 1000. It's also a very powerful assembler (it has more features than ca65) and supports almost all variants of the 65xx (no HuC6280). It has pseudo-op support and outputs natively to Commodore PRG, Atari XEX, Apple II DOS 3.3, Intel hex, and Motorola S-record (for ROM burning), as well as flat binary and sparse binary formats. It can handle ASCII and PETSCII input, with multi-byte {escape} codes for characters that don't exist on modern platforms. It has full native support for the 65816 and all of its modes. It can be case-sensitive or case-insensitive. And it has a peephole optimizer you could run in the background to get a list of possible optimizations, if you wanted that as a feature.

encore64 commented 6 years ago

64tass gets another vote from me. I've only used it for C64 development, but it does support various 6502-compatible CPUs.

AgentFriday commented 6 years ago

My thought...

Translation between assembler syntaxes is a problem almost as big as the disassembly problem itself. It's something that would benefit many people, even when starting from source code, so I think it deserves a tool of its own in the workbench...

Trying to solve such a big problem within the context of another tool just makes it overcomplicated and underanalyzed, I think. If there were a separate translation tool, how might it work, and how would it interface cleanly with the disassembler in a way that solved this in a big-picture way?

AgentFriday commented 6 years ago

To answer my own questions... I'd say that a clean intermediate representation is the key, and the obvious candidate is the internal storage format the disassembler already uses.

And logistically, to make it happen with the least pain, I'd put together a skeleton structure (with two-way translation between two different syntaxes as a pilot) and let others write the syntax analysis and synthesis portions for the assemblers they care about. The test suite would be a central part of the structure, with the following primary paths:

Source A -> Intermediate -> Source A :: should produce code almost identical to the original, once whitespace is ignored.
Source A -> Intermediate -> Source B -> Assembler B :: should produce object output identical to the Source A -> Assembler A path.
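
The two test paths can be sketched as a small harness. Everything here is a toy under stated assumptions: the "intermediate" form is just a list of byte rows, the two syntaxes differ only in their data directive (assumed .byte for A and dfb for B), and assemble() stands in for running a real assembler. Path 1 also ignores letter case, so that $ff and $FF compare equal.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical intermediate representation: one list of byte values per data line.
IR = List[List[int]]

# Toy syntaxes: only the data-byte directive differs (assumed names).
DIRECTIVES: Dict[str, str] = {"A": ".byte", "B": "dfb"}


def parse(source: str, syntax: str) -> IR:
    directive = DIRECTIVES[syntax]
    ir: IR = []
    for raw in source.splitlines():
        parts = raw.split(None, 1)
        if not parts:
            continue  # blank line
        if parts[0].lower() != directive or len(parts) < 2:
            raise ValueError(f"unsupported line for syntax {syntax}: {raw!r}")
        ir.append([int(tok.strip().lstrip("$"), 16) for tok in parts[1].split(",")])
    return ir


def emit(ir: IR, syntax: str) -> str:
    directive = DIRECTIVES[syntax]
    return "\n".join(directive + " " + ",".join(f"${b:02X}" for b in row) for row in ir)


def assemble(source: str, syntax: str) -> bytes:
    # Stand-in for a real assembler: flatten the parsed bytes.
    return bytes(b for row in parse(source, syntax) for b in row)


def normalize(text: str) -> str:
    # Ignore whitespace (and case) when comparing source text.
    return re.sub(r"\s+", "", text).lower()


def check_round_trips(source_a: str) -> Tuple[bool, bool]:
    ir = parse(source_a, "A")
    # Path 1: A -> IR -> A matches the original once whitespace is ignored.
    path1 = normalize(emit(ir, "A")) == normalize(source_a)
    # Path 2: A -> IR -> B -> assembler B matches A -> assembler A byte for byte.
    path2 = assemble(emit(ir, "B"), "B") == assemble(source_a, "A")
    return path1, path2
```

In a real suite, parse, emit and assemble would be per-assembler plug-ins, and assemble would invoke the actual toolchain and read back the object file.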

oziphantom commented 6 years ago

Having the hex, the address of each "line", and a Dictionary<int,string> mapping addresses to labels would be the "intermediate" format; one could then write a custom disassembly syntax generator for each assembler. That's more work, though. I think an interface of GetByteDirective, Word, Long, LowByte, HighByte, BankByte, LoWord, HiWord, and SetAddressTo, returning .byte, .word, .long, <, >, `, <>, >`, and *=$Address respectively, would cover most assembly output. Probably also something to define a label, etc.
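
A minimal sketch of that proposal, assuming the intermediate form is raw bytes plus a start address and an address-to-label dictionary. AsmSyntax, TASS_LIKE, render and operand are hypothetical names. The table values are the ones listed in the comment above (64tass-style). Emitting one byte per line is deliberately naive; a real generator would group bytes and recognize code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AsmSyntax:
    """Hypothetical per-assembler table of directives and operand operators."""
    byte: str
    word: str
    long: str
    low_byte: str
    high_byte: str
    bank_byte: str
    low_word: str
    high_word: str
    org_format: str  # format string for "set address to"


# Values as listed in the comment (64tass-style); other assemblers would differ.
TASS_LIKE = AsmSyntax(".byte", ".word", ".long", "<", ">", "`", "<>", ">`", "*=${:04X}")


def render(data: bytes, start: int, labels: Dict[int, str], syn: AsmSyntax) -> List[str]:
    """Turn bytes + start address + address->label map into source lines."""
    lines = [syn.org_format.format(start)]
    for offset, value in enumerate(data):
        label = labels.get(start + offset, "")
        lines.append(f"{label}\t{syn.byte} ${value:02X}")
    return lines


def operand(expr: str, part: str, syn: AsmSyntax) -> str:
    """Prefix an expression with the operator selecting part of it, e.g. 'bank_byte'."""
    return getattr(syn, part) + expr
```

Supporting another assembler then means filling in another AsmSyntax instance, as long as its operators are simple prefixes. Assemblers that express some of these as functions instead would need a richer interface.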

42Bastian commented 6 years ago

I write a lot of assembly that needs to support a variety of assemblers, and I've found that the "best" (TM) intermediate format is one that can be fed into a C preprocessor and/or a Perl script. This way there's no need to support a bunch of assembler syntaxes.

mnaberez commented 6 years ago

> http://6502.org/tools/asm/ lists 23 assemblers and doesn't mention some of those that I'm familiar with (e.g. 64tass and Merlin 32)

The page is on GitHub. Feel free to send a pull request or just email additions if that's easier. There are a lot of programs so we probably missed some but we do actively maintain that page.

fadden commented 6 years ago

64tass support is in the 1.1.0-dev1 release. Passes all regression tests, doesn't look too ugly. Things got awkward with leading underscores and case sensitivity, but it wasn't too bad.

AgentFriday commented 6 years ago

Andy, did my comments on this issue make sense? I'm coming at this not knowing what other tools you had envisioned for this project, but I thought either I was making a lot of sense or I'm just out of touch with what you had in mind... Would love to hear your thoughts if you have the time.

fadden commented 6 years ago

Converting assembler sources from one format to another is doable if you don't mind losing information. While the basic constructs like 16-bit words are easy to deal with, every assembler has a collection of unique features. Macros, conditional statements, substring manipulation, floating-point support, dictionaries, structs, expression operators, built-in functions, etc.

The trick is to extract the "portable" information, like labels and symbolic values, and make use of them in the new file. The main issue here, if you want to treat this as a disassembly problem, is that you need to associate the labels with the file offset where the code is generated. Some assemblers will produce a symbol table, but this is complicated by the variety of approaches to non-global labels (which usually won't end up in the symbol table). I'm not sure tying the translator to a disassembler makes things easier.

A fixed-purpose translator that converted (say) cc65 to 64tass would work better than a general-purpose translator, since various obscure options could be converted to something equivalent. Representing something in abstract form and converting it to assembler-specific syntax yields good results, but can get hairy.

For example, the largest chunk of code in the SourceGen code generators is the string converter, which does its level best to figure out how to represent a string optimally in the target assembler. If you have a 200-byte string with a length-byte prefix, you can output it as a .byte followed by multiple lines of generic string directives, as a single very long line with a length-prefixed-string directive, or perhaps as a length-prefixed string that spans multiple lines with a line-continuation indicator at the end of each. It also has to deal with embedded quote characters, which may or may not be compatible with a length-prefixed-string directive. And it's trying to make the code look pretty.
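
A much-simplified sketch of that decision, covering two of the three options (the line-continuation form is omitted). It is not SourceGen's converter. The directive names are placeholder defaults loosely modeled on 64tass-style .byte/.text/.ptext and should be treated as assumptions; check the target assembler's manual. Embedded double quotes are assumed to be unescapable here, so they force the fallback path and are written as raw $22 bytes.

```python
from typing import List


def format_length_prefixed(text: str, max_line: int = 64,
                           byte_dir: str = ".byte", str_dir: str = ".text",
                           pstr_dir: str = ".ptext") -> List[str]:
    """Choose a representation for a length-prefixed string (directive names assumed)."""
    if len(text) > 255:
        raise ValueError("length prefix must fit in one byte")
    quoted = '"' + text + '"'
    # Option 1: one length-prefixed-string directive, if the line stays short
    # and there is no embedded quote that would need escaping.
    if '"' not in text and len(pstr_dir) + 1 + len(quoted) <= max_line:
        return [f"{pstr_dir} {quoted}"]
    # Option 2: an explicit length byte followed by generic string lines.
    lines = [f"{byte_dir} ${len(text):02X}"]
    chunk_len = max(1, max_line - len(str_dir) - 3)  # room for space + two quotes
    for i in range(0, len(text), chunk_len):
        chunk = text[i:i + chunk_len]
        # Split around embedded quotes; emit each quote as a raw byte ($22).
        for j, piece in enumerate(chunk.split('"')):
            if j > 0:
                lines.append(f"{byte_dir} $22")
            if piece:
                lines.append(f'{str_dir} "{piece}"')
    return lines
```

For example, 'SAY "HI"' falls back to a length byte of $08, followed by alternating .text and .byte $22 lines. Even this toy version shows why the real converter grows large once per-assembler escaping rules and layout preferences are added.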

I think you can solve 80-90% of the problem with a straightforward solution along the lines of what you've been suggesting. The remaining edge cases are tricky or (for some uniquely convoluted macro constructs) seriously intractable.

In any event, I wasn't planning to write a translator. I think the closest I would get with SourceGen would be a feature that imported an assembler-generated symbol table and applied the symbols to the offsets. You'd still need some way to generate that table, convert it to a simple name/value syntax, and so on.
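
The import step described above could look roughly like this. It assumes the symbol table has already been converted to one "name = $hex" (or decimal) entry per line, and that the file is loaded at a known address. apply_symbols and that input format are hypothetical, and real assembler symbol dumps vary. Values inside the loaded range become labels at file offsets; everything else is kept as an external (platform) symbol. Non-global labels that never reach the symbol table are simply missing, as noted above.

```python
import re
from typing import Dict, Tuple

# Assumed input format: "name = $hex" or "name = decimal", one per line.
LINE_RE = re.compile(r"^\s*([A-Za-z_.@][\w.@]*)\s*=\s*(\$[0-9A-Fa-f]+|\d+)\s*$")


def apply_symbols(table: str, load_addr: int,
                  file_len: int) -> Tuple[Dict[int, str], Dict[str, int]]:
    """Map symbols whose values fall inside the loaded file onto file offsets."""
    by_offset: Dict[int, str] = {}
    external: Dict[str, int] = {}
    for line in table.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue  # skip blank lines, comments, and anything unrecognized
        name, raw = m.groups()
        value = int(raw[1:], 16) if raw.startswith("$") else int(raw)
        offset = value - load_addr
        if 0 <= offset < file_len:
            by_offset.setdefault(offset, name)  # keep the first name for an offset
        else:
            external[name] = value              # e.g. a ROM routine or I/O address
    return by_offset, external
```

A value that coincides with a data address rather than a code location would still be applied as a label here. Deciding which symbols are addresses versus plain constants is one of the judgment calls such a feature would have to leave to the user.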
