Kroc / v80

A minimal Z80 assembler, running on Z80. Useful for bootstrapping bigger projects.
MIT License

C version #4

Open Kroc opened 4 months ago

Kroc commented 4 months ago

v80 is intended to allow development of software for 8-bit systems (both modern and historic) using the same source code and toolchain on both PC and the retro system itself. A system that can't fix and deploy its own software isn't a computer, it's an appliance.

At the moment, v80 is written in WLA-DX, a very good assembler for many processors, itself written in C89. Once complete, the goal is for v80 to assemble itself. Using emulators as part of a build process is unfortunately rather clumsy and fraught with problems getting the automation to work.

I wish for there to be a C version of v80 that can assemble v80 source files on PC the same as the Z80 version can on retro systems. That way, developers can have the best of both worlds by keeping their GitHub-driven development but not exclude the ability to develop and build on the retro system itself.

These are the requirements for a C version of v80:

Kroc commented 4 days ago

I had some time to think about the handling of multiple ISAs and whilst, naturally, the 8-bit native versions will utilise a separate executable for each ISA for memory reasons, I believe this should also be the case for C (let's call the C-port "c80" for shorthand) for a handful of reasons. Whilst it wouldn't be difficult for the C version to support multiple ISAs in the executable, each ISA does have unique parsing and opcode emitting rules (e.g. Z80 has shadow registers, 6502 doesn't) and because of this I don't want the ISA tables to be external; they should be compiled into the executable. I say this because I think the solution to v80/c80 building its own ISA tables can be resolved without requiring an intermediate format or a custom definition syntax (as you've currently done in your C code) by compiling/assembling a version of v80/c80 without instruction support and using that to assemble the ISA table. This can be done both in C and Z80. The definition syntax (.m) only duplicates ISA definitions and I'd rather avoid c80-specific syntax / increasing workload for anybody wanting to add new ISAs.

That said, the format of the ISA tables (v2-ISA has been merged with main) will be slightly altered to be consistent and to be embeddable as a pre-assembled binary. The 26-word "a"-"z" jump table will be moved to the beginning of the ISA table and will be changed to offsets into the table rather than absolute addresses. This'll mean that the table can be placed anywhere in RAM/ROM on an 8-bit system without having to be assembled with v80. v80 doesn't support binary includes yet, but that's something that may yet be added.
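A minimal C sketch of why offsets make the table relocatable. The layout details here are assumptions for illustration, not the actual v2-ISA format: 26 little-endian 16-bit words at the start of the table, one per letter, with an offset of 0 taken to mean "no mnemonics for this letter". The function name `isa_letter_entry` is hypothetical, and the letter arithmetic assumes an ASCII-compatible character set.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMED layout (not confirmed by the thread): the table begins with 26
   little-endian 16-bit words, one per letter 'a'..'z'. Each word is an
   offset from the start of the table; 0 is assumed to mean "no entries". */
enum { ISA_LETTERS = 26 };

/* Hypothetical helper: returns a pointer to the first entry for mnemonics
   beginning with `letter`, or NULL if there is none or the table is bad.
   Because only offsets are stored, the same bytes work at any address. */
static const uint8_t *isa_letter_entry(const uint8_t *table, size_t table_len,
                                       char letter)
{
    if (letter < 'a' || letter > 'z')
        return NULL;
    if (table_len < ISA_LETTERS * 2)
        return NULL;

    size_t slot = (size_t)(letter - 'a') * 2;
    uint16_t offset = (uint16_t)(table[slot] | (table[slot + 1] << 8));

    if (offset == 0 || offset >= table_len)
        return NULL;
    return table + offset;
}
```

With absolute addresses, copying the table elsewhere would invalidate every word in the jump table; with offsets, a plain byte copy is enough.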

Now that I've pushed the v2-ISA parser to main and I'll be modifying the v2-ISA format as above, be aware that you'll need to pull from GitHub before you commit changes! I'd like if you could rename "bootstrap" to just "c" as this is clearer and the v0 WLA-DX code will remain as a tried-and-tested "source of truth" for ISA opcode verification.

gvvaughan commented 4 days ago

I think you're asking for the C version to use the isa_z80.v80 table in memory to lookup what opcodes to write when encountering an assembly instruction?

But, the C version already correctly assembles z80 and 6502 ISAs without the need to artificially split it into two binaries. And the tbl_*.v80 ISA tables are much simpler to write and to parse than the assembly lookup tables from v2. This makes bootstrapping a new ISA from the C version a simple matter of creating a new e.g. tbl_6809.v80, and then using that to assemble a 6809 native assembly assembler. No need to simultaneously write a new isa_6809.v80, and a new parser for it in assembly and the assembler itself...

If you strongly prefer a separate binary for each C bootstrap assembler, the easiest way to do that is to convert the tbl files into a C string array, compile those into each binary and always load only the table in each binary's memory instead of picking an ISA at run time by loading the appropriate mnemonics from a file. I think writing a C parser for the assembled in-memory isa_*.v80 is chasing a moving target and much more work every time you want to add support for a new ISA.
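A rough sketch of the "tbl compiled in as a C string array" option described above. The real tbl_*.v80 syntax is not shown in this thread, so the line format used here ("mnemonic hex-opcode") is a made-up stand-in, and the names `tbl_z80` and `tbl_lookup` are hypothetical. Only operand-less instructions are handled.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* ASSUMED line format "mnemonic hex-opcode"; the actual tbl_*.v80 syntax
   may differ. One such array would be compiled into each per-ISA binary. */
static const char *const tbl_z80[] = {
    "halt 76",
    "nop 00",
    "ret C9",
};

/* Hypothetical lookup: returns the opcode (0..255) for an operand-less
   mnemonic, or -1 if the mnemonic is not in the table. */
static int tbl_lookup(const char *const *tbl, size_t count,
                      const char *mnemonic)
{
    size_t len = strlen(mnemonic);

    for (size_t i = 0; i < count; i++) {
        const char *line = tbl[i];
        /* Match the whole mnemonic, followed by the separating space. */
        if (strncmp(line, mnemonic, len) == 0 && line[len] == ' ')
            return (int)strtoul(line + len + 1, NULL, 16);
    }
    return -1;
}
```

For example, `tbl_lookup(tbl_z80, 3, "ret")` yields `0xC9`. Selecting the ISA then happens at compile time, by choosing which array is linked in, rather than at run time by loading a file.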

Of course, I'll stop working on the tbl->isa table generation if you don't want to use that approach?

And, by all means, please go ahead and rename the directory or make any changes you would like to the C code. Or I can make a PR for that after my last one is merged if you'd prefer :-)

Kroc commented 4 days ago

Ultimately the goal of v80 is to enable native development on 8-bit machines without dependence on PC infrastructure. That includes developing and extending v80 itself. If v80 requires C to process tbl_* files then ultimately we've got an 8-bit CP/M program that requires 10+GB of OS and compiler tooling :P The C version is secondary to the 8-bit versions, not the other way around -- yes, things can be done better and more intelligently in C, but that isn't part of the goal. The C version should bend to suit the 8-bit version, not the other way around, sorry :( That is not to say that my understanding couldn't be incorrect or incomplete, though; it takes me time to absorb code and I need to spend more time with the C code to process what's the right approach.

gvvaughan commented 4 days ago

I'm not convinced that hand-rolling thousands of complex assembly lines and manually updating an ISA specific parser in C is a good use of time and effort. Doubly so as you improve the v80 table layouts and parsers that will all need to be tracked in the C version for it to keep working. That's not to say that you can't still hand-roll a massive lookup table for each assembly assembler if you want to... but either way, whether it's generated from a simple input format, or lovingly created by hand, it's a checked in file that doesn't have to be rewritten every time you build the assembler and makes no difference to use of the assemblers on 8-bit platforms.

Let's say you want to tweak the assembler lookup table layout in memory in future... you can either adjust the C code that creates it, regenerate and commit the result, or you can manually update the actual isa lookup table and commit that. The result is the same. I certainly agree that I shouldn't write a table generator that won't be used. I'm offering to write it so that bootstrapping a new ISA would only require creating a simple tbl_*.v80 file, and then running the C assembler on new ISA assembler sources as they are built out until it's self hosting -- that saves having to create all the moving parts in parallel.

I also agree that ultimately each ISA will need its own tbl-to-z80/tbl-to-6502/etc program since each ISA has a different lookup table format. I do, however, strongly disagree that the C bootstrap assembler should be tied to the individual memory layouts of each ISA lookup table. That will be significantly more difficult for both of us to maintain. I have a mild preference for one simple ISA definition file as a source of truth, but since they are very quick and easy to write, if you much prefer to hand maintain individual v80 isa_*.v80 instruction lookup tables and parsers for current and future architectures, then I'm not at all unhappy with maintaining the tbl files in the c/ tree that let the C assembler work on any tbl-defined instruction set without recompiling. Either way, I'm certain that you'll find adding a tbl_*.v80 file for a new ISA to the c/ directory as you bootstrap into self hosted assembly language assemblers will be much easier for you than writing the new isa_*.v80, while simultaneously designing the memory layout and bit flags, and a parser as well as creating the assembler itself.

Kroc commented 4 days ago

We are probably talking somewhat at cross-purposes, neither quite fully understanding the other. I'll try to respond to this as I understand it:

> Let's say you want to tweak the assembler lookup table layout in memory in future... you can either adjust the C code that creates it, regenerate and commit the result, or you can manually update the actual isa lookup table and commit that. The result is the same

Regeneration is not an option (for 8-bit native). It can't be done on an 8-bit system (unless compiling the C code on CP/M?). The goal is no reliance on a PC if the developer so wants, including developing new ISAs for v80. Otherwise what's the point in owning real hardware? :) The C version is not without use; it is better to automate an array of host CPU + ISA combinations, i.e. producing binaries for all versions of v80 in seconds rather than a minute each, manually, on real hardware. But if I want to write a from-scratch OS on a real Z80/6502, a PC shouldn't be involved once I have the first binary and the source code to rebuild it.

> I'm not convinced that hand-rolling thousands of complex assembly lines and manually updating an ISA specific parser in C is a good use of time and effort

No, but making yet another Z80 assembler isn't either. There are goals other than complete efficiency. You know what they say about early optimisation. Nobody is even using the thing yet. It will evolve into something better in due time and it's better to let experience in practical use guide improvements than chasing code purity. The parser is 99% the same code, with a tiny bit of logic for ISA-specific quirks. For Z80 this is <128 bytes of Z80 code; for 6502 this is 32 bytes. ISAs will not be added quickly. Far more work is involved in tailoring the v80 source code to specific 8-bit HW / OSes like CP/M. Sometimes a little bit of work now and again saves a lot of work abstracting away the problem.

> Doubly so as you improve the v80 table layouts and parsers that will all need to be tracked in the C version for it to keep working.

I've settled on the ISA table layout. This will not be changing in any meaningful way. The way v1-ISA worked was the only way I could get it to work early on when I had an incomplete assembler, it was ugly and very difficult to write and follow but it was necessary to move on to more important things. v2 is how I always wanted it to work.

> I certainly agree that I shouldn't write a table generator that won't be used. I'm offering to write it so that bootstrapping a new ISA would only require creating a simple tbl_*.v80 file, and then running the C assembler on new ISA assembler sources as they are built out until it's self hosting -- that saves having to create all the moving parts in parallel.

I understand a PC-first attitude is normal, but none of that is possible from an 8-bit machine. If I can't modify and assemble the assembler on an 8-bit machine then we're not even Turing-complete. Could we also generate these files on the machine? Probably. Maybe that'll come down the line when there is more demand to do so, but I fear you're overestimating the need to pump out ISAs :P

> I also agree that ultimately each ISA will need its own tbl-to-z80/tbl-to-6502/etc program since each ISA has a different lookup table format. I do, however, strongly disagree that the C bootstrap assembler should be tied to the individual memory layouts of each ISA lookup table. That will be significantly more difficult for both of us to maintain. I have a mild preference for one simple ISA definition file as a source of truth, but since they are very quick and easy to write, if you much prefer to hand maintain individual v80 isa_*.v80 instruction lookup tables and parsers for current and future architectures, then I'm not at all unhappy with maintaining the tbl files in the c/ tree that let the C assembler work on any tbl-defined instruction set without recompiling. Either way, I'm certain that you'll find adding a tbl_*.v80 file for a new ISA to the c/ directory as you bootstrap into self hosted assembly language assemblers will be much easier for you than writing the new isa_*.v80, while simultaneously designing the memory layout and bit flags, and a parser as well as creating the assembler itself.

You are not wrong and this is a hard path to navigate. I can see the benefit of having a way to define an ISA that isn't itself code but I will need much time to ruminate on how to reconcile the differences in approach with the extremely strict limitations and minimalist goals of the 8-bit code. New ISAs will not be appearing quickly, there is much work to do in platform enablement; I do not work quickly. I am slow, methodical and often wait for a perfect solution to pop in to my head once I've digested all the information necessary and left it to stew for a while.

There are two paths I would like to follow with the 8-bit code. One will be expanding the Z80 ISA to eZ80 so as to support the Agon Light and then, eventually, a native 24-bit port of v80 in eZ80. The second is to write a version of v80 that runs on 6502 platforms. I'm expectant that porting the entire code base will enlighten me to any abstraction needed and I'd rather wait until then before jumping-the-gun on how ISAs should be portable.

gvvaughan commented 4 days ago

> We are probably talking somewhat at cross-purposes; neither quite fully understanding the other

You could very well be right. Rather than continuing along that road, I think I have a more concise argument for you:

All of that aside, my ulterior motive in writing it this way is that I have been fiddling with emulators for fantasy computers with self-designed ISAs, and getting myself past the "I need an assembler to write programs to evaluate whether the ISA design is good" stage has been a (years-long) pain for me. Especially as, halfway through writing the self-hosted assembler, I can't resist improving the ISA... which means throwing away all the hand-assembled code so far and starting over. The existing C version is already almost perfect to get me past that stage (I just need to add support for prefix arguments), because tweaking the ISA is a simple matter of tweaking the tbl file rather than having to start hand-assembling hex codes from scratch again whenever I change my mind. Of course, I have no plans to burden v80.c with code that doesn't help your project, but I do expect to fork it, remove polyfills to simplify for C11 and use the result for this upcoming Masto #DecemberAdventure :-D

Kroc commented 2 days ago
> • The only reason to have the C version at all is to enable bootstrapping an 8-bit assembler binary from your sources without the need for a CP/M emulator or WLA-DX.

I would also add to that rapid prototyping on modern systems / IDEs. There are a couple of integration issues c80 can solve too; for the different builds of v80 across different platforms there will be a mix of shared source code (the v80 core) and platform-specific code, on top of which I would like to produce a set of optional libraries for interfacing with specific platforms (e.g. Apple II, C64, MSX etc.) -- I can't just put all of these files in the one directory! Whilst v80 can't support sub-directories due to 8-bit system limitations, c80 can include additional folders from the command parameters to produce on-the-fly builds without mass duplication of source files for each build. For native 8-bit builds, the necessary files will be copied for each platform release since the user is likely only using one platform.
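One way the "additional folders" idea could look in C, as a sketch only. How c80 would actually accept the folder list on the command line is not specified here, so the function simply takes an array of directories; the name `open_in_dirs` and the example file name are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: try to open `name` in each directory in order and
   return the first file that opens, or NULL if none does. `path` receives
   the last candidate path that was tried. The directory list is ASSUMED
   to have been collected from the command line by the caller. */
static FILE *open_in_dirs(const char *const *dirs, size_t ndirs,
                          const char *name, char *path, size_t path_size)
{
    for (size_t i = 0; i < ndirs; i++) {
        int n = snprintf(path, path_size, "%s/%s", dirs[i], name);
        if (n < 0 || (size_t)n >= path_size)
            continue; /* candidate path would not fit; skip it */

        FILE *f = fopen(path, "rb");
        if (f != NULL)
            return f;
    }
    return NULL;
}
```

The source files keep referring to flat file names such as `core.v80`, which is all the 8-bit builds can handle, while the PC build decides which folders those names resolve against.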