galaxy-team / jupiter

Galaxy's assembler.

Determine architecture #5

Open Mause opened 11 years ago

Mause commented 11 years ago

(just trying to get some conversation started here)

@r4d2 ?

Mause commented 11 years ago

/me whistles

Mause commented 10 years ago

Looks like it's going to be based on some pseudocode for a two-pass assembler... pretty much the only documentation I could find.

milesrout commented 10 years ago

My thoughts on architecture

Macro Preprocessing

At this stage I'm not sure whether we should include preprocessing (in the macros-and-includes sense) in the assembler itself. I can think of several reasons for and against:

We can always provide a --m4 option, after all, which would preprocess the input with M4. Alternatively, we could provide a --no-m4 option if we wanted to make preprocessing with M4 the default.

Feel free to replace all instances of 'M4' in the above text with a preprocessor of your choice if you really loathe M4.

Pipeline

Whether or not we include a preprocessor in the distribution of Jupiter, the pipeline will presumably look something like this:

<text.asm> -> [preprocessor] => [lexer] => [parser] => [code generator] -> <object.o>

We need to define the interfaces between these stages properly, including how they can be extended, both now and in the future: opcodes may be added, and we also want to support multiple variants.

Object Code

We need to finalise an appropriate object code format. Currently we have asteroid, but it's extremely immature and doesn't support much. We need the object code format to support pretty much everything a modern object code/executable format supports: sections, symbols, debugging information, relocatable object code, shared libraries, core dumps, etc.

How extensible should we be?

Should we support only DCPU-16 and DCPU-16 derivatives? That's the 'core mission' of the project, should we go beyond it? Should we include enough abstraction to allow any sort of target? What about assembly language variants? Should we include support for optimisation phases, or should that be left to compilers and higher-level programs?

Extensibility is good, but too much extensibility isn't. Compare git and bzr. Cloning the GNU Emacs bzr repository took HOURS when I did it the other day; cloning the GNU Emacs git repository took 10 minutes. Sure, bzr has about three abstraction layers, so you could completely rework the underlying layers and everything else would still work, but as a result it's really slow. Git, by comparison, doesn't really abstract anything away from the user except through the porcelain commands, which are an intentionally leaky abstraction. Linus chose the core architecture from day one, and he doesn't need an abstraction layer around it because it doesn't need to change in the future. The index, the object store, SHA-1 hashes, etc.: if you changed them it wouldn't be git anyway, so why abstract them away?

At the same time we want some level of extensibility, because we will want to extend it.

Mause commented 10 years ago

At the moment, if you want to add a new opcode (like SET or ADD), you have to add its implementation to opcodes.hpp and opcodes.cpp. You then need to add handling code to three functions in assembler.cpp. Not really what I would call extensible.

Mause commented 10 years ago

Not to mention adding the handlers to the parser, some of which should probably be refactored into a macro.

milesrout commented 10 years ago

inb4 macros are evil. IMO dynamic_cast<> is more evil than macros.

Ideally, if you wanted to add a new opcode, you would only have to add it in three places:

  1. The lexer. Clearly you have to add something to the lexer, though it should be as simple as adding a new possibility for a particular token (unary_opcode or binary_opcode) in one place, with no duplication.
  2. The code generator. Obviously you need to define the opcode's... opcode. You know, ooooo.
  3. Some sort of syntax tree validator that checks that everything in the syntax tree is valid. Instead of having to write this sort of thing into the parser (which would overcomplicate things, I think), we can add this in an extensible way as one of a number (that number being initially 1) of AST traversal phases. Others could be optimisation phases: constant folders, etc.

Ideally, you would be able to write "ADD=1" somewhere and have this all take care of itself.