gbdev / rgbds

Rednex Game Boy Development System - An assembly toolchain for the Nintendo Game Boy and Game Boy Color
https://rgbds.gbdev.io
MIT License

[Help wanted] Turn lexer back into flex definition #485

Closed: ISSOtm closed this issue 3 years ago

ISSOtm commented 4 years ago

The lexer is currently a 1 kloc (+ 800 lines if you count globlex) monstrosity of a modified auto-generated file. This should really be changed back to a flex definition somehow. The modifications made to it might require a custom program skeleton (probably, I think, but eh), but that would certainly be much better than what we currently have.

ISSOtm commented 4 years ago

This is actually progressing—who would've guessed.

I currently have the following left to implement (at least, that I'm aware of):

We do have a problem, though. Some features need to be axed in order for this to be possible!

Side note: if the above two OPTs are removed, this only leaves OPT p, which I believe is made redundant by ds cnt, val. So then OPT could probably be removed entirely.

JL2210 commented 4 years ago

How will this affect existing code? Can you give examples of what won't work afterwards?

ISSOtm commented 4 years ago

How this affects existing code is the question I always ask myself about each feature change. We don't have any usage statistics beyond pret's disassemblies, and even those don't cover any hacks based on them.


The reason I don't want to make "naked" interpolations work like macro args is that parsing such interpolations is very complex, especially since they can be nested. Handling them inside a string is fine, since we're already working with a blob of data, but outside of a string, an interpolation would be expected to be able to span multiple tokens (like macro args).

Macro args are not handled in the lexer rules proper, since they operate outside the concept of tokens; instead, the code responsible for filling the lexer's text buffer (YY_INPUT) is overridden to look for macro args and "paste" their contents in. The complication is that flex caps the number of characters it expects from that function on each call, so the entire contents can't simply be dumped in at once. To avoid unnecessary copies and allocations, and thus keep the lexer (one of the two most performance-critical pieces of RGBASM) efficient, additional logic "spreads" the pasting across buffer refills.

The problem that "naked" symbol interpolations introduce is that they are really complicated to process. The hardest part is that they can be nested, which opens the can of worms of where to write the expanded results, and so on. (It worked in the old lexer because the target buffer was large.)

JL2210 commented 4 years ago

The only issue I see is that some auto-generation tools (particularly SDCC) generate a .optsdcc -mgbz80 line.

ISSOtm commented 4 years ago

This is irrelevant, because SDCC uses its own weird, custom assembler as its back-end instead of RGBDS.

JL2210 commented 4 years ago

@ISSOtm And it also has an option to generate RGBDS assembly, which is where I found that example.

ISSOtm commented 4 years ago

But that wouldn't be valid RGBDS syntax; where is the code generating that?

JL2210 commented 4 years ago

My best guess would be in sdcc-4.0.0/sdas/asgb/gbpst.c, and then maybe sdcc-4.0.0/src/z80/mappings.i.