Closed StanHash closed 19 hours ago
This looks very impressive and well documented. I can merge this when you feel it's ready, though lemme give my 2 cents on a few things that you're (or I'm) unsure of:
Add ReadByteAt(offset), ReadShortAt(offset) and ReadWordAt(offset) built-in macros. They allow reading data from the working binary. It only sees what was there before assembly. Those macros are disabled in AA mode.
This makes me very uncomfortable. I think a strong point of the EA language is that it's agnostic to the file being edited. That said, AA mode is the default (correct me if I'm wrong?) -- which I guess then begs the further question of what would this be used for? It's not inherently bad but I think there's a thing to be thought of for complexity bloat and this could lead to very unintuitive conditionals being written that would be better to be handled elsewhere in tooling or have some thought put into it to avoid.
'Maybe' has been removed in favor of extensions that work with nullable types (for example: IfJust works on any 'T?'). For reference types, this works on any type (as nullable for reference types is just an annotation).
I don't really mind this if it's functionally equivalent and the nullable annotation on references is enforced by the type system. The only thing we really lose is the ability to nest the constructor (e.g. Maybe<Maybe<Int>>) but that's not really useful if we have bind anyway.
The ?? operator seems like it would be redundant with IsSymbolDefined. However, they both have different semantics: a ?? expression is always evaluated at the very end of assembly (when data is inserted) while IsSymbolDefined is evaluated immediately, so it's not that simple. I wonder if there's something that can be done here to make this more elegant regardless.
My reading of your documentation makes me think of ?? as being useful for preprocessor stuff (or vice-versa). Since some preprocessor stuff cares about immediate evaluation (for parameter passing), I can see having both be useful.
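To make the timing difference concrete, here is a toy Python model. It is only a sketch: the `Assembler` class and its method names are made up for illustration and are not ColorzCore's internals, and it assumes an immediate check only sees symbols defined earlier in the script.

```python
class Assembler:
    """Toy model (hypothetical): symbols are registered as the script is read."""

    def __init__(self):
        self.symbols = {}
        self.deferred = []  # (name, fallback) pairs resolved at the end

    def define(self, name, value):
        self.symbols[name] = value

    def is_symbol_defined_now(self, name):
        # Immediate check: only sees what has been defined so far.
        return 1 if name in self.symbols else 0

    def defer_coalesce(self, name, fallback):
        # Models `name ?? fallback`: remember it, resolve it later.
        self.deferred.append((name, fallback))

    def finish(self):
        # End of assembly: every symbol in the script is known by now.
        return [self.symbols.get(name, fallback)
                for name, fallback in self.deferred]


asm = Assembler()
immediate = asm.is_symbol_defined_now("LateLabel")  # asked before the definition
asm.defer_coalesce("LateLabel", 0)                  # asked at the same point, but deferred
asm.define("LateLabel", 0x08000123)                 # definition appears later
resolved = asm.finish()
print(immediate, [hex(v) for v in resolved])
```

In this model the immediate check misses the later definition while the deferred expression picks it up, which is why the two are not simply interchangeable.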
All Read...At macros could be defined in terms of any one other. I don't know if we want to keep all three or just one and let the user add their own.
Keep all of them as language defined. Same feel as BYTE SHORT WORD being definable in terms of BYTE, but having all 3 just makes it nicer.
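For reference, this is roughly what "defined in terms of one another" looks like, sketched in Python rather than EA. The function names are hypothetical, and the sketch assumes little-endian reads (as on the GBA); the actual built-ins may behave differently.

```python
def read_byte_at(rom: bytes, offset: int) -> int:
    # The one primitive: a single byte from the binary as it was before assembly.
    return rom[offset]


def read_short_at(rom: bytes, offset: int) -> int:
    # 16-bit little-endian value built from two byte reads (assumed byte order).
    return read_byte_at(rom, offset) | (read_byte_at(rom, offset + 1) << 8)


def read_word_at(rom: bytes, offset: int) -> int:
    # 32-bit little-endian value built from two short reads.
    return read_short_at(rom, offset) | (read_short_at(rom, offset + 2) << 16)


rom = bytes([0x78, 0x56, 0x34, 0x12])
print(hex(read_byte_at(rom, 0)), hex(read_short_at(rom, 0)), hex(read_word_at(rom, 0)))
```

The composition works, but each user would have to carry these definitions around, which is the argument for shipping all three.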
the ternary operator A ? B : C is not implemented, but (A && B) || C would be functionally equivalent
This makes me deeply uncomfortable and I am honestly not very on board with && and || making sense for how preprocessors would handle things, but not with how a language would handle things, but I think your definitions are sensible so I won't object to this.
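One caveat worth keeping in mind, sketched in Python: in languages where the logical operators return one of their operands, `(A && B) || C` only matches a ternary when B is truthy. Whether this applies to EA depends on how && and || are defined there, which this thread does not spell out, so treat this as an illustration of the general idiom only.

```python
def ternary(a, b, c):
    # Reference behavior: pick b when a is truthy, else c.
    return b if a else c


def and_or(a, b, c):
    # The (A && B) || C idiom, using Python's operand-returning and/or.
    return (a and b) or c


cases = [(1, 5, 9), (0, 5, 9), (1, 0, 9)]
for a, b, c in cases:
    print((a, b, c), ternary(a, b, c), and_or(a, b, c))
```

The first two cases agree; in the third, B is 0, so the idiom falls through to C while the ternary yields 0.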
In parallel to this, I have been working on a very basic test program and set, in an attempt to make sure I (and possibly future contributors) don't break anything important. Idk if I should include this here or not.
Test cases/scripts should be included with the project, please do include them here if you have them.
Also I admittedly have not been hacking or maintaining this as much in recent years, so ... don't expect the most thorough of reviews or tests for functionality or any grand visions for design. Comments I give above are mostly just my opinions / what I've seen as commonly done in project/language design. If the people using this want certain features and they're implemented, I'll merge it, the most I'll do is bugfixing here and there.
AA mode is the default (correct me if I'm wrong?)
AA is the mode that outputs ASM+LDS that was contributed a while ago; I don't think it sees much use beyond the one user. The default mode is just A, so the Read...At macros would be enabled basically always. I just thought I would note the technicality.
The ability to check existing values in the binary was brought up when I asked for suggestions (by Vesly). I was not a fan of it conceptually either but I do understand the potential uses (I believe Vesly is working on randomizer mods to be applied to existing mods? In which case this could be used to check whether some known patch was installed or not).
I was thinking of requiring the macros to be enabled explicitly (probably through a command line flag) but couldn't reasonably convince myself it wouldn't just be annoying.
Last three things I want to (attempt to) implement before freezing this feature-wise:
#inctbl "encoding_name" "path" directive that loads a tbl file (https://datacrystal.romhacking.net/wiki/Text_Table) and gives it a name, and a STRING "string" "encoding_name" statement that dumps the encoded string. We could also introduce some built-in encodings (UTF-8, Windows-1252 (Latin-1), CP932 (Shift-JIS) would be potentially useful for existing EA users).
Then I will do some larger scale testing (I've been periodically testing on SkillSystem to make sure nothing breaks too horribly but I'm looking to test on other (perhaps more involved) public buildfiles) and if nothing unexpected breaks this will be ready.
I guess a friendlier summary of changes could also be on the TODO list.
This is ready feature-wise. Any further changes would be bugfixes.
Tested on Skill System and Pokemblem. Skill System builds with no changes, and Pokemblem (after addressing the diagnosed breakages) builds with some changes that relate to bugfixes which don't (seem to) break anything.
I also put up a build here: https://github.com/StanHash/ColorzCore/releases/tag/20240505
Sorry, I must've missed the email saying the changes were review-ready! This is good to merge, and I'll update release posts with the updated build.
I've been hacking new features into ColorzCore for the past few weeks or so. This is getting close to ready.
I release binaries for this branch from time to time here: https://github.com/StanHash/ColorzCore/releases.
Features
Some of these were suggested by other users.
Offsets are now Addresses
ORG 0x123 ; Label: ; MESSAGE "{Label:X8}" prints 08000123, not 00000123 (yes, strings now get format specifiers, see below).
Symbol := value, see below).
New operators
#define IsSymbolDefined(name) "(((name) || 1) ?? 0)"
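A rough Python model of why this definition works. It assumes `??` yields its left operand when that can be evaluated and the right operand otherwise, and that `||` yields 1 or 0; both are assumptions for the sake of the sketch, and all the helper names are hypothetical.

```python
class UndefinedSymbol(Exception):
    """Raised by the toy lookup when a symbol has no value."""


def lookup(symbols, name):
    if name not in symbols:
        raise UndefinedSymbol(name)
    return symbols[name]


def logical_or(a, b):
    # Assumed semantics of ||: 1 if either side is non-zero, else 0.
    return 1 if (a != 0 or b != 0) else 0


def coalesce(primary, fallback):
    # Assumed semantics of ??: use the left side unless it cannot be evaluated.
    try:
        return primary()
    except UndefinedSymbol:
        return fallback


def is_symbol_defined(symbols, name):
    # Mirrors (((name) || 1) ?? 0): `|| 1` makes any defined value truthy,
    # even 0, and `?? 0` catches the undefined case.
    return coalesce(lambda: logical_or(lookup(symbols, name), 1), 0)


symbols = {"ItemTable": 0x08001000, "Zero": 0}
for name in ("ItemTable", "Zero", "Missing"):
    print(name, is_symbol_defined(symbols, name))
```

The `|| 1` part matters: without it, a symbol whose value happens to be 0 would look undefined.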
Directives and Macros
New generic '#if' conditional directive (self-explanatory).
#undef now accepts multiple parameters.
Allow define to take any arbitrary sequence of tokens directly. Previously, such tokens needed to be enclosed within a string to work.
#define MyData(a, b, c) ALIGN 4 ;WORD a ; SHORT b ; BYTE c (no quotes!)
Make object-like macros (aka "definitions") defined to be expanded into exactly their own name (#define MyName MyName) not participate in macro expansion. This allows defining macros for use with #ifdef and friends while still keeping the name available for symbols.
After #define ItemTable ItemTable, one can do ItemTable: later without causing an infinite loop, yet also #ifdef ItemTable is validated.
Symbol assignment
Symbol assignment using ':=' (closes #48)
Symbols are a generalization of Labels, and have the same properties:
{} local scopes (unlike macros).
--nocash-sym switch is given.
Formatted Interpolated strings
{expr:spec} which are expanded within.
{MyOffset:X8})
ReadByteAt
Add ReadByteAt(offset), ReadShortAt(offset) and ReadWordAt(offset) built-in macros. They allow reading data from the working binary. It only sees what was there before assembly.
STRING and custom .TBL encodings
STRING statement, which emits the given string encoded into a given encoding (UTF-8 by default). This makes the String(...) builtin macro redundant.
STRING "my string" "<encoding name>"
"utf8" is also correct).
#inctbl directive.
#inctbl "<new encoding name>" "path/to.tbl"
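For readers unfamiliar with .tbl files, here is a minimal Python sketch of the idea behind a table-driven STRING. It assumes the simplest form of the format (one `HEX=text` entry per line) and a greedy longest-match encoder; the function names are hypothetical, and ColorzCore's real parser may handle more cases (control codes, comments, and so on) or match differently.

```python
def parse_tbl(text):
    """Parse `HEX=text` lines into a {text: bytes} mapping (simplified format)."""
    table = {}
    for line in text.splitlines():
        hex_part, sep, chars = line.partition("=")
        if not sep or not chars:
            continue  # skip blank lines and entries this sketch doesn't handle
        try:
            table[chars] = bytes.fromhex(hex_part.strip())
        except ValueError:
            continue  # left side is not plain hex; ignored here
    return table


def encode(string, table):
    """Encode `string` by always taking the longest table entry that matches."""
    longest = max(map(len, table), default=0)
    out = bytearray()
    i = 0
    while i < len(string):
        for length in range(min(longest, len(string) - i), 0, -1):
            chunk = string[i:i + length]
            if chunk in table:
                out += table[chunk]
                i += length
                break
        else:
            raise ValueError(f"no table entry for {string[i]!r}")
    return bytes(out)


sample_tbl = "00=A\n01=B\n02= \n0310=th\n"
encoded = encode("AB th", parse_tbl(sample_tbl))
print(encoded.hex())
```

Multi-character entries like `th` are why the encoder tries the longest candidate first.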
Misc.
__LINE__ and __FILE__ special identifiers (closes #61)
ALIGN now takes an optional second parameter that defines an alignment offset. Basically, ALIGN Align Offset skips bytes until CURRENTOFFSET % Align == Offset is true.
--nocash-sym now also outputs labels from local scopes (closes #56).
BASE64 statements. This renders #59 redundant.
Diagnostics
--warnings:... .
--warnings:... allows you to disable or enable some warnings.
--warnings:no-redefine will turn off warnings when you redefine a label.
--warnings:no-nonportable-pathnames) is emitted on non-portable paths (paths with incorrect capitalization) in include/incbin/inctbl directives on Windows (closes #16).
--warnings:unguarded-expression-macros), will be emitted when a macro is expanded that features operators that aren't guarded by parentheses.
--warnings:no-unintuitive-expression-macros) will be emitted when a macro that looks like a self-contained expression will be expanded in a way that makes it merge with its surroundings in unintuitive ways.
#define MyMacro(a, b) a + b
BYTE MyMacro(1, 2) * 2 will result in BYTE 1 + 2 * 2, which would not result in what one would intuitively expect. This is particularly useful as it helps find cases where the changes in edge-case macro behavior described earlier would cause a change. An example of such case can be seen explained here: https://github.com/FireEmblemUniverse/SkillSystem_FE8/pull/637.
--warnings:legacy) is emitted when using legacy constructs kept in for compatibility (this is only emitted when using the String macro for now).
Bug fixes
MySymbol : ; WORD 0 1 2 3 ... { POIN MySymbol ; ... ; MySymbol : } the POIN would refer to the first MySymbol rather than the second.
Technical changes
#define, #undef, and if[n]def) or arbitrary tokens (#define, maybe should be expanded to tools directives) are defined in terms of the input token stream directly.
Log is now Logger
Maybe (or what's left of it) is now NullableExtensions
EAInterpreter is now EADriver
EAParser class was split into EAParser, EAInterpreter, StringProcessor and LoggerExtensions. Also the ParseAtom method was moved to an extension class called AtomParser.
CaseInsensitiveString was removed (it was unused)
#ifdef and #ifndef were merged.
Uncertain design points
Testing script
In the Tests directory lies a single Python script named run_tests.py. I have been working on this very basic test program in an attempt to make sure I (and possibly future contributors) don't break anything important as new stuff gets added.
Things that break
ORG 1; Label: would result in the value of Label being 0x08000001 and not 1. Of course, one wouldn't need such a hack because Label := 1 is now legal.
ASSERT UpperBound - CURRENTOFFSET would break if UpperBound is an offset rather than an address. This is detected and will produce a warning rather than an error if that assertion would not fail if CURRENTOFFSET was an offset.
setText now only works when one gives it a label or a raw address.
Warnings have been introduced to try and diagnose most of these.
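The arithmetic behind the first two breakages, as a small Python sketch. The 0x08000000 base is taken from the examples above (it is where GBA cartridge ROM is mapped); the helper names are hypothetical, and the sketch assumes ASSERT fails on a negative value.

```python
ROM_BASE = 0x08000000  # base address seen in the examples above


def to_address(offset):
    # What a label placed at `offset` now evaluates to.
    return ROM_BASE + offset


def to_offset(address):
    # The pre-change view of the same location.
    return address - ROM_BASE


# ORG 0x123 ; Label: now yields an address rather than an offset.
label = to_address(0x123)
print(f"{label:08X}")

# ASSERT UpperBound - CURRENTOFFSET with UpperBound written as a raw offset:
upper_bound = 0x1000
current = to_address(0x800)
new_result = upper_bound - current             # negative: assertion would fail
old_result = upper_bound - to_offset(current)  # what the script author meant
print(new_result < 0, hex(old_result))
```

This is the situation the new warning is meant to catch: the expression only goes negative because one side is an offset and the other an address.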