gbdev / rgbds

Rednex Game Boy Development System - An assembly toolchain for the Nintendo Game Boy and Game Boy Color
https://rgbds.gbdev.io
MIT License
1.33k stars 175 forks source link

Add constants to sym file as well #606

Closed ISSOtm closed 1 month ago

ISSOtm commented 3 years ago

What makes the SYM file useful is that it's not easy to know where a given label is located; however, not all constants are obvious, as in the following example.

SECTION "Gfx", ROMX
Gfx:
INCBIN "gfx.2bpp"

GFX_SIZE equ @ - Gfx ; What value do I have?
    PRINTT "{GFX_SIZE}\n" ; This can help, but it's easy to lose it in a busy `make` invocation

For compatibility with existing implementations, maybe output them in comments, at the end of the file, with a special comments preceding them?

Note that this ties to #483.

Rangi42 commented 3 years ago

If something like this gets added, I think constants would be more useful in definition order, unlike labels which get ordered by bank:address value.

Many constants will probably be small values like 1 or 2, and it's not beneficial to keep all of those together. On the other hand, constants may be defined in a sequence (ID lists, struct field offsets, jumptable indexes, etc) and those are easier to read all together. (Ideally they would all have a shared prefix to easily sort them together after the fact, but I've seen shared suffixes, or just nothing.)

This format would keep all names aligned:

00:4000 Gfx
;; 0320 GFX_SIZE
aaaaaa123456789 commented 3 years ago

Interestingly, we do have a good example of this kind of thing gone wrong. When the Pokémon leaks started, a file called CRYSTAL_BY_NUM.SYM was shared, which was a symfile for some version of the Japanese game sorted in ascending order of values. However, the official toolchain treated exported constants as symbols with a bank number of zero, and thus that file contains many constants in addition to actual labels.

Out of 9,383 symbols contained in that file:

In other words, 30% of the file is unsorted noise. I don't expect most projects to do much better, since most constants are just small values that will easily become unsorted noise if the symfile doesn't preserve their declaration order. Not to mention no tool could make any use out of these values, since the tools can't possibly know which one (if any) out of the many constants valued at 4 was used for a particular ld a, 4 instruction once it has been assembled.

If you want to export constants, I'd recommend using a separate file for this purpose, perhaps containing some additional metadata that would make it more useful (i.e., easier to search or to sort).

Rangi42 commented 2 years ago

Proposal: let rgbasm take a -n constants.txt argument, to which it will output every EQU, SET, or RSSET constant in the order they were last defined. (So FOO = 1, then BAR EQU 2, then FOO = 3 would output BAR before FOO.)

The format could be similar to rgblink's -n symbols.sym file: value in hex, space, name. Values are 32-bit so they could all be zero-padded to 8 digits, but most constants will probably be in the 8- or 16-bit range so maybe don't pad them at all. Or, pad them to an even number of digits. (If this were a file format spec then padding, like order, would be irrelevant, but I'm thinking about what might be most convenient to read.)

Example:

; File generated by rgbasm
8000 VRAM_Begin
a000 VRAM_End
00 SRAM_DISABLE
0a SRAM_ENABLE
00 DECOATTR_TYPE
01 DECOATTR_NAME
...
vulcandth commented 2 months ago

I agree that this would be very useful, for both easily looking up values or using in scripting tools. I agree it should be in the order they were defined and should probably output into its own file. I'd almost wonder if it be better to have name value than value name; but honestly, I could make use of either.

Rangi42 commented 1 month ago

Here's what we're thinking of so far:

rgbasm will have a -s flag that takes a features:filename.dump.asm argument. features can be any comma-separated combo of equ, var (for =), equs, charmap, and maybe more (macro?). The extension is suggested to be .dump.asm because it will use asm syntax, since that's easily generalizable to just about any kind of state we might want to dump. It should support multiple flags so you can do e.g. -s equ,var:numbers.dump.asm -s equs:strings.dump.asm.

Constants will be sorted in definition order, like my initial PR for this did, but will be separated by types, presumably in the order which the user specified.

Questions:

ISSOtm commented 1 month ago
  • [x] Uppercase keywords? (That's generally how we show them in docs and warning messages.)

Well, uppercase keywords kind of conflict with SCREAMING_CASE, whereas lowercase keywords kind of conflict with other cases. I'd err on the side of lowercase keywords by default, though.

  • [x] Zero-padded values? (I liked my initial approach of padding to length 2, 4, or 8, although here that loses the benefit of being column-aligned.)

If we're printing the values in hex, then I think they shouldn't be padded, because no size seems appropriate IMO.

  • [x] Lowercase hex values? (I presume yes, since that's what {interpolation} does by default too, and it simply works for all numeric values.)

Sure!

  • [ ] macro support? (May as well, right?)

Please!

  • [x] If a charmap doesn't map any characters, should we still write its NEWCHARMAP heading?

Yes, possibly with a comment explicitly pointing out that it's empty.

  • [x] Newlines separating the kinds of values, and/or comments preceding them? (Makes it easier to find the non-first ones.)

I'm not sure what you mean here.

  • [x] If the user only selected one feature, should we still write a heading? (I currently don't.)

For consistency, why not?

  • [x] If the feature has no values, should we still write a heading, and/or a ; no values comment?

Yes!

  • [x] Warning or error for duplicate features, e.g. -s equ,equs,equ:-?

Warning.

  • [x] Warning, error, or default value for no features, e.g. -s :file.dump.asm?

For future-proofing, error. If a use case comes up for this, it's easier to add some behaviour.

Rangi42 commented 1 month ago

Newlines separating the kinds of values, and/or comments preceding them? (Makes it easier to find the non-first ones.)

I'm not sure what you mean here.

I mean the blank line and the comment headings here.

; EQU constants
DEF x EQU $2a
DEF y EQU $64

; EQUS constants
DEF s EQUS "hi"

From your other feedback I expect we should keep both.

aaaaaa123456789 commented 1 month ago

Regarding the syntax: I can see the value of assembly syntax, as it is naturally extensible, but please, keep in mind that the file should be machine-parsable without having to reimplement half of RGBASM to do so. In other words, it should be a documented subset of assembly syntax (which of course could be extended for newer versions — parsers should be instructed to accept all lines and ignore unrecognised ones) that is minimal enough to support everything you want to output and nothing else.

As a simple rule of thumb, try parsing your output in straight-up SM83 assembly. If the thought of doing so makes you scream, it's too complex. (For comparison, it would be very easy to parse a symfile in SM83 assembly.)

Rangi42 commented 1 month ago

I agree that keeping the output easily machine-parsable is valuable, but I don't plan on attempting to write a spec for that output. We don't document the .map file format either, and this is a much simpler file.

aaaaaa123456789 commented 1 month ago

Specifications aren't as hard to write as you'd think, but anyway, by "documented" I mean actually listed somewhere that, e.g., constants are always DEF constant EQU $12345678, so that people can rely on that and nobody pulls the rug half a year down the line saying "oh, those parsers were relying on implementation details".