anachronauts / jeff65

a compiler targeting the Commodore 64 with gold-syntax
GNU General Public License v3.0

Inline string literals / string constants #49

Open jdpage opened 6 years ago

jdpage commented 6 years ago

I want to be able to write

console.print("Hello, world!")

This is currently semantically problematic. Fixing that is simple enough, but I'm concerned that we lose something if we do: specifically, an easy way of auditing program image size.

A little background explanation. In the spec, the constant statement's behaviour is defined as follows:

Binds a name to a value known at compile time which does not allocate memory in the program image. The value will be inlined at usage sites. Top-level constant bindings are exported from the unit as symbols, and may be referenced in other units.

The restriction to values which do not allocate memory means that arrays and strings cannot be declared as constant-bindings.

This wording stems from the fact that, when writing the spec, I struggled to articulate the difference between non-mut toplevel let-bindings and constant-bindings. In retrospect, I suspect that a simpler approach is better:

constant takes a "known" expression on the RHS, and binds the identifier given on the LHS so that it is also considered "known". (The rules for "known" expressions haven't been formally stated yet, which is probably a doc bug, but they go as you'd expect.) Crucially, "known" expressions are always inlined at usage sites. This implies that pointers cannot be taken to "known" expressions.

Toplevel non-mut let takes a "known" expression on the RHS, and binds the identifier given on the LHS so that it is not considered "known". This implies that it cannot be inlined at a usage site, and therefore must allocate memory in the program image to store the value. This, in turn, implies that pointers may be taken to such expressions.
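To make the distinction concrete, here is a minimal sketch in Python rather than jeff65 syntax. It is a toy model, not the compiler's actual implementation: `Image`, `Known`, `Stored`, `bind_constant`, `bind_let`, and `address_of` are all hypothetical names, and values are limited to a single byte for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    """Toy program image: bytes are appended in allocation order."""
    data: bytearray = field(default_factory=bytearray)

    def allocate(self, payload: bytes) -> int:
        """Append payload and return its offset within the image."""
        offset = len(self.data)
        self.data.extend(payload)
        return offset


@dataclass(frozen=True)
class Known:
    """A "known" binding: just a value, inlined wherever it is used."""
    value: int


@dataclass(frozen=True)
class Stored:
    """A binding that is not "known": its value lives in the image."""
    offset: int


def bind_constant(value: int) -> Known:
    # Hypothetical model of `constant`: nothing is written to the image.
    return Known(value)


def bind_let(image: Image, value: int) -> Stored:
    # Hypothetical model of a toplevel non-mut `let`: the value is stored
    # in the image (one byte here, so value must be in 0..255).
    return Stored(image.allocate(bytes([value])))


def address_of(binding) -> int:
    # Pointers may only be taken to bindings that occupy image space.
    if isinstance(binding, Known):
        raise TypeError('cannot take a pointer to a "known" expression')
    return binding.offset
```

In this model, `bind_constant(7)` leaves the image untouched, while `bind_let(image, 7)` grows it by one byte and yields something `address_of` accepts.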

Now, when writing an inline string literal (as above), the expected behaviour is that the string is allocated in the program image and passed to the function in some way (almost certainly as a slice), with the actual numbers for that slice inlined.

This means that an inline string literal is a "known" expression. Therefore, it makes sense that we can define constants with string (and by extension, array) values, the result of which is that the string is allocated in the program image, and the slice addressing the string is bound to the LHS identifier as a "known" value.
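A rough sketch of what that would mean, again as a toy Python model under stated assumptions: the slice is assumed to be an (offset, length) pair, ASCII stands in for whatever encoding the real compiler would use, and `Image`, `Slice`, and `bind_string_constant` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    """Toy program image: bytes are appended in allocation order."""
    data: bytearray = field(default_factory=bytearray)

    def allocate(self, payload: bytes) -> int:
        """Append payload and return its offset within the image."""
        offset = len(self.data)
        self.data.extend(payload)
        return offset


@dataclass(frozen=True)
class Slice:
    """Assumed slice layout: where the data starts and how long it is.

    Both fields are plain numbers, so the slice itself can be inlined
    at usage sites even though the data it addresses cannot.
    """
    offset: int
    length: int


def bind_string_constant(image: Image, text: str) -> Slice:
    # The string's bytes are allocated in the image...
    payload = text.encode("ascii")  # ASCII is a stand-in encoding
    offset = image.allocate(payload)
    # ...but the identifier is bound to the "known" slice, not the bytes.
    return Slice(offset, len(payload))


image = Image()
greeting = bind_string_constant(image, "Hello, world!")
```

Here `greeting` is a pair of numbers that could be inlined at each usage site, while the image has grown by the length of the string.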

This means that one can no longer rely on constant statements not to take up program image space, which makes auditing image size slightly harder, and that bothers me a little. On the other hand, I plan to have the linker automatically eliminate unreferenced functions, which already makes it hard to tell where the space in your program image is coming from, so maybe it's not that big of a deal. Maybe we could provide a tool which produces a breakdown of what's going into the program image, and where it comes from?
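As a very rough idea of what such a breakdown could look like, here is a Python sketch. Everything in it is hypothetical: the input format (unit, symbol, kind, size in bytes) is an assumption about what the linker might be able to report, and the sample entries are made up for illustration.

```python
from collections import defaultdict


def image_breakdown(entries):
    """Sum image bytes per (unit, kind).

    entries: iterable of (unit, symbol, kind, size_in_bytes) tuples,
    a hypothetical record of everything that ended up in the image.
    """
    totals = defaultdict(int)
    for unit, _symbol, kind, size in entries:
        totals[(unit, kind)] += size
    return dict(totals)


# Made-up sample data, not output from any real build.
sample = [
    ("main", "greeting", "constant string", 13),
    ("main", "farewell", "constant string", 8),
    ("main", "counter", "let", 1),
    ("console", "print", "function", 40),
]

for (unit, kind), size in sorted(image_breakdown(sample).items()):
    print(f"{unit:10} {kind:16} {size:5} bytes")
```

Grouping by kind as well as by unit would show at a glance how much space comes from constant statements versus let-bindings or functions.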

jdpage commented 6 years ago

Pinging @woodrowbarlow for input.