anachronauts / jeff65

a compiler targeting the Commodore 64 with gold-syntax
GNU General Public License v3.0

Pointer/array syntaxes #6

Open jdpage opened 6 years ago

jdpage commented 6 years ago

Jonathoughts (tm):

Hardware considerations

Note that the Indexed addressing modes are indexed by a u8. Indexing with something bigger is probably a standard library concern.
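
To make the constraint concrete, here is a minimal Python sketch (not jeff65 code) that models a 6502 absolute,X load. The function names `load_indexed` and `load_wide` are made up for illustration; `load_wide` stands in for the kind of library routine that would do the 16-bit address arithmetic in software.

```python
# Hypothetical model of 6502 absolute,X addressing; all names are illustrative.
MEMORY_SIZE = 0x10000  # the 6502 addresses 64 KiB


def load_indexed(memory: bytearray, base: int, index: int) -> int:
    """Model `LDA base,X`: the X register only holds 8 bits."""
    if not 0 <= index <= 0xFF:
        raise ValueError("index register is 8 bits wide")
    return memory[(base + index) & 0xFFFF]


def load_wide(memory: bytearray, base: int, index: int) -> int:
    """Model a 16-bit index by computing the address explicitly.

    On real hardware this costs extra instructions, which is why it is
    assumed here to live in a library rather than in the core language.
    """
    if not 0 <= index <= 0xFFFF:
        raise ValueError("index must fit in 16 bits")
    address = (base + index) & 0xFFFF
    return memory[address]


memory = bytearray(MEMORY_SIZE)
memory[0x0400 + 0xFF] = 0x2A    # last byte reachable with a u8 index
memory[0x0400 + 0x0100] = 0x2B  # first byte that needs a wider index
```

With this model, `load_indexed(memory, 0x0400, 0xFF)` reaches the last byte a u8 index can address, while offset 0x0100 is only reachable through `load_wide`.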

Syntax ideas

Not in love with all of these. In particular, I don't know if I like @ better than *, though I do like it not being the same character as used for multiplication. I'm exercising it below to see if it grows on me at all. & is probably fine for address-of; using it in the type instead of @/* provides symmetry with [] and means that it can always be read as "address of" and @/* can be read as "target of", i.e. &u8 reads as "address of u8" and &foo reads as "address of foo". I stole this from Rust.

We should probably distinguish between mutable pointers and read-only pointers. So C's const int* becomes &i16 and C's int* becomes &mut i16. I have no desire to steal Rust's borrow checker because that is an undertaking and even Rust is having trouble getting it right. I'm fine with providing just basic support for making it clear what a function mutates and what it doesn't.

I actually really like the idea of providing compiler intrinsics using function syntax, which can later be promoted to "real" syntax if it proves useful. Rust uses name!(args) syntax for this (and macros), but I'm not really in love with that.

use mem    

let foo: u8 = 7
let bar: &u8 = &foo     --[[ places the address of foo into bar ]]
let baz: u8 = @bar      --[[ dereferences bar, placing the result into baz ]]

--[[ mem.as-address is a compiler intrinsic that allows
--   a pointer to be interpreted as an address. 
--   mem.as-pointer is also a compiler intrinsic which is
--   goofily generic on its output type. I don't like it. ]]
let qux: u16 = mem.as-address(bar)
let quux: &u8 = mem.as-pointer(qux)

--[[ uninitialized arrays seem like a bad thing to allow by default
--   maybe provide another goofy mem intrinsic though? ]]
--[[ language arrays are limited to 256 bytes in length; bigarrays sounds
--   like a nice library feature. ]]
let mut spam: [u8 of 0 to 10] = [ 0 ]      --[[ declares a 10-element zeroed array ]]
let mut eggs: [u8 of 32 to 57] = [ 7 ]     --[[ declares a 25-element array filled with sevens ]]

let wibble: u8 = spam[foo]         --[[ note that array indices are u8 ]]

--[[ another goofy-generic mem intrinsic which evaluates to a compile-time constant ]]
let wobble: &u8 = mem.well-known(0x0400)

Note that pointer types don't have arithmetic defined on them--the mem.as-address and mem.as-pointer functions have to be used to convert them. Pointer arithmetic on the C64 is generally pretty slow compared to direct indexing, so I'm inclined to steer people away from it.
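
A rough Python model of that rule, for illustration only: `Pointer` is an opaque value with no arithmetic, and `as_address` / `as_pointer` mirror the proposed mem.as-address and mem.as-pointer intrinsics. The `advance` helper is hypothetical and only shows the round trip someone would have to write by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pointer:
    """Hypothetical stand-in for a jeff65 pointer: opaque, no arithmetic."""
    _address: int


def as_address(pointer: Pointer) -> int:
    """Model of mem.as-address: expose the 16-bit address as a number."""
    return pointer._address & 0xFFFF


def as_pointer(address: int) -> Pointer:
    """Model of mem.as-pointer: wrap a 16-bit address back into a pointer."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    return Pointer(address)


def advance(pointer: Pointer, offset: int) -> Pointer:
    """Illustrative helper: pointer arithmetic via an explicit round trip."""
    return as_pointer((as_address(pointer) + offset) & 0xFFFF)


screen = as_pointer(0x0400)
next_row = advance(screen, 40)  # 40 bytes further on
```

Writing `screen + 40` directly fails in this model because `Pointer` defines no addition, which is the point: the conversion is visible wherever arithmetic happens.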

jdpage commented 6 years ago

I'm tempted to suggest handling the weird intrinsics using an unnamed type which contains a 16-bit value and can be assigned to and from any pointer type. This would be useful for a variety of intrinsics (for example, a mem.cast-pointer for turning pointers into other pointers).

woodrowbarlow commented 6 years ago

summarizing a conversation we had earlier regarding * versus @ for the dereference operator:

you suggested that maybe it was a mistake to name the nodes according to their semantic meaning rather than their text representation. i.e., you suggested that maybe instead of OperatorMultiplyNode, we should have called it AsteriskNode. the reasoning being that if we use * as the dereference operator, the node really shouldn't be called multiply since dereferencing and multiplying are very different things.

i see that instead as a yellow flag that maybe we shouldn't use * for dereference. i actually really like @. you also pointed out that it could be read as "the data at", which i think is also cool.

we're so light on punctuation that it's not like it would be shooting ourselves in the foot by using a distinct operator for dereference, and i think not overloading operators is a good way to make the language more easily learnable.

as for arrays... i don't understand the advantage of:

let eggs: [mut u8 of 32 to 57] = [ 7 ]

over:

let eggs: mut u8[25] = [ 7 ]

what do the starting and end indexes mean?

woodrowbarlow commented 6 years ago

i do see value in wrapping the brackets around the type, to make it clear whether you're dealing with an array of pointers or a pointer to an array of integers... but i don't see the value of the of x to y thing, so please elaborate on what that is supposed to provide that a simple size wouldn't.

i'm tempted to suggest:

let eggs: [mut u8: 25] = [ 7 ]

except i don't like overloading the colon.

side note: this thing you're suggesting where you can do = [ 7 ] to set all the items in the array to 7 is a thing that C effectively only allows for 0 (in C, = { 7 } sets the first element to 7 and zero-fills the rest). i'm not saying we should only allow it for 0, because i think that's a bullshit limitation from the user point of view, but we should figure out why they chose to do that. they might have a non-obvious good reason.

jdpage commented 6 years ago

Okay, so let's assume we're gonna go with @ for dereferencing pointers.

As for the range thing (ignoring mut positioning for now), Rust does let eggs: [u8; 25] for array types.

Basically, the point of something like

let eggs: [u8; 32 to 57] = ...

... would be to allow you to declare an array with a custom range (useful for some algorithms?) where the type is laid out optimally for that. E.g. if you were to declare

let spam: [u8; 0 to 3] = [0]    --[[ three bytes ]]
let eggs: [u8; 1 to 4] = [2]    --[[ also three bytes ]]

Then the assembly emitted would be like:

spam:
    .byte 0, 0             ; first two bytes of spam
eggs:
    .byte 0                ; last byte of spam
    .byte 2, 2, 2          ; bytes of eggs

... thereby allowing eggs to be accessed without doing any pointer arithmetic. Basically, it's a memory layout optimization for working with arrays that you don't want to begin at 0 for some reason.
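
The trick can be sketched in Python as follows. This is an illustration, not the compiler's actual layout code: `lay_out` and `read` are made-up names, the origin address 0x1000 is arbitrary, and ranges are treated as half-open because the example above counts 0 to 3 as three bytes.

```python
def lay_out(arrays, origin):
    """Pack arrays back to back and bias each label by its lower bound.

    arrays: list of (name, low, high, fill) with a half-open range low..high.
    Returns (labels, image); labels[name] + i is the address of name[i].
    """
    labels = {}
    image = bytearray()
    for name, low, high, fill in arrays:
        data_start = origin + len(image)
        labels[name] = data_start - low  # label may point before the data
        image.extend([fill] * (high - low))
    return labels, image


def read(labels, image, origin, name, index):
    """Fetch name[index] with a plain label + index, no subtraction needed."""
    return image[labels[name] + index - origin]


ORIGIN = 0x1000  # arbitrary load address for the sketch
labels, image = lay_out([("spam", 0, 3, 0), ("eggs", 1, 4, 2)], ORIGIN)
```

Here the `eggs` label lands on the last byte of `spam`, just as in the assembly listing, so `eggs[1]` is simply label + 1.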

Possibly related, I was entertaining the idea of proposing an Ada/Pascal-style ability to create types like

type foo = 1 to 25

... in which case foo would be restricted to that range.
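
As a rough illustration of what such a restricted type would enforce, here is a small Python sketch. `RangeType` and its `check` method are hypothetical names. The thread does not settle whether the upper bound is inclusive; this sketch assumes it is exclusive, to match the array ranges above.

```python
class RangeType:
    """Hypothetical model of `type foo = 1 to 25` (upper bound assumed exclusive)."""

    def __init__(self, low: int, high: int) -> None:
        if low >= high:
            raise ValueError("range must contain at least one value")
        self.low = low
        self.high = high

    def check(self, value: int) -> int:
        """Return value unchanged if it lies in the range, else raise."""
        if not self.low <= value < self.high:
            raise ValueError(f"{value} is outside {self.low} to {self.high}")
        return value


foo = RangeType(1, 25)
in_range = foo.check(7)
```

A compiler could do the same check statically for constants and only fall back to a runtime check where the value is not known at compile time.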

woodrowbarlow commented 6 years ago

okay, i'm on board with specifying non-zero-indexed array ranges.

notes:

  1. i don't love the semicolon. i don't love semicolons in general, but i definitely don't love using it to do something other than end a statement, for similar reasons to not loving unmatched single-quotes in a language.

  2. i also don't love [u8 of 0 to 5], although i find it more palatable than semicolons. i would be on board with [0 to 5 of u8] because it's five pieces of u8 data. or maybe [u8, 0 to 5].

  3. it might be convenient to have a shorthand if you want your array to be zero-indexed. like [5 of u8] or [u8, 5].

  4. what do multi-dimensional arrays look like?

and, in reply to your type proposal: it sounds like making that a language construct (as opposed to a standard library function) could improve the efficiency of generated code. if that's the case, i'm on board with looking into it after we have an MVP.

woodrowbarlow commented 6 years ago

semicolons.

here's a multidimensional array:

[u8; 0 to 5, 0 to 10]

woodrowbarlow commented 6 years ago

arrays implemented in PR #20