c2lang / c2compiler

the c2 programming language
c2lang.org
Apache License 2.0

Rename the elemsof operator to lengthof #92

Closed: Dandigit closed this issue 4 years ago

Dandigit commented 5 years ago

IMHO, elemsof is an unsuitable name for the operator which calculates the length of an array.

It doesn't make much sense grammatically or logically - this operator returns the length of an array, not the elems/elements of an array.

lengthof would be a far more suitable name for this operator. It's an operator which literally returns the length of an array.

@bvdberg, if you approve of this change, I'm happy to implement it.

lerno commented 5 years ago

Or maybe an even shorter operator? There has also been discussion of making it postfix, e.g. arr.length or arr.len.

Dandigit commented 5 years ago

That raises the question: if elemsof were to become postfix, would sizeof also become postfix for the sake of consistency?

Also, with a postfix operator that starts with ., e.g. .len, .size, it blurs the line in terms of what the . actually means. In all other contexts in C2, it separates a parent (e.g. module, struct, union) from a member. In this context, it would just be part of the operator's name.

There are seemingly infinite things to consider when designing such a small change.

lerno commented 5 years ago

Definitely. This is something that @bvdberg has brought up.

An advantage is that it avoids reserving a lot of keywords, whose number might otherwise start ballooning once one looks at the many compile-time built-ins that are actually available in GNU/Clang.

I think Zig should serve as a warning here. It has a huge list of sizeof-like compile-time functions. Most of them are really useful, but I at least find them intimidating, since they cannot be grouped in a suitable manner.

Using the . we can actually think of it as calling an automatically generated function.

That said, I’m not sure about the direction. I see both pros and cons with either way.

bvdberg commented 5 years ago

I think there was a forum post discussing this among other things, but I can't seem to find it. The name elemsof was chosen because it looks like sizeof. I agree that it doesn't look very nice, but it is unambiguous; len and size are not. What is the len of an array? The total byte size or the number of elements? We discussed that using a dot operator for these might be better:

  - pro: it removes the global symbols elemsof/sizeof etc.
  - con: it prohibits the use of those names as struct members
  - pro: Type.size() looks better than sizeof(Type) (my personal opinion, a matter of taste)

I cannot think of major issues with choosing this strategy (but that's why they're called unforeseen issues :) )

lerno commented 5 years ago

Let me list a few possibilities:

  1. Use function-style layout: sizeof( ... ), elemsof( ... )
  2. Use function-style but with sigil to indicate compile time functionality: @sizeof( ... ), @elemsof( ... )
  3. Use suffix-style-as-struct-function: i32.size(), a.size()
  4. Use suffix-style-as-struct-function-using-sigil: i32->size(), a->size()
  5. Use struct member style: i32.size, a.size
  6. Use struct member "using sigil" style: i32->size, i32:size, i32$size

Advantages and disadvantages can be discussed for each.

Dandigit commented 5 years ago

Out of all of these approaches, I think number 1 (function style) and numbers 3/5 (struct function/member style) are the worst.

Out of the remaining approaches, number 2 and number 4 are the worst.

That leaves number 6. I feel like the language needs syntax to explicitly say "this is accessing a compile time attribute". i32:size is different syntax that does not:

It gives the language a way to explicitly express compile-time attributes. When a programmer thinks "Hmm... I need <compile-time attribute> of <something>", they'll know to use :.

Everything I've said is completely subjective, and I'm no expert, so take it with a grain of salt.

bvdberg commented 5 years ago
  1. is currently used by C, and I don't see a 'massive stack' of reserved symbols in that, so it's certainly viable.
  2. If we also use a special sigil for macros (like Rust), we could use that for these as well.
  3+5. are both the dot approach. If we use a sigil, we maybe don't need a function-like call, so not .@size() but .@size. If we keep sizeof, it could maybe be .sizeof() and .elemsof().
  4+6. are quite unreadable IMHO.

The :: and -> approaches are not in the language currently and my preference is to keep them out.

One possible issue I thought of with the dot approach is sizeof for base types, e.g. i32.size(). This currently does not parse, since an expression cannot start with a base type.

Dandigit commented 5 years ago

Point taken. The difference between Rust macros and this case, however, is that macros cannot be called on an object/type with the . operator.

The .sizeof() and .elemsof() approach isn't very pleasant IMO. Look at how different approaches read in English:

sizeof(int) => size of int
int.size => int size
int.sizeof() => int size of

"int size of" isn't really desirable.

The parseability of the dot approach is definitely a valid issue. You certainly could dive into the code and allow T.size() as a special case; however, with special cases comes inconsistency.

Whatever approach is taken, I strongly believe that there must be something to indicate that size and length are compile time properties, such as a sigil.

lerno commented 5 years ago

(1) There are a bunch of reserved keywords in C; it's just that they're prefixed with _ and then used through a macro: https://www.c-programming-simple-steps.com/c-keywords.html. If we look at Zig's built-in functions, these would all be keywords if implemented in C: https://ziglang.org/documentation/master/#Builtin-Functions. I'm not saying that we need this zoo of built-ins, just showing that there is an argument for the number of keywords increasing rather than decreasing.

By the way, if the .size approach is used, I prefer it to look like i32.size rather than i32.size(). The reason is that I associate () with runtime evaluation. ".size" signals to me that this is in some way constant at runtime and is safe to use without an unnecessary performance hit, which is exactly what happens.

lerno commented 5 years ago

@Dandigit the difficult thing is finding good sigils. Using @ quickly gets noisy, especially when used for macros as well. I don't like "!" at all since that signals exception handling for me and I want to reserve that in case it's needed. In general postfix sigils are harder to see as well. Do you have any suggestions?

(I obviously agree on i32.sizeof being bad)

Dandigit commented 5 years ago

@lerno I completely agree that @ is noisy and that ! is too established as "exception incoming". My personal preference for a sigil is an apostrophe: '.

i32.'size
'sizeof(i32)
a.'size

It's not perfect, though: a single ' can be quite easy to miss. I suppose it's something you'd learn to look for.

lerno commented 5 years ago

The apostrophe would probably be the last one I’d use due to its association with strings.

In general, C is a difficult language for sigils, since so many characters are already used by operators. Some operators even have different meanings depending on the expression: & and * are the most frequently occurring ones.

So finding a good sigil that people can agree on will be a tough job.

Dandigit commented 5 years ago

I can't believe that the apostrophe's association with strings/chars didn't cross my mind; thanks for pointing it out. Definitely not preferable.

bvdberg commented 5 years ago

There are not many ASCII characters suited for this. Looking at the ASCII table: ! $ % @.

None seem really nice. When I was thinking about this, I also came up with another solution. Since sizeof/elemsof both produce compile-time constants, we could require them to start with a capital letter (that's also mandatory for other constants in C2). So again, either .Sizeof() or .Sizeof and .Elemsof. Since no struct member can start with a capital letter, no clashes are possible.

I think I prefer either Sizeof(x) or x.Sizeof. x.Sizeof() is out IMO, since it's not a function.

Dandigit commented 5 years ago

That's not a bad idea - it definitely cleans things up a bit while still making it clear that Size is a constant. With this approach, we've got the following candidates:

T.Size
instance.Size

Sizeof(T)
Sizeof(instance)

array.Length
Lengthof(array)

array.Elems
Elemsof(array)

I'm still not sold on Elemsof/Elems. You say that it is unambiguous compared to Length/Lengthof, but I disagree.

x.Elems and Elemsof(x) read as "x elements" and "elements of x", respectively. The issue is that this operator does not return the elements of an array; rather, it returns the number of elements. If this operator were to return the elements of an array, it would just return the array itself.

The number of elements in an array is commonly expressed as its length, which is why x.Length and Lengthof(x) are less ambiguous than x.Elems and Elemsof(x).

If we wanted to eschew ambiguity completely, we'd end up with x.AmountOfElems and AmountOfElemsOf(x) which are both quite ridiculous.

lerno commented 5 years ago

I personally think that the uppercase there is a bit of an eyesore. The only reason we run into this issue at all is that pointer struct access and direct struct access are the same.