c3lang / c3c

Compiler for the C3 language
https://c3-lang.org
GNU Lesser General Public License v3.0

enum with values / distinct const #1129

Open lerno opened 8 months ago

lerno commented 8 months ago

Enums changed with #428 from classic C enums to ordinal based enums.

This left a hole in the language, not least in how to define C enums that aren't strict ordinals.

Early ideas included an attribute, which is bad because an attribute should not be affecting the entire implementation of the enum, as well as just working around it with a distinct type + sub module.

Later distinct was proposed for this, but worked poorly as it was not a keyword at the time. However, currently something like

module baz;
distinct const Foo : int
{
  ABC = 3,
  BCE = 123
}

Could be considered.

Questions remain regarding semantics and usage. For example, is the usage baz::ABC, Foo.ABC or Foo::ABC? The first case treats the code as mere shorthand for:

module baz;
distinct Foo = inline int;
Foo ABC = 3;
Foo BCE = 123;

The second is close to enum style, but the question is then whether this is desirable given the difference in semantics. The final variant, Foo::ABC, would give it a unique look, clearly indicating it is from a "const set", but unlike Foo.ABC, with its obvious inference of .ABC, it would not seem like it should implement inference.

distinct const, while not requiring a new set of keywords, is fairly long, and the question is whether this is good.

Other syntactic alternatives would be:

constset Foo1 : int
{
  ABC = 3,
  BCE = 123
}

const enum Foo2 : int
{
  ABC = 3,
  BCE = 123
}

enumconst Foo3 : int
{
  ABC = 3,
  BCE = 123
}

All of those presuppose a separate kind of type. The initial option with baz::ABC retains the option of just modelling this as constants.

lerno commented 8 months ago

Because you either have to have two different types under the same name, or you drop the functionality of the ordinal enums, such as runtime reflection. What language allows both and has name reflection like C3's for enums? How would that even work with enums-as-masks? It can't.

data-man commented 8 months ago

Masks are useful in C, because C has no introspection. C3 has enumset in std. It's enough.

lerno commented 8 months ago

Odin: keeps an array of name + value, taking the name at runtime means walking through the values and trying to match them. This doesn't work for masks or overlapping values. If Odin had associated values, it would similarly be a loop for every associated value to retrieve, that is O(n) rather than O(1) time. Which would make associated values expensive enough to never get used.

Zig: creates a function with a switch, matching value to name. Again, doesn't work with masks. Overlapping values are forbidden for Zig enums. If they had associated values, those would be O(n).
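
The cost argument above can be sketched in C. This is only an illustration under assumptions: the table, its entries and the function names are hypothetical and are not taken from Odin, Zig or C3. It contrasts a value-based name lookup, which has to scan and cannot name a combined mask, with an ordinal-based lookup, which is a single index.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical non-ordinal enum: explicit values, one of them overlapping. */
typedef struct {
    const char *name;
    int value;
} NamedValue;

static const NamedValue FLAG_TABLE[] = {
    { "READ",        1 },
    { "WRITE",       2 },
    { "ALIAS_WRITE", 2 },  /* same value as WRITE */
};
static const size_t FLAG_COUNT = sizeof(FLAG_TABLE) / sizeof(FLAG_TABLE[0]);

/* Value-based lookup: O(n) scan, first match wins, NULL when nothing matches. */
const char *name_by_value(int value) {
    for (size_t i = 0; i < FLAG_COUNT; i++) {
        if (FLAG_TABLE[i].value == value) return FLAG_TABLE[i].name;
    }
    return NULL;
}

/* Ordinal-based lookup: O(1) index into the same table. */
const char *name_by_ordinal(size_t ordinal) {
    return ordinal < FLAG_COUNT ? FLAG_TABLE[ordinal].name : NULL;
}

/* Usage: the overlapping entry is unreachable by value, and a mask has no name. */
void lookup_examples(void) {
    assert(strcmp(name_by_value(2), "WRITE") == 0);   /* never ALIAS_WRITE */
    assert(name_by_value(1 | 2) == NULL);             /* READ|WRITE matches nothing */
    assert(strcmp(name_by_ordinal(2), "ALIAS_WRITE") == 0);
}
```

With ordinals, every member has exactly one index, so the name (or any associated value) is one array access away.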

lerno commented 8 months ago

Masks are useful in C, because C has no introspection. C3 has enumset in std. It's enough.

Interop with C is the only reason why non-ordinal enums are even brought up in this issue.

OdnetninI commented 8 months ago

After all this discussion, I can agree that non-ordinal enums are not strictly necessary. However, I think that, from a code reading/writing perspective, they are better than the repetitive sequence of const int CONST = n;. But now, I am more inclined to not have them in the language and work with consts. I need to see how this would affect a real medium-size application...

lerno commented 8 months ago

The code reading is what I'm concerned about myself, and I was kind of happy to have found a way to leverage associated values in a good way. However, in the end that feature increasingly seems like a bad idea and I will remove it.

lerno commented 2 months ago

Ok, so another try, what if we do this:

enum Foo : inline int(int val)
{
   ABC = 1,
   BDF = 2,
}
// For this enum, it will implicitly convert to
// the ordinal:
int a = Foo.ABC; // a = 0

We can also inline a parameter:

enum Bar : int(inline int val)
{
   ABC = 1,
   BDF = 2,
}
// For this enum, it will implicitly convert to
// the inlined val parameter:
int a = Bar.ABC; // a = 1

To avoid confusion, this will change the meaning of (int)Bar.ABC. With inline, this will be as if (int)Bar.ABC.val. If we have an enum without inline, it can't be cast:

enum Baz { FOO, BAR };
int a = (int)Baz.FOO; // ERROR

Instead it's required to use .ordinal (possibly shortened to .ord):

enum Baz { FOO, BAR };
int a = Baz.FOO.ordinal; // This is the way

lerno commented 2 months ago

One further comment here is that this would strengthen the enum type to be stronger than a distinct and more in line with a struct.

lerno commented 2 months ago

So how do we convert from an int to an enum? Again, we can't use the cast, that would lead to unwanted results, consider:

int a = Bar.ABC; // a = 1
Bar b = (Bar)a; // b = Bar.BDF !

So we need to introduce a conversion function:

Bar b = Bar.of_ordinal(1); // Bar.BDF

This one can then also be typed with the index type (e.g if the ordinal type is ichar, then of_ordinal would take an ichar).
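
The pitfall and the proposed conversion can be modelled in C as a rough sketch. Everything here is an assumption made for illustration: the names are hypothetical, Bar is modelled as an ordinal with a parallel value table, and the bounds check is not something the proposal specifies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical C model of `enum Bar : int(inline int val)`:
   the enum itself is the ordinal, the values live in a parallel table. */
typedef enum { BAR_ABC, BAR_BDF, BAR_COUNT } Bar;
static const int BAR_VAL[BAR_COUNT] = { 1, 2 };

/* Models the implicit conversion through the inlined `val`. */
int bar_val(Bar b) { return BAR_VAL[b]; }

/* Models Bar.of_ordinal. The range check is an assumption of this sketch. */
bool bar_of_ordinal(int ordinal, Bar *out) {
    if (ordinal < 0 || ordinal >= BAR_COUNT) return false;
    *out = (Bar)ordinal;
    return true;
}

/* Usage: round-tripping the value through the ordinal gives a different member. */
void bar_examples(void) {
    Bar b = BAR_ABC;
    int a = bar_val(BAR_ABC);        /* a == 1, the associated value */
    assert(a == 1);
    assert(bar_of_ordinal(a, &b));   /* 1 is interpreted as an ordinal */
    assert(b == BAR_BDF);            /* not BAR_ABC */
    assert(!bar_of_ordinal(2, &b));  /* no member has ordinal 2 */
}
```

Naming the conversion after what it consumes makes the mismatch visible at the call site, which a plain cast would hide.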

OdnetninI commented 2 months ago

So, depending on where you put the inline keyword, you change what value you get from the enum implicitly. I like that approach.

But about your last comment, casting could lead to unwanted results if you inline val instead of the enum ordinal.

What happens with repeated values? Are they valid?

enum Bar : int(inline int val)
{
   ABC = 1,
   BDF = 2,
   EHG = 2,
}

bool equal = Bar.BDF == Bar.EHG; // I would guess true....

Also, not only an of_ordinal, but what if I want to get the Bar from the value:

Bar b = Bar.of_value(1); // It should be Bar.ABC

Bar c = Bar.of_value(2); // Bar.BDF or Bar.EHG ??? Any should be valid

This is important because when reading protocols from the network, you may want to express the types with an enum. I know this is possible, but you will have to repeat it with each enum you use:

enum Bar : int(inline int val)
{
   UNK = 0,
   ABC = 1,
   BDF = 2,
   EHG = 2,
}

fn Bar getBarFromValue(int val) {
  switch(val) {
    case Bar.ABC: return Bar.ABC;
    case Bar.BDF: return Bar.BDF;
    default: return Bar.UNK;
  }
}

lerno commented 2 months ago

bool equal = Bar.BDF == Bar.EHG; // I would guess true....

No, this is not true. Unconverted it is a comparison of ordinals. This would be true though: bool equal = (int)Bar.BDF == (int)Bar.EHG;

Bar b = Bar.of_value(1);

I feel this has a very limited use. I don't mind writing a macro for it. It would need to look something like: macro @enum_by(#val, value) and it would just loop through the definitions. @enum_by(Bar.val, 1). This would work for all parameters, not just the inline one.

OdnetninI commented 2 months ago

No, this is not true. Unconverted it is a comparison of ordinals. This would be true though: bool equal = (int)Bar.BDF == (int)Bar.EHG;

Okay :+1:

I feel this has a very limited use. I don't mind writing a macro for it. It would need to look something like: macro @enum_by(#val, value) and it would just loop through the definitions. @enum_by(Bar.val, 1). This would work for all parameters, not just the inline one.

I was thinking more of a lookup table... but nvm. At least, let's add the macro, so users have an easy way to convert from values to enums, and if they need performance, they can create the lookup table themselves.

lerno commented 2 months ago

Note that inline allows for it to inline to ANY sort of type:

enum Foo : (inline String val)
{
  ABC = "Hello",
  BCD = "World"
}
// ...
io::printfn("%s %s", (String)Foo.ABC, (String)Foo.BCD);

What this means is that the comparison could be arbitrarily complex or even undefined. So it's not clear that it is suitable for a lookup table.

OdnetninI commented 2 months ago

Yeah, I still had in my mind that enums are for integers only, but this is much more powerful.

Yeah, I think the current proposal could work fine for most cases.

lerno commented 2 months ago

I want to highlight something however, this doesn't change the problem it's supposed to fix, namely C interfaces.

// C interface
enum Foo { FOO_ABC = 123 };
void fooSomething(enum Foo);

"Rejected" C interface:

enum Foo : (int val) { ABC = 123 }
extern fn void fooSomething(int val);
// call
fooSomething(Foo.ABC.val);

Other rejected C interface:

distinct Foo = int;
const Foo FOO_ABC = 123;
extern fn void fooSomething(Foo val);

The actual proposal

enum Foo : (inline int val) { FOO_ABC = 123 }
extern fn void fooSomething(int val); // <- int, not Foo!

We can improve it:

distinct FooVal = int;
enum Foo : (inline FooVal val) { FOO_ABC = 123 }
extern fn void fooSomething(FooVal val);

lerno commented 2 months ago

So really, we have come full circle and failed to do anything. Rather, it leads to reconsideration of:

  1. A completely new type

    constset Foo : int
    {
      ABC = 123,
    }
    extern fn void fooSomething(Foo val);

  2. In site .val expansion:

    enum Foo : (int val) { ABC = 123 }
    extern fn void fooSomething(Foo.val x);
    // call
    fooSomething(ABC); // implicit .val

lerno commented 2 months ago

Other possible ways for expansion syntax:

extern fn void fooSomething(Foo x @expand(x.val));
extern fn void fooSomething(Foo x.val);
extern fn void fooSomething(Foo x @expand(val));
extern fn void fooSomething(Foo x -> x.val);
extern fn void fooSomething(Foo x @extern(x.val));
extern fn void fooSomething(Foo x @export(x.val));

Of course there is the explicit macro wrapper too.

lerno commented 2 months ago

There are probably other alternatives.

OdnetninI commented 2 months ago

A new type seems overkill.

As this is something intended for C interop, @expand(x.val) seems appropriate. But other options may be better.

Also, Foo.val seems good, but with that kind of syntax, I am worried people will start using it outside C interfaces...

lerno commented 2 months ago

If it's used outside of C interfaces, that's probably bad, unless there is a good reason for it.

Hema2-official commented 2 months ago

If I understand correctly, the current implementation uses the type of the enum to store an identifier, and then it expands a virtual struct with the associated values. So this:

enum Channels : char (char size)
{
    AUTO = 0,
    RGB = 3,
    RGBA = 4
}

...acts similar (memory-wise) to something like this:

struct Channel @packed
{
    char id;    // identifier
    char size;  // associated value
}

So to set values to enum members, the requirements would be:

So why not just make an attribute that simply makes an enum act like the ones in C? And the attribute would require all members to have unique values. For example:

enum Channels : char @unique
{
    AUTO = 0,
    RGB = 3,
    YUV = 3
}

...would fail, since Channels.RGB and Channels.YUV have the same value. And for a less evil example:

enum Channels : char @unique
{
    AUTO = 0,
    RGB = 3,
    RGBA = 4
}

This would compile successfully, have the size of a char, and implicitly or explicitly convert to it. It would also be possible to then have a macro to convert a char value to an optional Channels. For example:

Channels! some_result = enumcast(6, Channels); // is a fault
Channels! other_result = enumcast(3, Channels); // is Channels.RGB

Tell me what you guys think. If this all sounds good, I'll lead the implementation if needed.

peace<3

lerno commented 2 months ago

Something like enum Channels : char (char size) is represented by a char. That is also its size. To get the size value, there is a (hidden) global array containing the values, so:

Channels x;
char y = x.size;

Is really:

char x;
char y = __Channels_size[x];

So then you understand that given that you have the size, to find the Channels entry, one needs to iterate over all entries in the __Channels_size array until a match is found. This can be trivially implemented as a macro, but it should not be a built-in feature as:

  1. Lookup cost is proportional to the number of elements in the array
  2. There are possible ambiguities that can only be resolved by selecting the first match
  3. As associated values may be any type, equals may not even be defined, or may be arbitrarily complex.
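
The representation described above can be sketched in C. This is an illustration under assumptions, not the compiler's actual output: the identifiers are hypothetical, and the table is named channels_size because identifiers starting with a double underscore are reserved in C.

```c
#include <assert.h>

/* A Channels value is just its ordinal, stored in one byte. */
typedef unsigned char Channels;
enum { CH_AUTO, CH_RGB, CH_RGBA, CH_COUNT };

/* Stand-in for the hidden global array of associated `size` values. */
static const unsigned char channels_size[CH_COUNT] = { 0, 3, 4 };

/* Models `x.size`: a single O(1) array access. */
unsigned char channels_get_size(Channels x) {
    return channels_size[x];
}

/* Reverse lookup: O(n) scan, first match wins, -1 when no entry matches. */
int channels_find_by_size(unsigned char size) {
    for (int i = 0; i < CH_COUNT; i++) {
        if (channels_size[i] == size) return i;
    }
    return -1;
}

/* Usage: forward lookup is an index, reverse lookup is a search that can fail. */
void channels_examples(void) {
    assert(channels_get_size(CH_RGB) == 3);
    assert(channels_find_by_size(4) == CH_RGBA);
    assert(channels_find_by_size(6) == -1);
}
```

The == comparison in the scan only works here because the associated value is an integer; for other types a suitable equality would have to exist first.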

Hema2-official commented 2 months ago

Thanks for the explanation!

And I agree with you on your takes, they're kinda obvious. But what your arguments are against is not what I wanted to describe.

With @unique (or whatever), associated values wouldn't be possible. It would look like something along these lines:

enum Channels : char @unique
{
    AUTO = 0,
    RGB = 3,
    RGBA = 4
}

// =>

// effectively (when using the value):
distinct Channels = char; // 🤷‍♂️

In memory, we'd have to retain a value for every entry up until 4. So in C++ terms:

const int CHANNELS_COUNT = 5;

bool Channels[] = {
    true,
    false,
    false,
    true,
    true
};

bool isPresent(char index) {
    // char may be signed, so reject negative indices as well
    return (index >= 0) && (index < CHANNELS_COUNT) && Channels[index];
}

int main() {
    isPresent(0); // true
    isPresent(2); // false
    isPresent(4); // true
    isPresent(5); // false
    // etc...

    return 0;
}

And to be less of a memory hog:

const int CHANNELS_COUNT = 5;

// Bit i is set when value i is a valid member: bits 0, 3 and 4.
long Channels = 0b11001;

bool isPresent(char index) {
    return (index >= 0) && (index < CHANNELS_COUNT) && (Channels & (1L << index));
}

// same tests and results as before ...

Or if we want to retain identifier strings:

#include <iostream>

const int CHANNELS_COUNT = 5;

// A null entry marks a value that has no enum member.
const char* Channels[] = {
    "AUTO",
    nullptr,
    nullptr,
    "RGB",
    "RGBA"
};

bool isPresent(char index) {
    return (index >= 0) && (index < CHANNELS_COUNT) && Channels[index];
}

int main() {
    char index = 4;
    if (isPresent(index))
        std::cout << Channels[index];
    else
        std::cout << "That's a miss";

    return 0;
}

Same thing as the first but now, a NULL marks a fault for e.g. NONEXISTENT_ENUM_MEMBER and otherwise we can do .nameof or something...

This way, the performance hit is kept pretty low, especially for desktop systems. Still, 2 or 3 times C, but eh. Sadly, the 3rd example hasn't really heard of memory efficiency, and for OpenGL it would be about 1MB for just the lookup table, but this is ONLY if we want to retain identifiers. Otherwise, it's about 30KB.

So this solution:

  1. Discards associated values (unfortunately), but
  2. Enables optional reverse lookup without a significant performance hit
  3. Enables simple usage of the values, like in C

Thoughts? (Also, I'm starting to feel the complexity of this question and why most ideas are discarded...)

lerno commented 2 months ago

Did you think about the problem here that C enums also cover constants-that-are-bitmasks, so that yet another annotation is needed to handle this?

Hema2-official commented 2 months ago

Well, they would work, as the only requirement for my sketch is for the values to be unique, but yeah, it would be pretty inefficient... So yeah, no reverse lookup.

A new type seemed kinda overkill to me as well. Is there maybe another idea on how to bind constants to a parent? Because that's what the new type is about, isn't it? Some kinda static struct, perhaps?

(Sorry for the broken terminology btw.)

lerno commented 2 months ago

There is the sub-module approach:

module foo_bindings;
/* function and types here */
module foo_bindings::channels;

distinct Channels = int;
const Channels AUTO = 0;
const Channels RGB = 3;
const Channels RGBA = 4;

Now we can use it in this manner:

fn void main()
{
   Channels channel = channels::RGB;
   int a = 1 + (int)channel;
   Channels rgba = (Channels)a;
   assert(rgba == channels::RGBA);
}

Here the module name becomes the effective namespace.

TheOnlySilverClaw commented 1 month ago

If I may chime in here. I'm currently building a few C bindings with extensive use of defined values. And for that, I actually have several use cases for enums as a namespace with a mix of manually defined and auto-incrementing values.

Just one example from GLFW that would be wonderful if it worked like this:

enum KeyCode : CInt {
  UNKNOWN = -1,
  SPACE = 32,
  APOSTROPHE = 39,
  COMMA = 44,
  MINUS, // auto-incremented to 45
  PERIOD,  // auto-incremented to 46
  ...
}

The submodule approach is a usable workaround, but ends up being quite inconsistent when working with some values that fit an enum and others that don't.
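
For reference, this mix of explicit and auto-incremented values is what a plain C enum already does: an enumerator without an initializer takes the previous value plus one. A minimal sketch, using only the key codes listed above (the KEY_ prefix and the KeyCode typedef are hypothetical names for this example, not GLFW's own):

```c
#include <assert.h>

/* Explicit values where the C header defines them, auto-increment elsewhere. */
typedef enum {
    KEY_UNKNOWN    = -1,
    KEY_SPACE      = 32,
    KEY_APOSTROPHE = 39,
    KEY_COMMA      = 44,
    KEY_MINUS,   /* previous + 1 = 45 */
    KEY_PERIOD   /* previous + 1 = 46 */
} KeyCode;

/* Compile-time checks that the auto-incremented members got the expected values. */
static_assert(KEY_MINUS == 45, "MINUS follows COMMA");
static_assert(KEY_PERIOD == 46, "PERIOD follows MINUS");
```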