c3lang / c3c

Compiler for the C3 language
https://c3-lang.org
GNU Lesser General Public License v3.0

Reconsider integer type names #1566

Closed nacaclanga closed 3 days ago

nacaclanga commented 1 month ago

The current names for integer types have the following shortcomings:

  1. The names of the suffixes do not match the names of the types. Users have to remember both sets independently.
  2. The long type in C does not necessarily have the same size as the long type in C3. E.g. on Windows C's long is 32 bits while C3's long is 64 bits. This could create confusion when translating APIs. (Some other integer types have the same issue.) In contrast, C has standardized type aliases in <stdint.h>, but these follow a systematic [u]intXX_t naming scheme.
  3. If a programmer wants to define type aliases that match the C FFI names, these names are already taken.
  4. The name char could also be confusing.

For this reason, one should think about whether it wouldn't be better to rename basic integer types to align more closely with their suffix names.
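Point 2 can be made concrete with a minimal C sketch (C, not C3). It contrasts `long`, whose width depends on the platform ABI, with a fixed-width alias from `<stdint.h>`. The helper names `long_bits` and `int64_bits` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* int64_t, where it exists, is exactly 64 bits wide with no padding. */
static_assert(sizeof(int64_t) * CHAR_BIT == 64, "int64_t is 64 bits");

/* Storage size of C's `long` in bits. This is typically 32 on 64-bit
   Windows and 64 on 64-bit Linux or macOS. */
int long_bits(void)
{
    return (int)(sizeof(long) * CHAR_BIT);
}

/* Storage size of int64_t in bits: the same on every platform. */
int int64_bits(void)
{
    return (int)(sizeof(int64_t) * CHAR_BIT);
}
```

A binding written against a C API that uses `long` therefore cannot assume that a 64-bit type of the same name matches on every target.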

lerno commented 1 month ago

Please have a look at #1178

And please note that the exact (OS and Arch dependent) C types are available as CInt, CShort, CChar and so on.

In regards to the suffixes: it's possible that the bit sized suffixes disappear.

lerno commented 1 month ago

Can I close this?

HMart81 commented 1 month ago

IMO this should still be considered. Being able to alias is fine, but with this language's particular naming rules the result is, sorry to say, not good IMO.

What I wanted the aliases to be:

def byte = char;
def s8 = ichar;
def u8 = byte;
def s32 = int;
def u32 = uint;
def u64 = ulong;
def f32 = float;
def f64 = double;

What I needed to do, for example, for the aliases to compile... :(

def Uint8 = char;
def Uint16 = ushort;
def Sint32 = int;
def Uint32 = uint;
def Sint64 = long;
def Uint64 = ulong;
def Float32 = float;
def Float64 = double;

I'm not only forced to start the alias with an upper-case letter (an extra key press), I'm also forced to have at least one lower-case letter in there.

So s32 doesn't work and neither does S32, only something like Si32 or Ui32, but that looks extremely ugly. So you are forced to do something like Sint32 just to make the alias work and look a tiny bit better, but that is so much more typing that I'm effectively forced to use int instead, and that is just not cool.
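The rule as described in this comment (first character upper case, at least one lower-case letter somewhere after it) can be sketched as a small C check. This is only an illustration of the rule as stated here, not the compiler's actual lexer, and the function name `looks_like_type_name` is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if `name` starts with an upper-case letter and contains
   at least one lower-case letter, which is the alias rule described in
   the comment above. */
bool looks_like_type_name(const char *name)
{
    assert(name != NULL);
    if (!isupper((unsigned char)name[0])) {
        return false;                 /* "s32": starts lower case */
    }
    for (const char *p = name + 1; *p != '\0'; p++) {
        if (islower((unsigned char)*p)) {
            return true;              /* "Si32", "Sint32", "Uint8" */
        }
    }
    return false;                     /* "S32": no lower-case letter */
}
```

Under this check `s32` and `S32` are rejected, while `Si32` and `Sint32` are accepted, matching the examples given above.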

lerno commented 1 month ago

There is no way to make this happen unfortunately:

  1. Grammar is made unambiguous by the type name rules. This is not because I prefer any particular type style (in fact my favorite is foo_t), but rather because it is one of the simplest alternatives. If PascalCase wasn't used for types, it would have needed to be used both for globals, functions and variables. This would have been unacceptable for most people I think (no single letter variable names as indices for example). Making the grammar unambiguous is important for tools and also simplifies any compilers written for the language, as well as improves error messages. Removing this is not an option.
  2. The only type names that can violate the above are those that are keywords. Obviously an alias cannot introduce keywords without extreme complexity. Keywords also have to be as few as possible.
  3. This leads to the need to make a choice. Either use int and family, or use iXX or intXX schemes. Here the choice has been to retain the C names, but fix the sizes, in the established style of Java, C# and other languages that aren't C or C++.
  4. There is no consensus: some people like intXX, some like iXX, some like sXX, some like int, etc.
  5. C names make the language retain the flavour of C when looking at it.

HMart81 commented 1 month ago

"There is no way to make this happen unfortunately:"

Not the answer I wanted to hear, but if you say there's no way, then there's no way. If you want to close this, personally, for what it's worth, I have no more objections.

lerno commented 1 month ago

@HMart81 If you have any solution to the points I list I'm always willing to consider them.

If you want aliases I suggest Int32 and UInt32; these have good readability and also match the aliases used by many sources prior to the int32_t types (where people would not really use the int32_t style, as the _t suffix is reserved).

I've actually not encountered many i32/u32 aliases in C code, but rather either the above Int32 and so on, or ULong and similar, where those types would have a fixed bit size. While s32 is okay to read, i32 has remarkably poor readability in C code:

for (i32 i = 32; i > 16; i--) { ... }

Compare s32 which is not at all as bad:

for (s32 i = 32; i > 16; i--) { ... }

Initially the int128 type was called i128, but it had low readability.

In general, the placement of the type declaration is the problem for the i32 style. Typically, languages with i32 do not even use C-style for loops, so they don't run into this visual ambiguity.
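For a side-by-side look, here is a minimal C sketch that declares the three alias styles mentioned above and uses each one in the same loop. The typedefs and function names are assumptions made for this comparison; none of them come from a real codebase.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t i32;    /* terse style, visually close to the loop variable i */
typedef int32_t s32;    /* terse style with an s prefix */
typedef int32_t Int32;  /* PascalCase style suggested above */

static_assert(sizeof(i32) == sizeof(Int32) && sizeof(s32) == sizeof(Int32),
              "all three aliases name the same 32-bit type");

/* Each function runs the same loop and returns the number of iterations. */
int steps_i32(void)
{
    int n = 0;
    for (i32 i = 32; i > 16; i--) { n++; }
    return n;
}

int steps_s32(void)
{
    int n = 0;
    for (s32 i = 32; i > 16; i--) { n++; }
    return n;
}

int steps_int32(void)
{
    int n = 0;
    for (Int32 i = 32; i > 16; i--) { n++; }
    return n;
}
```

The three functions behave identically; only the loop header differs, which is the readability point being argued.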

HMart81 commented 1 month ago

Unfortunately I don't have any solutions for those bullet points, only general comments again.

  1. Like you said, the language case rules are a hard requirement, so there's no more discussion about it, at least from my side.

  2. I also don't agree that the language should introduce a bunch of keywords just to fit one coding style or another, but I do think these compact basic types are an improvement over the old/normal C/C++ ones, even though, like you said, very few people seem to have used the particular ones I showed. I got used to them by following and learning from Casey Muratori, from Handmade Hero (I'm self-taught).

  3. Yeah, agreed, a final choice does need to be made. You are the one working on the language, so ultimately the last vote is yours, though personally I tend towards democracy, and IMO a poll could be held on a certain alternative style (my vote is obvious...) to see what the current community votes for. Any future user will have to accept the outcome, and it's best to do this as early as possible in the language's lifetime. I will also have to accept it if you decide not to change how this is now and close the issue.

  4. Agreed on this, and it is hard to give a "solution"; we all have different preferences, so again, do the poll or you decide. You are the "Linus Torvalds" of C3 after all... hope you don't mind the comparison... ;)

  5. Again, agreed: the C-like names are useful and should stay. They do make the language look more "C like", which is especially good for acclimating C programmers and helps convert C code to C3 faster. But a single alternative set could be introduced, and I don't think it would increase the keyword count by that much, though I'm not the one working on the language, so I can't know whether that is true.

Enough said on my part; others should also reply and give their input.

nacaclanga commented 1 month ago

I kind of agree with what has been said here, and I think you have put some thought into it. Name-only types definitely do have the benefit of being more distinct from numbers in code. I would just leave a few comments:

a) If you introduce type suffixes they should be somehow consistent with your types. If you stick to your type names I would use 0ic/0c, 0s/0us, 0i/0ui, 0l/0ul, ... . These suffixes definitely have the benefit of not using numbers again.

b) As HMart has said, introducing aliases could be an option, but I generally do not see a big benefit if a codebase introduces aliases that uniquely map onto a builtin type.

c) I still would think about the name of the 8 bit type. Calling it "char" is of course close to what C is doing, but "byte" would be another option here, which also has the benefit of being free of any signed/unsigned association.

d) C23 introduced bit sized integer types. Do you plan on supporting that somehow as well?

e) Concerning the 128 bit type, you could also discuss going the other way and also give it a non-number based name like quad or something like that.

lerno commented 1 month ago

@HMart81 another aspect that one has to consider for the aliases is that in the end, some companies will settle on one or the other. I don't see people mixing int and iXX style, so there will eventually be style guides advocating one variant or the other.

This will eventually lead to some set being used more than the other, like how people today adopt style guides of larger companies. In addition, it will cause friction between libraries using the iXX and int styles respectively.

So eventually the community would move towards a consensus. (Although whether this consensus is the best possible one is uncertain, given how disproportionate some influences will be).

So one idea would be to throw out both possibilities and then just see what happens. Unfortunately this would mean that the second set is then around for a very long time for "historical reasons", and I think that is something I really want to avoid.

In the end, a programming language is a compromise, but also a set of active choices by the designer(s). For example, one deliberate design in C3 is to nerf the macro and compile time in favour of IDE-friendliness and programmer readability.

So rather than having macros that can do anything, and then say "oh, don't use this part of the functionality if you want it to work well in an IDE and be readable", C3 just removes those.

And that's an active curation of features. To not curate features and "let everyone use what they want" is to actually make it less useful (in the macro system case, IDEs cannot assume the user will use the safe subset, and so must add much complexity to handle such possible uses, nor can the user make assumptions about what a macro does. Both of these are actual costs due to "allowing anything").

In this particular case, the easy thing would be to add the aliases. But then this would actually be a non-decision. And just like in the macro case there would be a cost. There would be a cost for users reading code from "the other basic type convention", there would be a cost in learning the language "what is the difference between i64 and long", "oh, there is a long so it must be like a C long or?", and there would be a cost in just having to decide what convention to follow.

When I set out to write C3, I wasn't really clear about the above. It's something I've learned over time, that it's really important to curate things. And sometimes it's not even important to make the "right" decision. The important thing is to make a decision. It's often very tempting to avoid the decisions.

lerno commented 1 month ago

@nacaclanga I did (re)introduce U/UL/L suffixes in order to possibly later remove the iXX/uXX suffixes. Interestingly, u/i8 and u/i16 suffixes are pretty much useless. They made more sense back when C3 (for a while) had untyped literals. This is because C3 allows implicit narrowing of constants anyway. So char x = 88 * 1; is completely fine even though the 88 * 1 expression is int. So the only case when you really want a suffix is when you want to define a constant of a size greater than an int, or when it should be unsigned. So U/UL/L cover those partly. Then there is the int128/uint128 one, which isn't covered. But I've been thinking about it. (And if one wants a char literal for passing into a macro to infer the right size or something, there is always (char)1 and such.)
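C has the same U/L/UL suffixes, so the idea can be sketched in C as a rough analogue. This is C, not C3: C performs the narrowing conversion for any value, whereas the paragraph above describes C3 as allowing it implicitly for constants. The macro `HAS_TYPE` and the function names are hypothetical helpers for this illustration.

```c
#include <assert.h>

/* Evaluates to 1 if `expr` has exactly type T, else 0 (C11 _Generic). */
#define HAS_TYPE(expr, T) _Generic((expr), T: 1, default: 0)

/* An unsuffixed constant expression such as 88 * 1 has type int. */
int plain_is_int(void)
{
    return HAS_TYPE(88 * 1, int);
}

/* The int-typed constant can still initialise a narrower variable;
   the value 88 fits, so nothing is lost in the conversion. */
unsigned char narrowed(void)
{
    unsigned char x = 88 * 1;
    assert(x == 88);
    return x;
}

/* Suffixes are what select an unsigned or a wider type. */
int u_is_unsigned(void)       { return HAS_TYPE(88U, unsigned int); }
int l_is_long(void)           { return HAS_TYPE(88L, long); }
int ul_is_unsigned_long(void) { return HAS_TYPE(88UL, unsigned long); }
```

This mirrors the argument: small-width suffixes add little when the constant narrows anyway, while the unsigned and wide suffixes still change the type of the expression.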

In regards to char vs byte. Java of course uses "byte" as do a few other languages, but I wanted to retain that flavour of C. Also "byte" isn't a great type name as it is often desired as a variable name. chars looks more distinct from char than bytes from byte in my tests.

Regarding bit sized types. The main point of those in C is to exclude them from normal promotion rules. I understand where this desire comes from, but they end up having a lot of odd behaviour due to this. If I included them in C3, people would assume they should be used, which they shouldn't be in the normal case due to their semantics.

So no, I don't see myself adding them.
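For context, the "normal promotion rules" referred to here can be sketched in plain C. The example deliberately uses `unsigned char` rather than C23's `_BitInt(N)`, so it compiles without a C23 compiler; it shows the promotion behaviour that the bit sized types are exempt from. The function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

static_assert(CHAR_BIT == 8, "the values below assume 8-bit bytes");

/* Both operands are promoted to int before the addition,
   so the sum is 300 rather than wrapping at 256. */
int promoted_sum(void)
{
    unsigned char a = 200, b = 100;
    return a + b;
}

/* Converting the result back to unsigned char wraps modulo 256:
   300 - 256 = 44. */
unsigned char narrowed_sum(void)
{
    unsigned char a = 200, b = 100;
    return (unsigned char)(a + b);
}

/* The expression a + b has type int, not unsigned char. */
int sum_has_int_size(void)
{
    unsigned char a = 1, b = 2;
    return sizeof(a + b) == sizeof(int);
}
```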

Finally, regarding int128, I have considered various names, but none ended up really good, so int128 has stayed as the (temporary) name. I looked at quad (also a possible float128 name), cent, huge, oct and others, but I had difficulties finding an established name.

lerno commented 2 weeks ago

Is there anything more to add to this discussion or can I close it?

lerno commented 3 days ago

Since nothing has been added for two weeks I'll close this.