
Documentation should use more specific types #4525

Open matthijskooijman opened 8 years ago

matthijskooijman commented 8 years ago

Currently, the documentation uses types like "unsigned long" (for example here: https://www.arduino.cc/en/Reference/millis). However, this is only correct for the AVR architecture; on e.g. the Due a long is (AFAIK) 64 bits, but the code explicitly uses uint32_t. It would make sense to adopt the same explicit approach in the documentation (and, while we're at it, probably apply this to the code of the AVR port as well).

agdl commented 8 years ago

@matthijskooijman +1 I think that actually all the documentation and software should use standard data types like uint8_t, int32_t etc. In this way the size ("dimension") of a variable is immediately clear and code can be optimized and standardized.

agdl commented 8 years ago

BTW, once upon a time someone higher up told me that standard data types are not too friendly for beginners...

matthijskooijman commented 8 years ago

BTW, once upon a time someone higher up told me that standard data types are not too friendly for beginners...

They might be a bit scary-looking, that's true, but using platform-dependent types of unknown size only looks nicer; it doesn't work as well. We could consider introducing friendlier alternative names, but then Arduino code becomes even less portable, and the things people learn will also be less reusable.

Perhaps a decent alternative is to use the standard names in the docs, but make them a link to a page explaining what these types mean? Then anyone who is confused by a type name can click it and get an explanation.

agdl commented 8 years ago

I think we need @tigoe

tigoe commented 8 years ago

I would rather we adopt the Arduino types to new platforms as we go. As you said, Arturo, the point in making them in the first place was to make the language more readable for a non-technical audience. I'd like to maintain that approach as we move to new platforms.

matthijskooijman commented 8 years ago

@tigoe, what "Arduino types" do you mean exactly? There is the custom "byte" type, but AFAIK no other custom integer types are defined. The AVR core and documentation use int and long for the bigger integers, but those can't be portably used on other platforms, since their sizes will be different.

Or is your suggestion to introduce new types (e.g. word and dword in addition to byte or something)?

tigoe commented 8 years ago

I mean the ones listed here, as used in Arduino sketches over the last ten years: https://www.arduino.cc/en/Reference/HomePage

I'm suggesting that if they don't exist for other platforms, we introduce them, to make those platforms compatible with Arduino as it currently exists.

matthijskooijman commented 8 years ago

@tigoe, Oh, good point. Seems I totally missed the short and word types (though "word" is a bit of an ambiguous term in general - I think it originally referred to a machine's word size; on x86 it was used to refer to 16 bits, with double-word being 32 bits; Arduino seems to use it for 16 bits on AVR and 32 bits on the Due, which doesn't actually make it very useful here). One additional complication with these types is that currently short is signed and word is unsigned, and I don't think C supports just adding "signed" or "unsigned" to a custom type to change its signedness.
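To illustrate that last point, a minimal sketch (word16 is a made-up name here, not an existing Arduino type):

```cpp
#include <stdint.h>

typedef int16_t word16;       // hypothetical custom integer name

word16 a = -1;                // fine
// unsigned word16 b = 1;     // does not compile: 'unsigned' only combines with the
                              // built-in integer keywords, not with a typedef name
uint16_t b = 1;               // the unsigned counterpart has to be its own type

void setup() {}
void loop() {}
```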

So, regarding types, it seems we don't have any Arduino-specific / user-friendly set of types that is complete (e.g. has signed and unsigned integers of 8, 16, 32 and possibly 64 bits wide) and portable. So that leaves us with two options for making this consistent: either standardize on the stdint.h types everywhere, or define a new, complete set of user-friendly type names.

I'm not so sure what the best option is here. Using standard types makes code, as well as learned concepts, more portable. These types also clearly and unambiguously indicate the signedness and size of the type, at least once you've learned how they work. I'm inclined to stick with the standard types, also because it is probably hard to define a new set of typenames that are clear and unambiguous enough. "byte" is well-defined, "short" is less obvious and "word" is both unobvious as well as ambiguous.

For lack of any well-defined names for specific bit widths, using the number of bits in the type name (like the standard types do) might be useful. To make the names less cryptic, they could be more verbose and drop the not-so-useful _t suffix. E.g. types like signed_16bit_int or unsigned_8bit_int could work, though I'm afraid they're so long and verbose nobody will want to type them...
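As a rough illustration of what such aliases could look like (the names are purely illustrative, not an existing API; they just wrap the stdint types):

```cpp
#include <stdint.h>

// Hypothetical "friendly" aliases built on top of the standard fixed-width types.
typedef int8_t   signed_8bit_int;
typedef uint8_t  unsigned_8bit_int;
typedef int16_t  signed_16bit_int;
typedef uint16_t unsigned_16bit_int;
typedef int32_t  signed_32bit_int;
typedef uint32_t unsigned_32bit_int;

unsigned_32bit_int lastBlink = 0;   // same size on AVR, SAM, and any other core

void setup() {}
void loop() {}
```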

tigoe commented 8 years ago

I'd be more inclined to introduce new types to make the existing types complete. I have always found the _t extension irritating, and uint8, uint16, uint32, etc. too terse, and too easy to mistype as unit (not to mention how auto-correct does the same to them).

As for them being so long and verbose that nobody will want to type them: that hasn't been my experience, outside of people who've been classically trained in C programming. Terseness is a weakness, in my opinion. It's one of the things I'm trying to fight against, to get people to use natural language more in code, so it's readable to people from many backgrounds. If we really want programming to be a common literacy, then the grammar of it has to be articulate.

Breidenbach commented 8 years ago

Oooh - back to COBOL! :-)


cousteaulecommandant commented 8 years ago

I agree that the [u]intX_t types are a bit cryptic and unnecessarily scary. I have seen implementations that use custom types such as u8 or s32 for unsigned/signed types; maybe that's excessively short, though. As for verbosity, what about unsigned8 or signed_32b? (Not sure if it looks better with or without the underscore, and with or without the b; I think without either.)

As for the documentation, I'd just say "a 32-bit signed integer" when necessary.

Re: the original comment, I'm not sure a long is 64 bits on ARM; according to http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0472l/chr1359125009502.html it is 32 bits, just like int (the only platform I've heard of that uses 64-bit longs is x86_64/amd64).

tigoe commented 8 years ago

I think @damellis should weigh in on this, as he's more informed than I when it comes to the choices for data type names originally.

q2dg commented 8 years ago

Well...six months ago...https://github.com/arduino/Arduino/issues/3801

shiftleftplusone commented 8 years ago

about cousteaulecommandant's and tigoe's statements:

The C standard data types listed in <stdint.h> are int8_t, int16_t, int32_t, int64_t, uint8_t, uint16_t, uint32_t, uint64_t, (...), and Arduino should stick to the standards!! Why establish non-standard-C datatypes just for Arduino? C is C is C, and every C user who is learning C for Arduino should learn the correct rules of standard C from the start - or one day they will fail terribly in "real world" environments (e.g., g++ on a Raspberry Pi)!!

cousteaulecommandant commented 8 years ago

@shiftleftplusone Well, first of all Arduino is based on C++ and not C. AFAIK the [u]intN_t types exist in C++ too, but I wanted to clarify this. Second, defining new types is not "non-standard-C[++]-compliant"; nowhere in the standards does it say that you can't define your own types. It's just non-standard, not non-standard-compliant. Third, I think the reason standard types are not often used in Arduino is that the Arduino spirit is to be newbie-friendly: byte is used instead of unsigned char, because everybody understands what a byte is but the concept of a "character without a sign" is weird; word instead of unsigned int because it kinda looks prettier; and boolean instead of bool because, uh... well, honestly I don't get it either. What we were discussing was whether a type like uint32_t is consistent with the rest of the Arduino types and functions.

From what I have seen, Arduino is not intended as a tool to teach C or C++ and their standard libraries, but rather to teach programming in general and to be "easy to use". For example, it introduces setup() and loop() instead of just using the standard main(); it also uses a lot of macros, functions and names for simple things that would be a one-liner in C++ but would look a bit ugly and would require some "advanced" (not really) knowledge of "C/C++". In general, it tries to hide the "ugly details" from the user, such as how to handle the AVR's registers, how to use the timers, or the fact that signed overflow in actual C and C++ causes "undefined behavior" (which btw it also does in Arduino, but the docs say it doesn't)...
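For reference, the way the core hides main() is roughly this (a simplified sketch; the real core sources also handle serial events, USB, and board init details):

```cpp
// Simplified view of how the Arduino core wires setup()/loop() into main().
int main(void) {
  init();          // core-provided: configures timers, ADC, etc.
  setup();         // the sketch's setup(), run once
  for (;;) {
    loop();        // the sketch's loop(), run forever
  }
  return 0;
}
```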

For really learning C and how to use microcontrollers, the Arduino IDE is probably not the best choice.

PS: sorry for wall of text. I really need to learn to summarize.

shiftleftplusone commented 8 years ago

Objection! Arduino IS helpful for learning C programming, as it uses a g++ C(++) compiler and C/C++ syntax, and actually it's all the more helpful for learning the more it complies with standards! So it should comply with C standards - as much as possible - and not always cook its own soup, especially where there is absolutely no need for it, e.g. for standard C(++) data types, period.

tigoe commented 8 years ago

@cousteaulecommandant pretty much sums up our philosophy in a nutshell. I don't think ANSI C is such a perfect standard that it must be adhered to at the expense of legibility. People can learn the standards after they've learned the more basic concepts, and so far, Arduino's proven pretty good for teaching those basic concepts, in part because it doesn't adhere to the standards. If you want to use the standards in your own examples, go ahead. They'll compile just fine.

techpaul commented 8 years ago

Once you have trained someone with bad habits it takes longer to unlearn the bad habits, even with someone leaning over you watching every key press.

I am regularly involved with learners from age 14 to 30; bad habits and wrong ways stick until they waste a day tracking down an issue caused by their bad habit - no matter how many times you tell them, show them, or smack them around the head, they keep on.

Don't even get me on exam boards and bad habits.....

q2dg commented 8 years ago

I don't mind whether Arduino's type definitions follow a standard or not... but please: give them some consistency. As I said in #3801:

* The "word" type is a 2-byte data type only on AVR boards, but on the Due (and Zero??) it is a 4-byte data type. Not very logical.

* The "char" type is a signed type on AVR boards, but on the Due (and Zero??) it is unsigned. OMG!!

* I repeat here: maybe a table should be added somewhere on the web simply showing three columns for each data type (row): the bytes it occupies in both architectures (AVR & ARM), whether it's signed or not, and its respective numerical boundaries. This would help to see, for instance, that on the Due (and Zero??) the "word" type and the "unsigned long" type are the same, and also the "int" type and the "long" type, whereas on AVR the "short" type is equal to the "int" type. What a mess.

Thanks

shiftleftplusone commented 8 years ago

Logical or not, it is a matter of library design whether a lot of mess happens or not. It's quite common that int, word, and long are ambiguous and vary depending on the platform type and the processor's address bus width (sorry, is that the correct English idiom?).

But e.g., if an AVR lib is designed to handle char as signed, then bullshit can reliably be expected to happen if the same code is compiled with an unsigned char when targeting an ARM. Or if the size of a parameter is essential for functions working with sizeof(datatype). Or when dereferencing a pointer to an int value.

So for variable definitions (at the user interface level) working on either platform, unambiguous datatype sizes are essential, i.e. the C11 stdint datatypes. Simply redefine in all existing libs the former int as int16_t and the former long as int32_t, and everything will be fine.

Ok, perhaps not everything, but most of it. (char for the Serial class is another issue...)
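To make the platform differences visible, a minimal sketch (assuming a typical AVR board vs. a 32-bit ARM core) that prints the sizes the compiler actually uses; only the fixed-width types report the same size everywhere:

```cpp
#include <stdint.h>

void setup() {
  Serial.begin(9600);
  // Typical results -- AVR (Uno): int = 2, long = 4, char signed.
  //                    ARM (Due/Zero): int = 4, long = 4, char unsigned by default.
  Serial.print("sizeof(int):     "); Serial.println((unsigned long)sizeof(int));
  Serial.print("sizeof(long):    "); Serial.println((unsigned long)sizeof(long));
  Serial.print("sizeof(int16_t): "); Serial.println((unsigned long)sizeof(int16_t)); // always 2
  Serial.print("sizeof(int32_t): "); Serial.println((unsigned long)sizeof(int32_t)); // always 4
  Serial.print("char is signed:  "); Serial.println((char)-1 < 0 ? "yes" : "no");
}

void loop() {}
```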

techpaul commented 8 years ago

char/unsigned char for nearly any 8 bit interface, LCD, SPI, I2C/TWI.....

The main issue is that the Arduino documentation does not say, as C/C++ does, that these types may vary across platforms. FOR EXAMPLE, it states on https://www.arduino.cc/en/Reference/Char:

"The char datatype is a signed type, meaning that it encodes numbers from -128 to 127. For an unsigned, one-byte (8 bit) data type, use the byte data type. "

No platform dependency: it states it is ALWAYS this, when it is not.

shiftleftplusone commented 8 years ago

I don't see your point - who is missing something? (Or are you just confirming what I already wrote?) Actually char IS platform dependent, as already stated (char is unsigned for ARM CPUs like the DUE), and so it's ambiguous! So only int8_t and uint8_t are well-defined and unambiguous, and they should supersede the vintage data type names.

techpaul commented 8 years ago

You and I know that char is platform dependent; new users and many others do NOT, so:

either - Leave the current DOCUMENTATION mess

or - Change compiler settings to match documentation for ALL platforms

or - Change documentation to say exactly what each type is for EACH platform

or - Change types to a standard set that are EXPLAINED to new users and DOCUMENTED

The current documentation mess is STILL AVR specific, and most newer platforms are ARM, so there are many gotchas for new users and new hobbyists, let alone students.

shiftleftplusone commented 8 years ago

Ok, thank you for the clarification, your point was not clear to me. I fully agree. Nevertheless: compiler settings are not supposed to be changeable - the GNU compiler for AVR works differently from the ARM compiler, and ARM on Raspi boards behaves like ARM on Arduino. So it's a matter of documentation and standardization on the C11 stdint standards.

cousteaulecommandant commented 8 years ago

maybe should be added a table somewhere in web simply showing three columns for each data type

Yes please, let's do this.

it states on https://www.arduino.cc/en/Reference/Char: "The char datatype is a signed type..."

Bug. Either a "documentation bug" or an Arduino bug, you choose. Either fix the documentation or use -fsigned-char (or -funsigned-char and make char always unsigned). Although I prefer things to be homogeneous, I don't think overriding the ARM standard ABI specification is a good idea. (Also, if we did this we could go further and force int to be 16 bits on ARM, but I don't think this is possible in GCC.)
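For illustration, a minimal sketch of the kind of code that behaves differently depending on which of those options is in effect (char is signed by default on AVR, unsigned by default on ARM unless -fsigned-char is passed; the 0xFF-to-char conversion result is the usual GCC one):

```cpp
#include <stdint.h>

void setup() {
  Serial.begin(9600);
  char c = 0xFF;            // -1 where char is signed (AVR), 255 where it is unsigned (ARM default)
  if (c < 0) {
    Serial.println("char is signed on this core");
  } else {
    Serial.println("char is unsigned on this core");
  }
  int8_t  s = -1;           // always signed, always 8 bits
  uint8_t u = 0xFF;         // always unsigned, always 8 bits
  Serial.println(s);        // -1 on every core
  Serial.println(u);        // 255 on every core
}

void loop() {}
```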

(1) either - Leave the current DOCUMENTATION mess
(2) or - Change compiler settings to match documentation for ALL platforms
(3) or - Change documentation to say exactly what each type is for EACH platform
(4) or - Change types to a standard set that are EXPLAINED to new users and DOCUMENTED

(1) Definitely no, yuck.

(2) Doesn't seem possible. You can change the signedness of char (and violate the ABI standard), but it doesn't seem that you can change the size of int. It might be an interesting possibility though.

(3) Sounds boring and confusing, but it might be the best option.

(4) I don't think this is necessary if functions are consistent in their usage of types: byte for unsigned char, char when actual characters are involved, short and long when the length matters (with unsigned when needed), and int when it doesn't. I'd get rid of word since it doesn't provide anything but confusion, but I'd keep byte. I'd discourage using char for anything other than characters (there's byte for that, and if you NEED negatives you should use signed char anyway). As a last resort, I guess [u]intX_t could be used, but only if REALLY necessary. I don't think this will ever be the case since byte / short / long / long long already have the required lengths (afaik long is 32 bits on ARM as well).

techpaul commented 8 years ago

cousteau wrote:

maybe should be added a table somewhere in web simply showing three columns for each data type

Yes please, let's do this.

Agreed ...

(1) either - Leave the current DOCUMENTATION mess
(2) or - Change compiler settings to match documentation for ALL platforms
(3) or - Change documentation to say exactly what each type is for EACH platform
(4) or - Change types to a standard set that are EXPLAINED to new users and DOCUMENTED

(1) definitely no, yuck.

Hence various people's comments

(2) doesn't seem possible. You can change the signedness of char (and violate the ABI standard), but it doesn't seem that you can change the size of int. It might be an interesting possibility though.

Might do, but the compiler for each platform has been set up to a standard for norms and limits, to get where we are, for various reasons. In some cases, like char, they may well have chosen unsigned as the default for a very simple reason: the majority of char uses on ARM are as RAW data bytes. When transferring bytes via Serial, SPI, I2C/TWI, SD card, USB, even when we talk to an LCD, it is RAW data; any meaning depends on translation at both ends, and in quite a few cases the individual bytes are codes, not numbers, so they need to be unsigned.

(3) Sounds boring and confusing but it might be the best option.

But that is the table method you agreed with up top; if you make a table, you must refer to it from all the other places. It is what each ABI and compiler setup has to do for each platform.

(4) I don't think this is necessary if functions are consistent in the usage of types. byte for unsigned char, char when actual characters are involved, short and long when the length matters (with ...

Remembering that char is SIGNED on AVR... Strings should never really be considered signed, especially if you have to support UTF-8; strings are codes, not numbers.

... unsigned when needed), and int when it doesn't. I'd get rid of word since it doesn't provide anything but confusion, but I'd keep byte. I'd discourage using char for anything other than characters (there's byte for that, and if you NEED negatives you should use signed char anyway).

char is signed on AVR, not on ARM; there is the nub of the problem.

As a last resort, I guess [u]intX_t could be used, but only if REALLY necessary. I don't think this will ever be the case since byte / short / long / long long already have the required lengths (afaik long is 32 bits on ARM as well).

If you create a cross-platform library you need to be sure that what you are doing is defined and will work on all platforms; LiquidCrystal, Servo and others are like that. Basically, in those cases you need to use [u]intX_t to be sure.

cousteaulecommandant commented 8 years ago

If you create a cross-platform library you need to be sure that what you are doing is defined and will work on all platforms; LiquidCrystal, Servo and others are like that. Basically, in those cases you need to use [u]intX_t to be sure.

Sure, internally I'd use those types, but externally (function parameter and return types) I'd prefer to go with "normal" types as the thing the end user will have to deal with.

BTW, I wasn't thinking of Unicode nor UTF-8 for characters, just of "smallest unit of 'text' in a string" (you can't do UTF-8 based indexing or accessing on strings anyway). Maybe the right words would have been "piece of a string", to emphasize that I don't refer to raw data in general but to a specific usage (elements of text strings). (Also UTF-8 strings could be considered signed, and you could say that "positive chars are ASCII and negative chars are part of UTF-8 multibyte sequences"; in any case UTF-8 chars don't directly represent Unicode code points numerically.)

matthijskooijman commented 8 years ago

I don't think this will ever be the case since byte / short / long / long long already have the required lengths (afaik long is 32 bits in ARM as well).

The sizes of short, long and long long are implementation-defined, though I believe that on our current two architectures, AVR and ARM, they are the same. I'm not sure if that's good enough to rely on though, since there might be other architectures in the future, and there are already third-party architectures right now.

cousteaulecommandant commented 8 years ago

Indeed, for example long is 64 bits on amd64. However, on ARM it's 32 bits. While I don't think that Arduino will ever use the amd64 architecture, maybe AArch64 (64-bit ARM) is a possibility. I wasn't able to figure out how long a long is there; apparently there are multiple possible ABIs, so long could be either 32 or 64 bits. If Arduino ever moves to that architecture or another one where long is 64 bits, then what I said in my earlier comment loses its validity.

(And it will be even funnier if it moves to weirder architectures that don't have 8-bit bytes, such as some TI microcontrollers, and therefore no uint8_t type. I imagine these microcontrollers are out of the scope of Arduino though.)

techpaul commented 8 years ago

cousteau wrote:

If you create a cross-platform library you need to be sure that what you are doing is defined and will work on all platforms; LiquidCrystal, Servo and others are like that. Basically, in those cases you need to use [u]intX_t to be sure.

Sure, internally I'd use those types, but externally (function parameter and return types) I'd prefer to go with "normal" types as the thing the end user will have to deal with.

Well, there you have not thought it through: if code calling a library expects an int to be returned as 16 bits, or a signed char, or some other variation, you have the SAME platform variation issues, so the CALLING and RETURNING types have to be either a SMALL subset of types or specific types that will work across ALL platforms. Simpler to use standard types.

Due to platform differences there is very little you can do with NORMAL types, as anyone involved in transferring data between systems will tell you: size, type, byte order and many other issues.

BTW, I wasn't thinking of Unicode nor UTF-8 for characters, just of "smallest unit of 'text' in a string" (you can't do UTF-8 based indexing or accessing on strings anyway). Maybe the right words would have been "piece of a string", to emphasize that I don't refer to raw data in

Yes you can: UTF-8 is 8 bits long so takes up ONE byte, so indexing on a string containing UTF-8 characters (not double width, 16 bits) is just the same as with unsigned char. Strings are RAW data; even the characters are RAW data, as CODES not numbers, whatever encoding is used (ASCII/ANSI/ISO/UTF).

Anyone using the actual ASCII/ANSI/UTF-8 value as an index should not have a problem as arrays (strings are arrays) use signed integers as indexes.

general but to a specific usage (elements of text strings). (Also UTF-8 strings could be considered signed, and you could say that "positive chars are ASCII and negative chars are part of UTF-8 multibyte sequences"; in any case UTF-8 chars don't directly represent Unicode code points numerically.)

Only fools take raw data and treat it as signed numbers; a bit like the number of silly people who set up databases for phone numbers and use a number instead of a string for storing them. Telephone numbers are a CODE, not a number you do maths on.

A number is something you do maths on, like adding VAT; anything else is just raw data. What is the square root of the ASCII code for 'A'?

cousteaulecommandant commented 8 years ago

No, UTF-8 is not 8 bits long, it's variable width; it maps a single Unicode code point to a sequence of one or more 8-bit bytes, hence the difficulty of indexing a UTF-8 string character-wise. (Compare to other encodings such as iso-8859-1 or latin1 where each character is exactly 1 byte long; in those cases a character would indeed be semantically close to a uint8_t.)

Also, C strings aren't really raw data since the \0 is special, so they cannot store arbitrary raw data and usually store text (characters with some encoding), although this is technically not mandatory. This isn't the case for C++ std::string I think, which is just a vector of chars which can include a sequence of any length of ANY byte value. And I have no idea what Arduino Strings are. So it depends who you ask about "strings". I didn't say strings are "numbers" but sequences of numbers (or symbols with numeric codes), just like phone numbers (silly name) are sequences of numbers from 0 to 9.

If we get technical, raw data isn't signed or unsigned, just a stream of bits with no other meaning than what we give to it. However, C and C++ treat "chars" as numbers (signed or unsigned depending on the platform), but my point (and I think yours too) was that this is just a coincidence and they shouldn't be considered as numbers, just as "chars".

A byte could also be seen as the numeric code (from 0 to 255) representing its bit value, in which case I think the word "byte" is better suited than "unsigned char". Consider it short for "unsigned integer the size of a byte" or "uint_byte_t" (i.e. uint8_t when 1 byte is 8 bits). Maybe seeing a byte (8 bits) as an unsigned number doesn't make a lot of sense, but I think it's the type that makes the most sense, which is why I think the byte type is fine. "signed char" definitely makes no sense semantically, just like "unsigned char", but there's no special name for it in Arduino (well, aside from int8_t) and I don't think there needs to be.
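To make the variable-width point concrete, a small illustrative sketch (utf8CodePoints is a made-up helper, not an Arduino API) that counts bytes versus code points by skipping UTF-8 continuation bytes (those of the form 10xxxxxx):

```cpp
#include <string.h>

// Counts Unicode code points in a UTF-8 encoded, NUL-terminated string by
// skipping continuation bytes (those matching 10xxxxxx).
unsigned int utf8CodePoints(const char *s) {
  unsigned int count = 0;
  for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; ++p) {
    if ((*p & 0xC0) != 0x80) {   // not a continuation byte => start of a code point
      ++count;
    }
  }
  return count;
}

void setup() {
  Serial.begin(9600);
  const char *text = "ma\xC3\xB1ana";            // "mañana": 7 bytes, 6 code points
  Serial.println((unsigned long)strlen(text));   // byte length: 7
  Serial.println(utf8CodePoints(text));          // code point count: 6
}

void loop() {}
```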

Re: functions, when I said internally I meant for local variables, not declaring the function as taking an int16_t but calling it with a short int; that'd be nonsense. Just like putchar() takes an int argument but pushes a byte (so it probably uses unsigned char internally). And if [u]intX_t REALLY needs to be the parameter/return type, then use that; but I think it usually doesn't have to be.

q2dg commented 8 years ago

Is it so difficult to put on the web a table showing the actual size and sign of all of Arduino's defined data types on the three present architectures (AVR, ARM, x86)?


shiftleftplusone commented 8 years ago

A table would not resolve the problem, because one would always be forced to look up what is which - annoying and confusing, having to handle code which one has to write and compile once for AVR and then, after rewriting it, anew for ARM!

Simply use the unambiguous data types already defined world-wide in <stdint.h> throughout every lib and all documentation; then every reader will see at first sight what it's all about. So what the heck is the problem?

A cross-reference table would not be a bad addition, of course, for the sake of the vintage definitions.

q2dg commented 8 years ago

Yes, it's annoying but, realistically... I think it's the most pragmatic and fastest solution. With more than 600 issues and 100 PRs pending, I don't have much hope that such a big change as you propose could be done any time soon (if ever).


shiftleftplusone commented 8 years ago

The issue will not be discussed away by having a look-up table, because then you will still have to rewrite all code anew for each ARM platform! Only redefining the interfacing variables in the libs (and updating the docs) will result in cross-platform compatibility!

Adhere to international standards and you'll be fine; leave this solid ground and you're lost in space!

bperrybap commented 8 years ago

For anyone suggesting always using "normal" or native types instead of types that are deterministic across platforms: as @techpaul stated, this is simply not workable. There are times when you really do need variables to be of an exact fixed size, and in those cases you also need that to be consistent across all processor/architecture platforms. Using "normal" / native types would require the sketch or library code to know what platform it is running on so that it could determine the appropriate "normal" type, i.e. it might use int on AVR but then have to change that to unsigned short on ARM or pic32. This is a total mess and for that reason should not even be considered. If size matters, then a portable, size-specific, non-native type should be used.

For those people considering defining new Arduino proprietary types instead of using the types in <stdint.h>, I would encourage you to make sure to consider a few things.

1) Is it truly better, or is it just different?

2) What is the impact of this change in terms of support and breaking any existing code or future backward compatibility?

3) What is the effort involved for each path?

My opinions on these high level questions follow:

For number 1, I would argue that creating new "Arduino" type names is just being different vs actually being better. In the case of a non-technical person, they have to learn something new either way. Does it really make any difference if that means learning and using uint32_t or some new "Arduino friendly" name like "unsigned_32b" or "unsigned32"? In reality, all 3 are just names. However, the <stdint.h> types are a true standard, already exist, and have existed for many years. Not only that, but you have MANY existing sketches and libraries (including IDE core libraries) already using the types in <stdint.h>. So even if the effort was put into creating new Arduino types, users would still continually run into the types from <stdint.h>. To push for creating new types instead of simply promoting the use of <stdint.h> means pushing for a situation where you will start to have multiple types in use that mean the same thing. So if new types are defined, users will have to learn both vs just the types in <stdint.h>, since they will be encountering both.

For number 2, I would argue that creating new types creates compatibility issues. It will mean that any sketches or libraries written using these new types will only work on the first IDE that defines them and any IDE moving forward. This definitely draws a line in the sand and will create additional support issues and headaches. Plus, do we really think that many, if any, existing libraries or sketches that were using <stdint.h> types will go back and modify their code to use these new types? In other words, if a sketch or library uses the types in <stdint.h> it works on any IDE version, whereas if the new types were used, the code will only work on newer IDEs that define these new types. What is the true incentive for ever using the new types? Especially given that using them comes with a backward compatibility downside.

For number 3, I would argue that it is much simpler to go down the path of simply promoting/advertising/documenting the <stdint.h> types. The main reason is that these types have always existed in the Arduino environment, so it isn't really having to create anything new. It is simply documenting what is already there. It could be as simple as updating some existing documentation and creating new documentation for 8 data types (uint8_t, int8_t, uint16_t, int16_t, uint32_t, int32_t, uint64_t, int64_t). The biggest advantage is that by moving toward the types in <stdint.h> you avoid any compatibility and support issues, since these types already exist and are already in use. The issue reduces to a documentation issue when using <stdint.h> types.

Because of these 3 items, I would really push back extremely hard against defining new types. To me, creating new types is for the most part all downside.

Where is the real upside for creating new types?

shiftleftplusone commented 8 years ago

I fully agree.

matthijskooijman commented 8 years ago

@bperrybap, thanks for your detailed writeup, seems I can't find any fault in your reasoning. Especially the compatibility argument is compelling to me, so I agree that using the stdint types is the way forward. I'm curious to hear what @tigoe and @cmaglie think, though.

tigoe commented 8 years ago

@matthijskooijman You've heard my opinion on this plenty of times, I think. It's unchanged.

q2dg commented 8 years ago

I don't mind if the types must be Arduino-like or stdint-like but, please, make them consistent (in bit length and in sign): I would like that the meaning of "byte", "char", "int", "long", etc. is the same whichever board I'm using.

cousteaulecommandant commented 8 years ago

I would like that the meaning of "byte", "char", "int", "long", etc is the same whichever board I'm using.

Well, if you meant that literally you're out of luck, since apparently each microcontroller has its own ABI defining the lengths, and the compiler Arduino uses respects that. An int must be 16 bits in AVR and it must be 32 bits in ARM, and as far as I know this cannot be changed. Also a char is signed in AVR's ABI and unsigned in ARM's, although in this case GCC allows changing that with -f[un]signed-char. The actual meaning of int is "an integer type that will be fast to operate and at least 16 bits large", and that's how one is supposed to use it.

So the options are: either use the standard types and live with the fact that their size will change (you can always use [u]int#_t if that's a problem), or create new types like it's already done with byte and word (we could call them Short, Int, Long, etc). Modifying the length of int doesn't seem to be an option.

As a side note, this issue only affects char and [unsigned] int. short and long are 16 and 32 bits respectively for both platforms, at least for now.
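For library code that wants to rely on such assumptions anyway, one hedged option (assuming a toolchain with C++11 support, as recent IDEs use) is to make them explicit with compile-time checks, so a future port where the sizes differ fails loudly at build time instead of misbehaving:

```cpp
#include <stdint.h>

// Fail the build if the platform breaks the size assumptions this code relies on.
static_assert(sizeof(short) == 2, "this code assumes a 16-bit short");
static_assert(sizeof(long)  == 4, "this code assumes a 32-bit long");

// Alternatively, avoid the assumption entirely with a fixed-width alias.
typedef uint32_t tick_t;   // always 32 bits, on every core

void setup() {}
void loop() {}
```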

bperrybap commented 8 years ago

@tigoe I really don't understand your push for this. Why would you push for a solution that seems to be more work, creates additional support & compatibility issues, and still has to live side by side with the existing standard types in <stdint.h>? (Use of the standard types will not be going away.)

One explanation I can envision is that by creating these new types, it starts to create vendor lock-in for Arduino; i.e. any new code that moves down the path of using these new type names becomes non-portable and harder to move to other, potentially competing, platforms. Are non-portability and Arduino lock-in a desired goal?

Another reason I could envision is that it creates an intentional incompatibility in an attempt to force users to upgrade to a newer version of the IDE since those who don't will start to encounter increasingly incompatible code that is using the new types. It ends up being a sneaky way to force users to track more lock step with the latest releases.

I view the creation of new Arduino proprietary data types as creating unnecessary fragmentation.

From my perspective, I'd rather see the limited resources work on other things.

Perhaps you could further elaborate on your views as to why creating new types is worth the additional effort and compatibility issue downsides?

@cousteaulecommandant Actually, char is a particularly tricky type. The real issue is when using char for something other than a character, like in math expressions. Using a char that way is actually misusing the type. char is not really an integer, and C actually has 3 distinct "char" types: char, signed char, and unsigned char.

All three are distinct types and are handled differently by the compiler. (There is a long, complex, and sordid history as to why this is.) So even though the compiler may treat a char as signed or unsigned by default, it is still not the same as the corresponding unsigned char or signed char type, and that is why you should never use char for anything other than characters.

In most cases the compiler will generate the same code for the char type as for the type with the corresponding sign. However, that is not always the case. In most cases, using char and depending on the automatic conversion to an integer type will work. It may not generate code as optimized as the true integer type. And in a few cases I've seen it break "working" code.

After lots of discussion (arguing) with the avr-gcc developers, I relinquished, since they could hide behind some corner cases that are "undefined" behavior in the C standard. I.e. there are some particular coding sequences, undefined in the standard, that "fail" on the AVR since technically the code was using the variable in a way that is undefined in the C standard. Most of these issues relate to expressions depending on 7/8-bit rollover (overflow). The compiler doesn't have to behave as would be expected in these cases, and it sometimes doesn't, since behavior after overflow is undefined. I.e. you can end up with it creating a 16-bit value for comparison instead of truncating it to the remaining lower 8 bits after an overflow. I.e. I've seen cases where you can get a value larger than 8 bits from an "8-bit" data type on the AVR (I have never seen this on other processors). This unintended/unexpected behavior can break expressions, calculations, loop variables, etc.

In my view it should always work as people would expect it to work, as the C standard allows that behavior, and more importantly it should be consistent regardless of which way the type was declared. As it is now, it works the way people expect "most of the time". In my view, the compiler was optimizing incorrectly and creating an unexpected behavior. (The C standard says to lean towards expected behaviors when possible in these situations.) They came back with: the behavior for that sequence is undefined, so the compiler is allowed to do whatever it wants, including being inconsistent across similar data types. I lost.....

Yet another reason to always use the <stdint.h> types uint8_t and int8_t for 8-bit variables that are not actual characters.

cousteaulecommandant commented 8 years ago

Yeah, signed overflow is one of the easiest ways to demonstrate undefined behavior, but it's usually justified in terms of optimization. See #4511 for an example where a function gets "optimized" to "always return 0" because the compiler assumes overflow won't happen (hey, at least it's faster; the compiler did what it was told to do).

The "larger than 8 bits" thing could be because, at least in C (not sure about C++), integer expressions smaller than int are promoted to int in some situations. (And by the way, don't think that intX_t protects against signed overflow UB; only unsigned types do.)

This reminds me, if someone could apply my pull request #4624, which would "fix" this undefined behavior, that would be great...

tigoe commented 8 years ago

@bperrybap My reasoning for why is explained earlier in this thread. I'm sorry if it's not clear enough, let me try to explain in more depth:

My experience in introducing people to programming for the first time has been that the simple arduino type names are easier to understand at first than the standard C types, because they are closer to everyday language.

I think it's well and good that students would later learn the standard types if they expect to be enterprise programmers. Each teacher should sequence concepts in her own class in a way that works best for her pedagogical style. For me, it makes more sense to start with simpler type names when introducing what a data type is. Another teacher might prefer to introduce the standard data types when first introducing data types. If that works for your classroom, that's great. It probably means we teach people with different learning styles.

Arduino is open so that people can adapt it to their own needs. I have no interest in lock-in, and if people want to fork it for their own needs, I am happy to see that. There are some exemplary forks and add-ons out there; Teensyduino springs to mind. I believe the decision to distribute the source libraries uncompiled was also made to encourage adaptation, though you'd have to ask @damellis, as that decision was made before I joined the project. For the large majority of teachers I've spoken to, this makes it possible for them to adapt the system to their needs.

Ultimately every professional programmer needs to learn the styles and expectations of the projects and organizations for which she works. Adaptability is a key skill for programmers, I think. I would no sooner shove Kernighan & Ritchie down a student's throat than I would Strunk & White if I were teaching writing. But I'll introduce both as valuable resources when I think the student's ready for it.

If things change in the future, and the majority of teachers using Arduino start preferring standard types, and @damellis, @mbanzi, @dcuartielles @cmaglie et. al. decide to change the system to accommodate that, that's fine, I'll adapt the system for my needs, because it's designed for that. I already do; there are many examples on my class site that are not part of Arduino's core, because they're a little different than the standard examples.

I hope that helps you to understand my position. I don't expect you'll agree, but that's okay, we can have differing opinions.

(I can make suggestions as to how you could fork and adapt the IDE to meet your needs if you need them, but this response is already too long.)

lmihalkovic commented 8 years ago

On the flip side, there are professional developers in the real world who at times wonder where some newcomers to the profession picked up some of the questionable habits they display when coming fresh out of learning. Over the years I have noticed two types of responses from educators faced with a knowledge gap: simplifying the setting until the gap becomes manageable, even if it means losing some of its ability to depict a certain reality, or dealing with the issues.

shiftleftplusone commented 8 years ago

Re: " q2dg commented 2 days ago : I don't mind if the types must be Arduino-like or stdio-like but, please, make them consistent (in bit lenght and in sign): I would like that the meaning of "byte", "char", "int", "long", etc is the same whichever board I'm using."

Correct, cross-platform compatibility is one of the most important points!

Nevertheless, some objections:

byte is not a C standard type. char is ambiguous: if you compile a char type for an AVR platform, it will automatically be handled as a signed 8-bit integer value (int8_t), but when you compile the same program to target an ARM platform, it will be compiled as an unsigned 8-bit integer (uint8_t), and so you will get runtime issues - if not already compilation errors. So only the explicit type definitions int8_t and uint8_t are specific and exact. Of course one may #define byte uint8_t.

About int and word there are also such issues: int is 16 bits on AVR but 32 bits on ARM, so only int16_t and int32_t are specific. With word it's about the same. So if an AVR program uses an array of ints, int numbers[10];, it will occupy 20 bytes on AVR but 40 bytes on ARM, and if you want to read the 2nd int value you will have to start reading at byte 2 on AVR but at byte 4 on ARM (see the sketch below). That is also true for the ARMs on Arduinos (ARM Cortex M0, M3 on Zero, Due) and for the ARMs used by Raspberry Pi and Beaglebone (ARM7, ARM9).

So only the explicit type definitions (u)int16_t and (u)int32_t (and perhaps (u)int64_t) are specific and exact.

This has to be considered by the data types used in Arduino libs.
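A minimal sketch of the effect described above; the same array declaration occupies a different amount of memory depending on the core, while the fixed-width version does not:

```cpp
#include <stdint.h>

int     numbers[10];      // 20 bytes on AVR (16-bit int), 40 bytes on ARM (32-bit int)
int16_t samples[10];      // 20 bytes on every supported core

void setup() {
  Serial.begin(9600);
  Serial.println((unsigned long)sizeof(numbers));   // 20 on an Uno, 40 on a Due/Zero
  Serial.println((unsigned long)sizeof(samples));   // 20 everywhere
}

void loop() {}
```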

bperrybap commented 8 years ago

It appears to me that in this discussion we have overlooked the issue being reported. It seems to have been lost in all the discussion over fixed-size types and whether to create new proprietary types for Arduino or to document the types from <stdint.h> that are currently in use. The issue brought up wasn't really about fixed-size types and what to call them. Have a read again of the initial topic. The issue seems to be reporting that the existing documentation with respect to type usage does not always match the existing code in the API function implementations. Depending on the core, the types in the documentation vs what is actually used in the code could be different enough that your code could fail to operate correctly if you used the type in the documentation.

To me this is a big deal.

If an API says a function uses arguments of a given type or returns a given type, then that type should work and always work when used as documented.

Consider the millis() case cited in the original post. The API says it returns an unsigned long, but this isn't the case for some of the 32-bit cores, as they are returning a uint32_t instead of an unsigned long. While returning a uint32_t ensures that the return value is a 32-bit value, just like it was on the AVR where unsigned long is used, and a uint32_t should always fit into an unsigned long, there can be calculation issues when the user's code treats the return value as an unsigned long and the size of unsigned long is not the same as what is actually being returned.

This is because certain calculations depend on an overflow rollover in the calculation to hide a potential rollover in the return value. When the user's variable is not the same size as the return value, a calculation that depends on overflow/rollover will not work correctly.
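For reference, the rollover-dependent pattern in question is the standard non-blocking delay idiom; it only stays correct across the millis() wrap-around if the user's variable has the same unsigned 32-bit width as the value the core actually returns:

```cpp
unsigned long previousMillis = 0;      // must have the same width as millis()' return value
const unsigned long interval = 1000;   // milliseconds

void setup() {
  Serial.begin(9600);
}

void loop() {
  // Unsigned subtraction wraps, so this comparison stays correct even when
  // millis() rolls over from 0xFFFFFFFF back to 0 -- but only while
  // previousMillis has the same unsigned 32-bit width as the returned value.
  if (millis() - previousMillis >= interval) {
    previousMillis = millis();
    Serial.println("tick");            // periodic work goes here
  }
}
```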

I did check the Teensy 3.x and chipKIT (pic32) cores, and in those an unsigned long is the same size as a uint32_t, so there is no issue there. (Not sure what the DUE does.) That said, my view is that even when uint32_t is the same size as unsigned long, the underlying code should still be using unsigned long to avoid any possible confusion and any potential porting issues in the future.

However, the original issue brings up an important point, and that is that the API code implementation should be using the same type that is in the documentation, to avoid confusion and to ensure that there is never a bit-size compatibility issue.

In many if not most cases, using a natural type, something like unsigned long, is perfectly acceptable and avoids having to introduce any new size specific types.

But sometimes there are needs to ensure that the size of a variable is always a specific size and I think the discussion we've been having over when to do that and what to call them is actually a separate discussion from the original issue raised. (unless we want to lump them together)

The issue raised was: for calls like millis() and micros(), is the return value really unsigned long, or is it always a 32-bit value? I.e. on implementations that have a 64-bit unsigned long, should they still return a 32-bit value? If the return value must always be 32 bits, then the documentation should not state that the return value is unsigned long but instead something like uint32_t.

And if the return value is an unsigned long, then the underlying code should be using unsigned long rather than uint32_t, even when they are the same size, to avoid any potential confusion.

oqibidipo commented 8 years ago

The issue raised was that for calls like millis() and micros() is the return value really unsigned long or is it always a 32 bit value?

On Arduino 101 millis() and micros() return uint64_t; long is 32 bits.

bperrybap commented 8 years ago

On Arduino 101 millis() and micros() return uint64_t; long is 32 bits.

If true, THAT is a disaster! You will get overflow warnings with existing code that uses unsigned long, and if you go and modify your code to use uint64_t, the code will no longer work correctly on other platforms where unsigned long is not 64 bits.

That means there is no way to write sketch s/w that doesn't know what architecture it is running on, which to me totally defeats the point of having an abstraction API which is what Arduino is all about.

The API documentation is pretty clear that millis() and micros() return unsigned long. While there are some cores that, technically incorrectly, used uint32_t, at least in those implementations the size was the same as unsigned long and for the most part hidden from the user, so it didn't matter to the sketch/user, since the API was consistent and worked correctly for all sketches that used unsigned long.

The Arduino API functions need to be consistent across all platforms.

IMO, this is a big issue for the 101 and needs to be fixed ASAP before it gets into too widespread use. And if uint64_t was used for these functions, there may be other APIs that are also broken.

cousteaulecommandant commented 8 years ago

...well, unsigned types do not overflow, they wrap, so I would expect no warnings (unless you add -Wconversion, which is unusual). Anyway, this is indeed strange behavior; where is millis() for the 101 declared? I only see avr/ and sam/, which declare it as unsigned long and uint32_t respectively; I didn't find any arc/ directory in this GitHub repo.