Closed a-v-o-dev closed 3 months ago
Assigning big values to small variables has a well-defined behavior and is not considered an error. The values are simply truncated to fit in the variable. It is programmer's responsibility to make sure numeric precision is not lost due to conversion.
a) This game is also thought for teaching programming ("It has been adapted for the specific purposes of COLOBOT and for an efficient pedagogical approach.", SatCom: Programming - the CBOT language), isn't it? Is "It is programmer's responsibility" an efficient pedagogical approach? One reason that (some) programming languages (e.g. free pascal) have introduced types is to give the compiler or the runtime engine the chance to do range checks and show approbiate error messages.
for (byte i = 0; i < 300; i++)
{
}
Do you see the error?
A student who doesn't see the error starts the program and waits and waits ... and wonders why the program doesn't end.
How can this student find out what's going wrong with his program? With the implemented debugging features he has to click over 1500 times (min. 6 times per loop) onto "1" ("One step") until he might realize that i
changes from 255 to 0. And if he doesn't realize it on the first run, he has to do again over 1500 clicks. And this is only byte
; how many clicks would be needed for short
?
It would need a debugging feature like a conditional breakpoint to make it the "programmer's responsibility".
b) Some more tests with the other integer types showed, that the problem seems to be how the message
instruction handles byte
:
long a = 64;
message (a); // displays 64
int b = 64;
message (b); // displays 64
short c = 64;
message (c); // displays 64
byte d = 64;
message (d); // displays @
d = 65;
message (d); // displays A
byte
seems to be treated as C++ char.
I tried the example loop from a) with a message
instruction:
for (byte i = 0; i < 300; i++)
{
message (i);
}
The game crashed with this error message:
Unhandled exception occurred!
==============================
Type: std::invalid_argument
Message: Unexpected UTF-8 continuation byte
==============================
BTW: Are all integer types in CBOT meant to be signed integers?
The scenario you describe in "a)" could be mitigated with in-game documentation for byte
type.
byte
seems to be treated as C++ char.
Yes, byte
is currently using signed char
as its underlying type.
BTW: Are all integer types in CBOT meant to be signed integers?
char
type in CBOT is using 32-bit unsigned integer.
Handling for that exception with try...catch should probably be implemented in Ui::CDisplayText
or something.
When you're using byte
you're making a decision to use a type that has only 8 bits of storage and is, therefore, limited to range of numbers between 0 and 255. If you don't know what byte
is or what valid usage for byte
is, you shouldn't use it. The game always had much safer int
and float
and that's what new CBOT programmers are introduced to. Other types were added quite recently and only for advanced programmers who know what they're doing. If you see byte
and use it without learning anything about it, it's your fault when your program fails.
Perhaps we can introduce compile-time warnings about narrowing conversions but we can't treat them as errors. And we can't catch integer overflow at runtime, it would be not only bad for performance but also bad for programmers who know what they're doing and want to use overflow as they'd normally do in other languages.
As for b), it's clearly a bug in message()
and it needs to be fixed.
Is the range of byte
in CBOT really meant to be 0
to 255
(I assumed that at first, too), or is it signed like the other integers (short
, int
and long
), so -128
to 127
? It seems to be implemented as signed char. I can't really check, because the game crashes even without the message
instruction when 128
is assigned to a byte
variable and I try to run it step by step. Probably the same method is used for message
and to display the contents of the variables while debugging ("One step").
Edit:
Well, now I've tested an empty loop:
for (byte i = 0; i < 150; i++) {}
Who would expect this to be an endless loop? It is.
So byte
is in CBOT implemented as a signed integer with range -128
to 127
. So the two of us, you and I, didn't know what byte
is, at least what it is in CBOT.
Using overflow with unsigned integers might often be used, but an internet search with "signed integer overflow" shows pages that say: Signed integer overflow is undefined behavior in C language. As it's undefined behavior, no programmer can know what he's doing, if he uses it.
You can use all 8 bits any way you like, even to store more than one value at the same time.
I should mention the undocumented >>> unsigned right shift operator was recently fixed.
if (your bool be[8];) { you byte; }
On the subject of undefined behavior, If byte
were unsigned the for
loops above would be considered as using undefined behavior with signed/unsigned comparison, but this isn't relevant for CBOT or how a byte is normally used.
I have no problem with an integer type with the range -128 to 127. This can be documented, so that everyone knows. Same with that there is no range check.
My proposal at this point of understanding:
Better not name this type byte
.
byte
is commonly known and in some other languages implemented as unsigned integer 0-255.
byte
is not yet official and therefore a change can easily be done at this point of time.
With a different name, CBOT is open to implement unsigned integers at some other point of time. This would be hard to do (many CBOT programs would have to be rewritten) after byte
is introduced as name for this signed integer type.
A student who just learned the difference between bit
, byte
, word
etc. wouldn't be confused about the different range. Even advanced users (who know what a byte
is, so that they think there's no need for them to read the documentation) wouldn't be surprised.
In SQL there is the integer type tinyint
for signed integers with the range -128 to 127.
Like short
as a short form of ShortInt
, this type could be named tiny
.
Someone who knows SQL would guess right, others who don't know SQL can't assume anything, but would be invited to read the documentation about this type.
Maybe there is a better name than tiny
.
C doesn't offer a name other than char
, which is already used in CBOT.
In Freepascal it is named Int8
or ShortInt
, but short
is already used in CBOT.
Java doesn't know unsigned integers, so there byte
is defined like in CBOT now.
In Visual Basic it is named SByte
.
Introducing byte
as name for this type might be the decision that there might never be unsigned integers in this language. The argument would always be "backward compatibility".
A programmer is provided with a group operators and containers for working with bytes and bits, if they're working at level of bytes and bits they don't have to care if it is signed or unsigned, the meaning each byte or bit is assigned by them. If they want to interpret a byte as signed or unsigned they can.
Experts can do with bits and bytes and unsigned and signed whatever they want, that's true, but that is not what I'm talking about. At first I wanted to understand if it was really meant to implement a signed 8-bit type and as far as I realized all CBOT integers are meant to be signed, so this implementation is consequent. So, for me this question is answered. At second I suggested to reduce confusion. Reminder: "and for an efficient pedagogical approach." (from my second post, respective from SatCom) Therefore my view is not what a programmer can do or not, but what introduces or minimizes confusion. At third I was thinking at the further development of the language and wanted to show that a future opportunity might get lost and that this could be avoided at this point of time.
I don't think anyone knows if byte
was meant to be signed or unsigned, and there can't be
much confusion after discovering that signed and unsigned bytes are exactly the same thing,
and that most single bytes of data don't represent any specific number or symbol. And it really
doesn't change how a byte is used, whether it's called signed or unsigned, or how one byte can
represent the values of many more bytes. If the data fits in a byte we can use any byte to store it.
A byte variable is almost always used as a container of bits, often just to store or move partial, arbitrary information as part of a group of bytes where one byte has no meaning by itself.
Any byte type can act as a little integer for doing math or iterating a range of values, but it usually doesn't matter what values or symbols a single byte represents because the value, symbol, or group of booleans can be chosen arbitrarily by the person using that single byte. A single byte or even a few bits can be used to help compress many more bytes of data, as done with LZSS and LEB128.
Maybe message() and variable list should show the binary representation, like 0b01001011
, if the argument given is byte
.
Just to understand:
int a = 65535; // bitmask: 0000 0000 0000 0000 1111 1111 1111 1111
message (a); // display: 65535
short b = a; // bitmask: 1111 1111 1111 1111 (bitmask preserved)
message (b); // display: -1 (sign and value changed)
int c = b; // bitmask: 1111 1111 1111 1111 1111 1111 1111 1111 (bitmask changed)
message (c); // display: -1 (sign and value preserved)
In one direction (large to small) the bitmask is preserved (only high bytes are omitted), sign and value might change.
In the opposite direction (small to large) the bitmask might change (expanding the sign bit), sign and value are preserved.
Is this the intended behavior?
(I used short
in this example, because of the problem with message
and byte
.)
If I understand right, then you see a byte
as a bit container, not as a number like the other integer types. The other integer types can be used as bit containers, too, and a byte
can also represent a number, so where's the difference?
I wouldn't handle byte
different from the other integers, so I'd show the numerical value (-128 .. 127). On the other hand a function to convert any of the integer types into a string that shows the bitmask could be useful, not only for byte
.
It should be pointed out that there's currently a lack of unit tests confirming correctness of various conversions and operations.
It looks like it's working correctly and using Two's Complement.
CBOT code for converting integers to string
:
extern void Test()
{
byte i = 0b100100; //= 36 = 0x24 = 0b00100100 = '$' = '\u0024';
message(toBits(i));
message(toHex(i));
}
string toBits(byte i)
{ // change 'byte' to any integer type
short nbits = sizeof(byte) * 8;
string str = "";
for (short s = 0; s < nbits; s++)
{
if (0 == s % 4 && s > 3) str = '_' + str;
str = (i & 0x01) + str; // get the low bit
i >>>= 1; // move all bits right by 1
}
return "b" + str;
}
string toHex(byte i)
{ // change 'byte' to any integer type
short bits = sizeof(byte) * 8;
string str = "";
for (short s = 0; s < bits; s += 4)
{
if (0 == s % 16 && s > 15) str = "_" + str;
char val = (i & 0x0F); // get the low 4 bits
val += (val < '\u000A') ? '0': 'A' - '\u000A';
str = val + str;
i >>>= 4; // move all bits right by 4
}
return "x" + str;
}
Displaying bytes (implicitly converted to chars?) still crashes colobot on dev
branch when loop reaches -128
, which is treated as character €
(Euro sign).
The test code:
for (byte i = 0; i < 300; i++)
{
message (i);
}
The call stack:
colobot.exe!_CxxThrowException(void * pExceptionObject, const _s__ThrowInfo * pThrowInfo) Line 75 C++
colobot.exe!StrUtils::Utf8CharSizeAt(const std::string & str, unsigned int pos) Line 195 C++
colobot.exe!Gfx::CText::StringToUTFCharList(const std::string & text, std::vector<Gfx::UTF8Char,std::allocator<Gfx::UTF8Char>> & chars) Line 1063 C++
colobot.exe!Gfx::CText::DrawString(const std::string & text, Gfx::FontType font, float size, Math::IntPoint pos, int width, int eol, Gfx::Color color) Line 1134 C++
colobot.exe!Gfx::CText::DrawText(const std::string & text, Gfx::FontType font, float size, Math::Point pos, float width, Gfx::TextAlign align, int eol, Gfx::Color color) Line 517 C++
colobot.exe!Ui::CLabel::Draw() Line 81 C++
colobot.exe!Ui::CWindow::Draw() Line 887 C++
...
The exception is thrown from StrUtils::Utf8CharSizeAt
:
...
// Invalid char - unexpected continuation byte
if (isUtf8ContinuationByte(c))
throw std::invalid_argument("Unexpected UTF-8 continuation byte");
...
A programmer is provided with a group operators and containers for working with bytes and bits, if they're working at level of bytes and bits they don't have to care if it is signed or unsigned, the meaning each byte or bit is assigned by them. If they want to interpret a byte as signed or unsigned they can.
Incorrect. CBot type byte
behaves as a signed integer (essentially just like int
but with a smaller range) when used in integer mathematical operations:
extern void New()
{
byte a = 255; // becomes -1
byte b = a / 2;
int i = b;
message(i); // prints 0
}
The crash was fixed by https://github.com/colobot/colobot/commit/9c53d4dc0a8f6287763976d7b93e179a5fd41909 (only available on the dev
branch so far)
...when used in integer mathematical operations...
It wasn't a comment about integer arithmetic using byte
.
It was a comment about the fact that you can interpret a byte as unsigned:
byte b = 0xFF; // -1
int i = b & 0b11111111;
message(""+i); // 255
Test case:
There is no compiler or runtime error.
message
just shows strange characters.This is probably also true for the other integer types, but it's not so obvious because numbers outside the range don't occur so often. Assigning a number outside the valid range should be detected already at compile time. Leaving the range in a calculation should produce a runtime error.