Redundant data types - Githubissues

cbnolok commented 8 years ago

Is there a point to have, for example, LONG, INT32, DWORD to point a long? This is valid also for INT, INT16, WORD. It's redundant and may be a bit confusing to anyone first approaching the code; aside this, is there a reason to keep it this way nowadays?

coruja747 commented 8 years ago

the type of the variable define its maximum value and if its signed/unsigned. Low values are more performance-friendly, so an item color (example) which only have ~2000 colors in multi.mul doesn't need to be set as long long, it will be useless and will waste CPU performance with no reason

max values are -char (int8): 127 -short (int16): 32767 -int (int32): 2147483647 -long: 2147483647L -long long (int64): 9223372036854775807i64

also some types are just shortcut to point another type: -BYTE (unsigned char = 1 byte) -WORD (unsigned short = 2 bytes) -DWORD (unsigned long = 4 bytes)

usually UO packets uses a lot of these things because packets is a sequence of many bits-sized values (1, 2 and 4 bytes, which is BYTE, WORD, DWORD) so in this case we use directly these types to make it easy to understand. But when it's not related to packets it doesnt need to be used with these names

also its used on many optimization cases that setting the correct type will preventing call unnecessary casts like this:

short GetValue()
{
  return 123;
}

void FunctionA()
{
  INT64 a = static_cast<INT64>(GetValue());
}

in this case the GetValue() is 'short' and must be converted to INT64 later. But if set GetValue() to already return as INT64, it won't need to call static_cast later

INT64 GetValue()
{
  return 123;
}

void FunctionA()
{
  INT64 a = GetValue();
}

of course on small C# / C++ apps there's no need to worry with things like this. But sphere is a server that call many functions billions of times, can have millions stored values, millions chars/items, etc, so it's better always optimize every single line/value to make the .exe always lightweight

cbnolok commented 8 years ago

I forgot that word and dword were unsigned, but aside this my question was: why do we have LONG and INT32 which are basically the same thing of long? Why don't we use long and remove LONG and INT32? Same thing for INT and INT16

lintax commented 8 years ago

I think most of those were due to historical reasons. Imagine, sphere started at about 2000, without even any means of portability, and support for different architectures.. It evolved with time with support of linux (and different compilers!) or 64bit architecture support. Technically we would not use, for example, size_t, but to the hell of c++, it could be smaller then long, or can be larger than unsigned long, so we need to stick to those damn excess types.

Btw, afaik, static_cast does not generate any code, it is done in compile time thus does not slow down anything a bit, so writing the way you like (using int/long) in simple scenarios are acceptable. The only thing, as pointed by Coruja, dealing with packets using BYTE, WORD, DWORD is highly preferable.

denizsokmen commented 8 years ago

Good thing is new types in C++11 make this stuff pretty easy, uint32_t int64_t etc. without the worry of portability or defining custom types to wrap them.

cbnolok commented 8 years ago

Okay so one could start for example to replace the size_t with another appropriate data type. That said, i don't understand if it's now necessary to have the other custom definied data types, besides WORD and DWORD, are they necessary now for cross-platform compilation? Are LONG, long and INT32 really the same thing? Should i prefer a typedef instead of another or can we start at least to use a default type for each size and gradually change the others?

denizsokmen commented 8 years ago

Most probably they are still here for historical reasons. They can still be refactored by a kind soul :)

well basically they can be replaced as BYTE = uint8_t (or signed I don't remember) WORD = uint16_t DWORD = uint32_t

long and int represent same amount of bits (32) but they are different types syntactically. LONG probably wraps long, I didn't check.

With the new standard it would be wiser to use standard (u)intX_t types. size_t completely depends on the implementation, starting from 16 bit to the maximum addressable index.

cbnolok commented 8 years ago

I didn't know that int size was different from c to c++. So, perhaps we can start to use only INT32 and drop long and LONG? Things with int are trickier at this point, since when refactoring we have to evaluate if each case requires a INT16 or INT32. Since sphere's now open source, it would be cool to have a guideline for these things, and also to gradually change old compatibility code to newer one. And, can sphere be compiled to x86_64 arch right now? I remember it's still x86, what's the problem, if there is any, to compile it to the other arch?

lintax commented 8 years ago

INT32 is not a replacement for long, because long has dynamic length depending on platform. On 64-bit it is 64bit, while INT32 is always 32bit. Insisting 32bit operations for server app is a bad way - imagine what if you need a larger heap size for your app? As for now we do not compile it for 64bit, it might require several fixes to work, but at least such possibility exists.

But since most code you write deal with stack variables only, it has no matter if you use dynamic-sized types like short/int/long or fixed ones like those deniz suggested (if your value fits within minimal range).

cbnolok commented 8 years ago

I guess i should give a look at c++ data types size... Anyways, my point is to use INT32 (fixed length) when i'm sure that my value fits its size (example, an UID). If i use a long to store this value, when running on a 64 bits OS i'll have a INT64, which is a waste (even if i'm not sure about how much memory will really be wasted, but i think it's equally worth it to save memory). If i know that the value fits the minimal range of a fixed data type, why shouldn't i use it? Be patient if i'm missing something ^^"

Returning to the issue's title, at this point we have a reason to keep BYTE, WORD, DWORD, INT8, INT16, INT32, but not SHORT, INT and LONG, which wrap short, int and long. If ever we will write a guideline, would we write "ehy guys, don't use SHORT, INT and LONG because they're useless and we plan to remove it?"

denizsokmen commented 8 years ago

long is not 64bits, long long is. Defining UID as uint32_t or int32_t is actually necessary for not wasting space as you said because of the fact in UO packets that object serials are ALWAYS 4 bytes no matter what and you store UID by specifying it 32 bits strictly (well a byte may not always be 8 bits but since we won't be running sphereserver on a proprietary embedded system with completely different architecture, nevermind that :P). As I said I haven't checked SHORT, INT, LONG maybe they differ from other types with their signedness. But in the end I wouldn't use them while there are types defined in the standard for various number of bits in . Good thing is, those new types are NOT defined for the platforms where 1 byte isn't 8 bits and you will have compilation errors.

There are lots of things I hate about the codebase but the project is too old to be judged and it's developed by many developers in the past without any standards. First thing I wish to change would be the scripting language to LuaJIT first :P

lintax commented 8 years ago

long might be 32bit and might be 64bit.. anything specified that it is never smaller than int, you are probably compiling using msvc) https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_models

When you compile sphere to 32-bit, compiler might not do 64-bit alignment (however, it might), so your long would be 32bit regardless on the OS you are running.

In general, when memory matters, insisting lowest range might help, however it does not come for free, and addressing 32bit values on 64bit cpu works slower than addressing 64bit values (and all nowadays PCs are 64bit) regardless on the fact of running 32-bit or 64-bit os.

In brief, there is no a silver bullet to such a question, personally I prefer using standard types like int/long for most code unless you need to specify the exact size. And surely use the types that are already insisted by legacy sphere API) (and use size_t always when work with sizes)

cbnolok commented 8 years ago

Thank you, you were exhaustive. Then, the only thing i can suggest is to gradually replace SHORT, INT and LONG with short, int and long, since they're one the typedef of the other, what do you think? Also, last question: when you suggest to use long instead of int and vice-versa?

denizsokmen commented 8 years ago

I suggest you to use the exact space you need using with new types

http://www.cplusplus.com/reference/cstdint/

lintax commented 8 years ago

I would recommend for packets data leave BYTE/SHORT/WORD untouched since it is more obvious what is the size and this matters, and btw, writing BYTE is much shorter then writing unsigned char, why to replace (since char can be either signed and usigned by default depending on compiler flags)?

In generic code using simple types like int and double (which is de-facto bind to 4 and 8 bytes) is preferable unless you wish to specify exact size.

Regarding long - it should not be used at all (because it is only specified that it is not smaller than int), If you need something that will not fit in 32-bit int, then you will need to use either long long, or, preferable, any define, like those suggested by Deniz

cbnolok commented 8 years ago

Okay, this brings new questions :D 1) What about changing all long and LONG to int? This should need no more than a couple of minutes, i can do it. 2) Why do you want to keep SHORT? I agree with you for BYTE/WORD/DWORD but aren't WORD and SHORT/short both 2 bytes (with the latter being signed)? 3) Are operations with dynamic-sized variables faster than those on fixed-size vars? It may be a stupid question but i remember of having read something about that.

4) No one gave his opinion regarding dropping INT/LONG :) (i see there are also USHORT,UINT, ULONG, LONGLONG, ULONGLONG)

cbnolok commented 8 years ago

Adding a slightly off-topic question: since like 99% of enums contains values in the range of char, what do you think about strongly typing them to char, instead of leaving the default int type? Would this theoretically give a memory/performance improvement?

lintax commented 8 years ago

And after some time you will need to make posts like https://groups.google.com/forum/#!msg/alt.folklore.computers/mpjS-h4jpD8/9DW_VQVLzpkJ )

lintax commented 8 years ago

And once again, 2 facts

Decreasing variable size might save a memory if the data is kept for a long time and have tons of copies (like any object on server - this might save a bit - in case if structures that holds that data are configured for compressed storage - no alignment). This is the only case. Period.
Decreasing variable size either will not lead to anything at all (if compiled to use register instead of stack for storage, for example), in all other cases it would only decrease performance, since reading 2 bytes is a bit more costly then reading 8 on current computers. Period.

And if you have real concern of the memory usage, you can get much much much more profit if will serialize all offline characters and their items to some storage / json, not keeping them in memory unless they log in. This would, however, make more difficult to run migration scripts on end-user shard, but will greatly improve both the performance, and memory usage. Changing a couple of variables just makes a code more error-prone.

cbnolok commented 8 years ago

Got it, thanks!

cbnolok commented 8 years ago

Since we want to update old code, i suggest to use this issue to decide a standard for variable types. I propose to use the types below.

General

If such diversity is not needed, new variable types should be declared with typedef OR with define (maybe typedef is better).

Numbers

Everyone should know the max expected value he's filling a variable with, so imho it would be better to use fixed-sized variable types.
uint8_t, int8_t, 16, 32, 64, but encapsulated for simplicity into UCHAR, CHAR, USHORT, SHORT, UINT, INT, ULLONG, LLONG. This would be an easy transition since these variable types are already used somewhere. This means that everyone should use INT instead of int, CHAR instead or char and so on, in EVERY case (if someone doesn't bring a downside, and this should be decided and pointed out).
Drop the use of long, unsigned long, LONG and ULONG, since we already have ints for 32 bits numbers.
Drop INT16, INT32, INT64 and their unsigned counterparts. These are redundant.
Keep BYTE, WORD, DWORD but using them only when sending packets.
size_t: i have seen this type used inside send/receive.cpp to store values which the client accepted as WORD or DWORD, so in this case they can be changed to the more correct WORD and DWORD. Other suggestions about this data type? I'm not familiar with it.

Chars and strings

Drop the use of LPCSTR, LPSTR for char , const char \ or whatever, again i'm not familiar with these C++ types :D.

fjgo86 commented 8 years ago

And so, a new and unique file should be used to store them or a least a .h used on almost every situation ... since there's no point of including 300+ lines of code just to have access to this typedefs.

cbnolok commented 8 years ago

Hmm i'm still thinking about whats's the best naming for numeric variables. Maybe the best convention would be to name everything as u/int8,16,32,64, this has the advantage to use lowercase names and it's just clear to drop also BYTE, WORD, DWORD.

coruja747 commented 8 years ago

a important thing that we must care is about compiler and OS compatibility, because sometimes the variable type is not always the same every compiler / OS. A quick example is an commit that someone sent some days ago using UINT16_MAX where this def works fine on win (VS / MSbuild) but it doesn't exist on linux (GCC)

also 'int' is the same as 'long' but not always the same on 32bit / 64bit CPUs. I can't remember exactly, but 'int' is just a generic name for an generic number, which will be used as 'long' on 32bit CPUs and 'long long' on 64bit CPUs

cbnolok commented 8 years ago

That's why i suggest to use u/intX_t :) they are standard and definied in a c++ library i can't remember right now. You made me remember also that we need to leave only a _MAX define for each data type, it's confusing to have multiple of them, we can just use the one which is definied on every OS and compiler.

cbnolok commented 8 years ago

So, @lintax, what standard do you suggest to adopt for new code?

coruja747 commented 8 years ago

maybe its a good idea use the same RunUO / ServUO standard. Not because it's better/worse or fast/slow, but using the same code style will help many RunUO devs move to Sphere

cbnolok commented 8 years ago

What's their standard?

denizsokmen commented 8 years ago

std::numeric_limits::min or max() is what you need for those situations.

coruja747 commented 8 years ago

I can't remember exactly but I think it uses byte / char / short / int / long, which are already standard on every OS :P

on visual studio it's easy to get min/max values of all these types, just write an temporary INT_MAX / SHRT_MAX / etc at any random line, put your mouse over it and the mouse tooltip will display the value

lintax commented 8 years ago

I will not suggest any standard, using C standard types like int/short/char could be ok. But using uint32_t (being ugly itself, but shorter then unsigned int), is also a good replacement, so decide) ...unless you are dealing with packets.. With them I would stay with BYTE/WORD/DWORD since they are explicitly defined in therms of being unsigned and size.

cbnolok commented 8 years ago

What about using a uint typedef for unsigned int, ushort for unsigned short, uchar for unsigned char, llong for long long and ullong for unsigned long long? It's shorter and not so ugly imho. Also we should explicitly abolish the usage of long, because we already have int as 32 bits integer (and also because on c# long is 64 bits and this can be confusing for soneone?)

lintax commented 8 years ago

C++ also has 64-bit long, unless you are compiling for windows) but yes, proposal to use int and uint considering they are 32bit, char/uchar, short/ushort is pretty.. I doubt that you really need to specify length in most of your code.

cbnolok commented 8 years ago

So, can we state that the new standard (so just for new code, at the moment) is the following?

char/uchar, short/ushort, int/uint, llong/ullong;
byte, word, dword when dealing with packets (i can change them from uppercase to lowercase, just to have all the data types to the default lowercase, if everyone agrees).

Sphereserver / Source

Redundant data types #34