dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev
BSD 3-Clause "New" or "Revised" License

Better standard types (typealiases + number types) #52250

Closed CXCubeHD closed 2 months ago

CXCubeHD commented 2 years ago

In C# I like that I can write standard types in lowercase like object or string. I think that Dart should also consider adding this.

object o = ...;
string s = "string better than String";

This also makes it clear to the programmer which types are standard and which are not.

The second suggestion is that Dart should add more specific number types like short, long, byte, etc. These should also have more explicitly sized versions like Int8, Int32, UInt64, etc.

short s = 110;
long l = -38264738463;
unsigned long ul = 10000000;
ulong ul2 = 57383747;
sbyte sb = -25;

UInt16 u = 999;

float f = 1000f;
double d = 1000.0;

Float32 f2 = 193f32;
Float64 f3 = 1038f64;

This would allow for better interoperability and better readability. Also some developers may use this to write better performing code.

konsultaner commented 2 years ago

This would allow for better interoperability and better readability. Also some developers may use this to write better performing code.

I fully agree: https://github.com/dart-lang/sdk/issues/45153

If Dart had a UInt32, my code would run much faster. As @mraleph noted:

... Worth noting that Java does computations with 32-bit integers while Dart's int type is 64-bit (so it's more as if Java version was written with longs).
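For the web target, this kind of 32-bit arithmetic is exactly what JavaScript itself exposes. A hedged sketch (helper names are mine, not anything dart2js emits) of how uint32 semantics can be emulated on top of JS numbers:

```javascript
// Emulating uint32 arithmetic on JS doubles (illustration only).
const MASK32 = 0xFFFFFFFF;

function u32Add(a, b) {
  // & coerces to 32 bits via ToInt32; >>> 0 reinterprets as unsigned.
  return ((a + b) & MASK32) >>> 0;
}

function u32Mul(a, b) {
  // Math.imul performs C-style 32-bit multiplication without losing low
  // bits to double rounding; >>> 0 makes the result unsigned.
  return Math.imul(a, b) >>> 0;
}

console.log(u32Add(0xFFFFFFFF, 1)); // 0 (wraps around)
console.log(u32Mul(0x80000000, 2)); // 0 (high bits discarded)
```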

CXCubeHD commented 2 years ago

@konsultaner By the way, this also may be interesting for better performance: dart-lang/language#2101

munificent commented 2 years ago

In C# I like that I can write standard types in lowercase like object or string. I think that Dart should also consider adding this.

object o = ...;
string s = "string better than String";

This also makes it clear to the programmer which types are standard and which are not.

I think it's very unlikely that we would add aliases for the built-in types that differ only in capitalization. That wouldn't add any clarity since both names would still exist.

I agree completely that the capitalization of Dart's built-in types makes no sense. But at this point, it is simply too much effort for too little reward to change it.

Having specific sized signed and unsigned types can be useful for performance, but it adds quite a lot of complexity to the language. Given that Dart isn't intended to be a systems programming language, I suspect it's not a good fit for the language. For serialization between binary formats, I think ByteData works pretty well.

konsultaner commented 2 years ago

Given that Dart isn't intended to be a systems programming language, I suspect it's not a good fit for the language. For serialization between binary formats, I think ByteData works pretty well.

@munificent I see the point, but look at the effort that went into something like WASM just to get around JavaScript's performance limits. Dart's first target platforms were low-powered ARM CPUs on mobile phones. They need high-performing code! Making Dart code as fast as possible should be one of the most important goals. Binding C++ code for everything that needs to be fast is not a good solution. Dart and Flutter are game changers. It should not stop here.

konsultaner commented 2 years ago

getInt8(int byteOffset) → int Returns the (possibly negative) integer represented by the byte at the specified byteOffset in this object, in two's complement binary representation. [...]

@munificent ByteData doesn't help, since it returns a 64-bit int even when you read, for example, an int8.

If Uint32List (and all the other typed lists) had some additional operations, that would work for me too.

Coming back to my original problem:

void encipher(Uint32List lr, int off) {
  int i;
  int n; // Uint32List n = Uint32List(1)
  var l = lr[off]; // var l = lr.sublist(off)
  var r = lr[off + 1]; // var r = lr.sublist(off + 1)
  l ^= P[0]; // l = l.xorAll(P[0])
  for (i = 0; i <= BLOWFISH_NUM_ROUNDS - 2;) {
    n = S[(l >> 24) & 0xff]; // n = S.sublist((l >> 24) & 0xff)
    n += S[0x100 | ((l >> 16) & 0xff)]; // n.add(S.sublist(0x100 | ((l >> 16) & 0xff)))
    n ^= S[0x200 | ((l >> 8) & 0xff)]; // n.xor(S.sublist(0x200 | ((l >> 8) & 0xff)))
    n += S[0x300 | (l & 0xff)]; // n.add(S.sublist(0x300 | (l & 0xff)))
    r ^= n ^ P[++i]; // r = r.xor(n.xor(P.sublist(++i)))
    // ... and so on.
    n = S[(r >> 24) & 0xff];
    n += S[0x100 | ((r >> 16) & 0xff)];
    n ^= S[0x200 | ((r >> 8) & 0xff)];
    n += S[0x300 | (r & 0xff)];
    l ^= n ^ P[++i];
  }
  lr[off] = r ^ P[BLOWFISH_NUM_ROUNDS + 1];
  lr[off + 1] = l;
}
lrhn commented 2 years ago

ByteData doesn't help since it returns 64bit int if you take for example an int8.

That's also what the CPU does. On a modern CPU, all registers are 64 bits. If you read a byte, it is still stored into a 64-bit register (sign- or zero-extended). What the CPU does have is 32-bit multiplication, which doesn't overflow into the higher bits, so you don't need to mask afterwards to get rid of the high bits before doing a right-shift; in other words, implicit truncation of intermediate results. I'm not sure that's worth introducing new integer types for.
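The same full-width behaviour is easy to observe in JavaScript's typed arrays, which is roughly what ByteData is backed by on the web (illustration only; variable names are mine):

```javascript
// Reading a byte from a typed array yields an ordinary full-width number,
// sign- or zero-extended, just like a CPU load into a full register.
const bytes = new Int8Array([0x80]); // stores the bit pattern 0x80
console.log(bytes[0]);               // -128 (sign-extended)

const ubytes = new Uint8Array(bytes.buffer); // same bits, unsigned view
console.log(ubytes[0]);              // 128 (zero-extended)
```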

(However, consider if we could do it the C++ way and give integer variables a bit-size:

int x : 4 = ...;
int y : 32u = ...;

Any value stored into x is truncated to 4 bits, or zero/sign-extended if already less than 4 bits. Any value stored into y is truncated to 32 bits, or zero/sign-extended if already less than 32 bits. Whether to zero- or sign-extend depends on whether the variable is signed (like x) or unsigned (like y). It might even allow you to write typedef Uint32 = int:32u;. It would only work for integers, and the basic implementation would just do truncation to the bit width plus sign/zero extension to 64 bits on writing; reading gets you that value. Basically, it's implicit truncation and nothing more. Actually using fewer bits to store the value is an optimization.)
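As a sketch of this truncate-on-store idea, here is what Dart's int.toSigned/int.toUnsigned do, re-implemented in JavaScript (helper names are mine; valid for widths below 32 bits):

```javascript
// Keep only the low `bits` bits; result is non-negative.
function toUnsigned(v, bits) {
  return v & ((1 << bits) - 1);
}

// Truncate, then sign-extend from bit (bits - 1).
function toSigned(v, bits) {
  const u = toUnsigned(v, bits);
  const signBit = 1 << (bits - 1);
  return (u & signBit) ? u - (1 << bits) : u;
}

console.log(toSigned(9, 4));    // -7: 9 is 0b1001, so the sign bit is set
console.log(toUnsigned(-1, 4)); // 15
```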

Levi-Lesches commented 2 years ago

Any value stored into x is truncated to 4 bits, or zero/sign-extended if already less than 4 bits.

So basically, implicit calls to x.toSigned(4) (or x.toUnsigned(4))?

lrhn commented 2 years ago

@Levi-Lesches Indeed. Then it might be possible to do some optimizations on top of that, like storing multiple integers into the same 64-bit memory slot, but effectively, semantically, it's like an automatic toSigned/toUnsigned when storing into the variable.

CXCubeHD commented 2 years ago

@munificent I agree with the other things you said.

Having specific sized signed and unsigned types can be useful for performance, but it adds quite a lot of complexity to the language. Given that Dart isn't intended to be a systems programming language, I suspect it's not a good fit for the language.

With this one I disagree, and here is why: Dart is getting more and more popular and has a growing presence in the dev community. If you argue like that, then why not just use Scratch? And yes, I mean that stupid little program for kids. Why not use that?

No, but seriously: when the CPU offers this, why not use it? Nowadays performance waste can be seen in most apps and games. Why not make the world a bit better by just adding the option to write better code? Just because hardware gets better and better doesn't mean more performance should be wasted on small things like this. Adding this would make bit operations much more useful and more performant, and not every variable needs so much space. Also, Dart is statically typed, unlike those dynamic languages, so why not take advantage of small but very useful optimizations?

konsultaner commented 2 years ago

@mraleph wrote this in my original issue

I have done some prototyping to force smi operations into uint32 instead when they fit into the range, and it does show that using uint32 is (not surprisingly) better. The inner loop goes from 90 instructions to 63, and runtime for the tests goes from 21 seconds to 17 seconds (using an older version of the tests from yesterday).


That's also what the CPU does. On a modern CPU, all registers are 64 bits. If you read a byte, it still stores it into a 64-bit register (sign- or zero-extended).

@lrhn So the 64-bit register size does not seem to be the issue? Does this mean that just optimizing my code to fully use 64 bits should help? I'm not that experienced in this low-level business. Could you explain the gap?

konsultaner commented 2 years ago

Also from @mraleph

The prototype as it is is unlikely to get merged, but yes, I am planning to look (or to have somebody else on the team look) into cleaning up how int arithmetic is handled in the backend. We are talking months if not quarters though before this really gets fixed, because it's gonna be a huge refactoring and requires quite a bit of research to get right. Representation selection is not really a topic well covered in compiler research (because most compiler research centers around statically typed languages with a zoo of differently sized integer types, so the problem of selecting the most optimal integer representation does not exist there), so we are treading on interesting territory here. We are long overdue for an overhaul though; most of the code here is legacy from Dart 1 days, when the int type was arbitrary-width and not a 64-bit integer.

Seems like it is not necessary to have uint32; rather, the way the Dart compiler handles int arithmetic should be optimized?

lrhn commented 2 years ago

What @mraleph talks about is the VM internal tagging of intermediate results, and avoiding that. While introducing a type like uint32 might make it easier for the VM to choose how to optimally represent intermediate values as untagged, it's still something the compiler could also do today (possibly guided by extra .toUnsigned(32) calls). Until it does that optimization, introducing uint32 won't change anything.

The difference between 8-bit, 32-bit and 64-bit arithmetic at the machine-code level is mainly about the size of instructions. The 64-bit instructions typically (but not always) take one extra byte, the REX prefix byte. Other than that, there isn't much difference in speed between 8-bit, 32-bit and 64-bit operations. (16-bit operations have historically been slower on some processors, and doing a partial register load that isn't sign-extending or zero-extending is also slower.) See Agner Fog's instruction tables for far more information than you'll ever need. The one thing to say about machine code is that it's almost never trivial. For example, Intel's Coffee Lake CPUs have MUL/IMUL timings where 32-bit multiplication is slower than 64-bit multiplication (page 299).

konsultaner commented 2 years ago

@lrhn OK, I think I got it. So it was kind of pointless to benchmark my code on the VM; I should have benchmarked it on the actual targets, i.e. the browser, Android or Linux, because they use natively compiled code. It's just a matter of the VM not handling the intermediate values as well as it could?

Reading all that, it sounds like introducing a uint32 wouldn't make things better for the currently supported targets.

mraleph commented 2 years ago

@lrhn Note that not all of our targets are 64-bit, and a 64-bit integer actually requires two registers on ARM. Then you need optimisation passes to shift from the full 64-bit representation and operations (which require multiple instructions) to 32-bit when possible. We have passes like that, but they don't always work.

Having explicitly sized integers would make it simpler to generate good code on all targets without complicated optimization passes, assuming that users use these explicitly sized integers. Notice I use the word simpler rather than possible. It is possible to generate better code with the current int type; we just did not get to fix all the issues we know about.

Another consideration here is memory representation of these values in fields. There might be use-cases where smaller integer fields (e.g. having 4 16-bit integers rather than 4 64-bit integers) would lead to better performance through better memory locality and cache utilisation.

Overall, I think there are clear benefits from having such types, but at the same time these benefits might be relatively niche.

konsultaner commented 2 years ago

Overall, I think there are clear benefits from having such types, but at the same time these benefits might be relatively niche.

I don't think that performance optimization is a niche. There might not be a lot of developers writing such code, but there are a lot of developers using those niche packages: encryption, graphics libs, game engines, etc. I had that problem when implementing login encryption; it simply cost too much performance. A lot of developers will port code from other languages to Dart sooner or later and will request those compiler optimizations or new types (whichever is the better solution).

It is possible to generate better code with the current int type - we just did not get to fix all the issues we know about.

I bet fixing the compiler will affect more code. So if you ever find the time to fix these problems, I would be very grateful!

ChristianKleineidam commented 2 years ago

I don't think that performance optimization is a niche.

Everyone wants better performance. Performance improvement 101 is that you do performance optimization based on measuring performance and not based on "x could potentially improve performance".

I had that problem when implementing login encryption.

Implementing your own login encryption is something you should generally never do. Implementing crypto correctly is hard, and it's much better to use an existing, well-audited C or Rust library via FFI than to try to roll your own crypto.

konsultaner commented 2 years ago

I had that problem when implementing login encryption.

Implementing your own login encryption is something you should generally never do. Implementing crypto correctly is hard, and it's much better to use an existing, well-audited C or Rust library via FFI than to try to roll your own crypto.

This is generally not wrong, but it was basically just bcrypt hashing. I usually use Bouncy Castle for critical things. The whole thing was part of a login process.

Everyone wants better performance. Performance improvement 101 is that you do performance optimization based on measuring performance and not based on "x could potentially improve performance".

This is exactly what I did and it resulted in an issue with the language itself.

insinfo commented 1 year ago

I think that UInt8, Int8, Int16, Int32, fp16, fp32 types, among others, would bring more flexibility and better performance (depending on the use case and target hardware) to Dart. I understand this would bring more complexity and more work for the Dart team, who already don't have much bandwidth. But I think the feature should be there so that the programmer can use it according to the specific use case they are trying to solve; besides that, it makes it easier to translate code from other languages to Dart.

"If you save four 16 bit integers in one 64 bit integer, you will of course have a bit overhead separating them. But the code can become much more cache friendly, which can impact performance to a quite large degree in some cases.

I once had an array of one million 32 bit ints and I treated them as Boolean variables. But when I changed so that each individual bit was a Boolean, then the array could be just 1/32 of the original size, which made the code MUCH faster.

It is correct that using the native word size may require fewer CPU instructions to do what you want, but this is not certain. The CPU may contain special instructions to deal with word fractions. But even if it really yields fewer CPU instructions, it still doesn't automatically translate into better performance.

Division is the major exception. 64-bit integer division (128b/64b -> 64b) is slower even on current CPUs (esp. Intel's). Multiplication also differs with different operand sizes, esp. the one-operand N*N -> 2N-bit form, e.g. on Skylake.

It depends on the exact CPU and operation. On 64-bit Pentium IVs, for example, multiplication of 64-bit registers was quite a bit slower. Core 2 and later CPUs have been designed for 64-bit operation from the ground up.

Generally, even code written for a 64-bit platform uses 32-bit variables where values will fit in them. This isn't primarily because arithmetic is faster (on modern CPUs, it generally isn't) but because it uses less memory and memory bandwidth.

A structure containing a dozen integers will be half the size if those integers are 32-bit than if they are 64-bit. This means it will take half as many bytes to store, half as much space in the cache, and so on.

64-bit native registers and arithmetic are used where values may not fit into 32-bits. But the main performance benefits come from the extra general purpose registers available in the x86_64 instruction set. And of course, there are all the benefits that come from 64-bit pointers.

So the real answer is that it doesn't matter. Even if you use x86_64 mode, you can (and generally do) still use 32-bit arithmetic where it will do, and you get the benefits of larger pointers and more general purpose registers. When you use 64-bit native operations, it's because you need 64-bit operations, and you know they'll be faster than faking it with multiple 32-bit operations -- your only other choice. So the relative performance of 32-bit versus 64-bit registers should never be a deciding factor in any implementation decision."
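The boolean bit-packing trick described in the quote can be sketched in a few lines of JavaScript (helper names are mine):

```javascript
// One bit per boolean in a Uint32Array: 1/32 of the memory of whole-int flags.
function makeBitSet(n) {
  return new Uint32Array(Math.ceil(n / 32));
}
// i >>> 5 picks the 32-bit word; i & 31 picks the bit within it.
function setBit(bits, i) { bits[i >>> 5] |= (1 << (i & 31)); }
function getBit(bits, i) { return (bits[i >>> 5] >>> (i & 31)) & 1; }

const flags = makeBitSet(1000);
setBit(flags, 700);
console.log(getBit(flags, 700)); // 1
console.log(getBit(flags, 701)); // 0
console.log(flags.byteLength);   // 128 bytes, instead of 4000 for 1000 Uint32 flags
```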

lrhn commented 1 year ago

The arguments for smaller integer types here are mainly towards having many of them, and the typed-data lists like Uint8List already cover that.

For storing individual values, it could be possible for the VM to store multiple individual (unboxed) values in a single word. It takes a more complicated range analysis today to guarantee that a value won't ever take more than 16 bits.

I'm not sure it will help JavaScript compilation, since bit operations are limited to 32 bits anyway, but maybe they too can optimize some things if they actually know that values are only, say, 8 or 32 bits.
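For reference, the 32-bit limit on JS bit operations mentioned above is easy to demonstrate:

```javascript
// JS bitwise operators convert operands with ToInt32/ToUint32,
// so bit 32 and above are silently dropped.
console.log((2 ** 32 + 5) | 0); // 5 (high bits discarded)
console.log((2 ** 31) | 0);     // -2147483648 (wraps to signed 32-bit)
console.log(0xFFFFFFFF >>> 0);  // 4294967295 (>>> yields unsigned 32-bit)
```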

So, let's say we actually do add more integer types (to dart:typed_data, not dart:core). How would they behave?

Something like:

abstract final class FixedInteger {
  const FixedInteger();
  int get bitSize;
  bool get isSigned;

  Uint8 toUint8() => Uint8.fromInt(this.toInt());
  Int8 toInt8() => Int8.fromInt(this.toInt());
  Uint16 toUint16() => Uint16.fromInt(this.toInt());
  Int16 toInt16() => Int16.fromInt(this.toInt());
  Uint32 toUint32() => Uint32.fromInt(this.toInt());
  Int32 toInt32() => Int32.fromInt(this.toInt());

  int toInt(); 
}

abstract final class FixedSignedInteger extends FixedInteger {
  const FixedSignedInteger();
  bool get isSigned => true;
}  
abstract final class FixedUnsignedInteger extends FixedInteger {
  const FixedUnsignedInteger();
  bool get isSigned => false;
}  
final class Int8 extends FixedSignedInteger {
  external const Int8.fromInt(int value); // Truncates to 8 bits on construction.
  int get bitSize => 8;
  // All the operators:
  Int8 operator +(Int8 other) =>
      Int8.fromInt((this.toInt() + other.toInt()).toSigned(8)); // Wraps to 8 bits. TODO: Optimize.
  // etc., for unary -, +, -, *, ~/, %, div, mod, >>, >>>, <<, |, &, ^, ~, >, >=, <, <=.
  // May also:
  (Int8, Int8) divMod(Int8 other) => ...;
  (Int8, Int8) fracRem(Int8 other) => ...;

  int get sign => this.toInt().sign;

  String toString() => this.toInt().toString();

  static Int8 parse(String input) => ...
  static Int8? tryParse(String input) => ...
}
final class Uint8 extends FixedUnsignedInteger {
  int get bitSize => 8;
  /// Roughly same as Int8, except no `>>>` or `sign`.
  /// but `toHexString` and `parseHexString` which is only provided for unsigned types.
}
// etc for (Uint/Int)(16,32). No Int64, can't do that on the web anyway, and just use `int` on native.
//
// Or maybe do have Int64 and Uint64 instead of relying on FixNum for those, and have *some* implementation for web.

int _div(int p, int q) { // Division corresponding to `%`, so that `_div(p, q) * q + (p % q) == p`.
  var d = p ~/ q;
  // `~/` truncates toward zero; adjust so the remainder is non-negative.
  if (p < 0 && d * q != p) d -= q.sign;
  return d;
}

No operators on shared superclasses, no implicit coercions, and nothing that lets you do Int8 + Int16 and get something predictable from that.
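Not the proposed API, just a sketch of the wrap-on-construction semantics such an Int8 would need, written in JavaScript for brevity (names are mine):

```javascript
// An Int8 that truncates to 8 bits and sign-extends on every construction,
// so arithmetic results always wrap the way 8-bit hardware would.
class Int8 {
  constructor(v) {
    const u = v & 0xFF;                    // keep the low 8 bits
    this.value = (u & 0x80) ? u - 256 : u; // sign-extend from bit 7
  }
  add(other) { return new Int8(this.value + other.value); }
}

console.log(new Int8(127).add(new Int8(1)).value); // -128 (wraps)
console.log(new Int8(-1).value);                   // -1
```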

gintominto5329 commented 1 year ago

Hello,

I think the following strikes the correct balance between complexity and functionality:

  typedef u8 = int;
  typedef u16 = int;
  typedef u32 = int;
  typedef i64 = int;

See also: www.github/dart-code

read:

On a modern CPU, all registers are 64 bits. If you read a byte, it still stores it into a 64-bit register (sign- or zero-extended).

if(disAgree) goto read; // ; )

One thing I would prefer over all this would be increasing int's 64-bit limit to 128 bits, but keeping the name the same.

thanks

insinfo commented 1 year ago

@gintominto5329 What you put doesn't make sense, and it's not what the devs want here in this issue.

@lrhn The implementation you suggested looks good to me. Personally I don't care whether it goes in "dart:typed_data" rather than "dart:core", but I was curious why not put it in "dart:core". I also don't think it's a requirement to implement "num", but I'm curious why this is an issue on the web; it seems the Kotlin team was able to compile the various Int and Float types to JavaScript without any issues.

This compiles without any problems to JavaScript in Kotlin:
fun main() {
    var one: Byte = 127 // Byte
    var threeBillion: Int = 2147483647 // Int
    var oneLong: Long = 9223372036854775807 // Long
    var oneShort: Short = 32767 // Short
    var e: Double = 2.718281812456789 // Double
    var eFloat: Float = 2.1234567f // Float
    println("int types, Double: $e Float: $eFloat Byte: $one")
}
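One caveat worth hedging: as far as I know, Kotlin/JS cannot map Long onto a plain JS number either; it emulates it with a class holding two 32-bit halves. A minimal sketch of such an emulated 64-bit add (names are mine, not Kotlin's actual runtime):

```javascript
// 64-bit unsigned add from two unsigned 32-bit halves {hi, lo},
// with the carry out of the low word propagated into the high word.
function addU64(a, b) {
  const lo = (a.lo + b.lo) >>> 0;         // low word, wrapped to 32 bits
  const carry = lo < a.lo ? 1 : 0;        // wrap implies a carry out of bit 31
  const hi = (a.hi + b.hi + carry) >>> 0; // high word absorbs the carry
  return { hi, lo };
}

const r = addU64({ hi: 0, lo: 0xFFFFFFFF }, { hi: 0, lo: 1 });
console.log(r.hi, r.lo); // 1 0
```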
gintominto5329 commented 1 year ago

Hello @insinfo,

May I ask, how can we be so sure that the following:

fun main() {
    var one: Byte = 127 // Byte
    println("int types, Byte: $one")
}

is NOT the same as the following, behind the scenes:

// NOTE: this is pseudo-code, in Dart
class Byte extends Int {
  int _value;
  set value(int newValue) {
    _value = (newValue < 128) ? newValue : (Uint8List(1)..[0] = newValue)[0];
  }
}

Just a bit more detail-hiding, to make a fool of the dev.

The tone might be inappropriate, but please read fully and carefully:

On a modern CPU, all registers are 64 bits. If you read a byte, it still stores it into a 64-bit register

Adding these won't make Dart faster, just a bit bulkier. And if more and more of these abstractions are added, then remove "a bit" from the previous sentence.

thanks,

Cryvage commented 2 months ago

What I was thinking about is a more general approach. Instead of adding a bunch of numeric types, why not add a way to implement ANY numeric or similar type? I'm talking about an analog of C# structs. For now, the main problem with implementing types like decimal, complex, or vector3d is that non-primitive types are allocated on the heap and not on the stack, which means you can implement them but their performance will suck big time. Seriously, if we are talking about performance, then the main problem is not the size of a number, it's the GARBAGE COLLECTOR. The only way to make a garbage-collected language performant is to avoid garbage collection wherever possible. Following this approach is what makes Go so performant. Having structs is what makes C# a thing in game development. And of course, with this feature you could implement a number type of any size and any specific logic.

insinfo commented 2 months ago

@Cryvage I agree with you; this would be very good, and it would greatly improve the power and flexibility of Dart.

insinfo commented 2 months ago

@CXCubeHD why did you close this?