[sdk] Consider supporting popcnt, ctz & clz

modulovalue commented 1 year ago

Dart has no support for several operations for which efficient instructions appear to exist on most supported platforms.

The three that this issue focuses on are:

popcount: returns the number of 1s.
ctz: returns the number of trailing zeros.
clz: returns the number of leading zeros.

It is not uncommon to support these operations in standard libraries. GHC and C# are examples where that is the case.

For reference:

WASM supports all of them (popcnt, ctz, clz).
CRoaring contains a utility library for portable implementations of popcount, ctz and clz on native platforms.

Proposal

Add support for popcount, ctz & clz that exposes some optimal implementation on each supported platform.

Cf. https://github.com/dart-lang/sdk/issues/10212 Cf. https://github.com/dart-lang/sdk/issues/5798

mit-mit commented 1 year ago

cc @mraleph @lrhn

lrhn commented 1 year ago

One isssue with adding these to int is that they're not directly supported on the web. It has Math.clz32 (the "32" is fine, they'll be 32-bit operations like all other bit-operations on the web), but not Math.ctz or Math.popcount.

We can do ctz as

function ctz(n) {
  n |= 0;
  if (n == 0) return 32;
  return 31 - Math.clz32(n ^ (n - 1));
}

in JavaScript, and popcount probably also 32-bit in some fairly efficient way, but that's still not "one opcode".

It just doesn't feel like it really belongs on Dart int, which is (on the web) not really a normal n-bit two's complement number. But the other bit-operations are there, so that's probably a lost cause.

I'm also not sure ARM has a popcnt instruction. like intel x86/AMD x64, so it's not a single operation there either. (There's efficient code, maybe even short enough to inline, but not one instruction).

Since there'll be a difference between native (64-bit operatons) and web (32-bit operations with truncation), it's likely to be operations that are used in platform specific code in any case. But so is int.operator~.

We also already have bitLength, which is the same as "index of highest one bit" for positive integers, which is closely related to clz for positive numbers. The clz is pretty boring for negative numbers, because it's consistently zero.

Anyway, it's a possible addition. Even reasonable. I hate the names, but they are historical.

abstract class int {
   // ...

   /// Number of leading (most significant) zero-digits of the binary representation of this number.
   ///
   /// On the web, this value is first converted to a 32-bit integer by truncating to
   /// to the least signficant 32 bits in the binary representation of the number,
   /// and interpreting that as a signed 32-bit integer.
   /// On native, the actual 64-bit signed integer is used directly.
   ///
   /// The leading zero-bit count is always zero if the integer is negative.
   /// Otherwise, the leading zero bit count is the count of leading zeros
   /// the integer would have in base 2, if padded to the size of integer that the platform
   /// uses for bit operations (64-bit on native, 32-bit on the web).
   int get leadingZeroBitCount;

   /// Number of trailing (least significant) zero-digits of the binary representation of this number.
   ///
   /// On the web, this value is first converted to a 32-bit integer by truncating to
   /// to the least signficant 32 bits in the binary representation of the number,
   /// and interpreting that as a signed 32-bit integer.
   /// On native, the actual 64-bit signed integer is used directly.
   ///
   /// The trailing zero-bit count is position of the least signficant 
   /// 1-bit in the binary representation of the integer.
   /// If the integer is zero, the value is the size size of integer that the platform
   /// uses for bit operations (64-bit on native, 32-bit on the web).
   int get trailingZeroBitCount;

   /// The count of non-zero bits in the binary representation of this number.
   ///
   /// On the web, this value is first converted to a 32-bit integer by truncating to
   /// to the least signficant 32 bits in the binary representation of the number,
   /// and interpreting that as a signed 32-bit integer.
   /// On native, the actual 64-bit signed integer is used directly.
   ///
   /// the non-zero-bit count (aka one-bit count) is the number of "1" digits
   /// in the binary representation of that integer. A negative integer has
   /// one-digits up to the size size of integer that the platform
   /// uses for bit operations (64-bit on native, 32-bit on the web).
   /// The value of `n.nonZeroBitCount + (~n).nonZeroBitCount` is always the
   /// size the platform uses for bit operations (on the web, at least if the
   /// value starts out as a 32-bit integer).
   int get nonZeroBitCount;
}

modulovalue commented 1 year ago

I'm also not sure ARM has a popcnt instruction

Unless I misunderstood you, this article claims that popcount has existed in the ARM NEON extension since 2005 under VCNT. Pure ARM appears to just have a byte-based popcount instruction CNT

One isssue with adding these to int is that they're not directly supported on the web.

Maybe it would be useful to have something like a dart:native system library that is only supported by the wasm and vm compilation targets? I think it would be really unfortunate if users who need better performance would have to be limited by the quirks of js.

lrhn commented 1 year ago

As I read the ARM documentation (not an ARM expert!), I think the NEON VCNT instructions is also a per-byte SIMD instruction. It's probably a good first instruction to using a 64-bit popcount, then the rest can be done by a few shift+add operations, without needing extra masks.

mraleph commented 1 year ago

the rest can be done by a few shift+add operations, without needing extra masks.

I have checked what other compilers generate and learned that vcnt + uaddlv does the trick in two instructions (plus you need two moves in and out of vector registers)

dart-lang / sdk

[sdk] Consider supporting popcnt, ctz & clz #52673

Proposal