dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.27k stars 4.73k forks source link

Proposal: Expose Bit Manipulation functions #27382

Open grant-d opened 6 years ago

grant-d commented 6 years ago

Bit manipulation routines are common enough that we should expose a subset as platform primitives. While some of them may be simple to write, it's much harder to achieve the requisite performance desired, especially since they tend to be used within tight loops.

The aim of this proposal is to scope & design a minimal set of functions, to be implemented with a bias towards performance. Per @tannergooding

The point of these APIs is to provide a general-purpose API that works on all platforms (which means providing a software fallback) and is generally-usable. Hardware Intrinsics are for performance oriented scenarios where you require hardware acceleration and need more direct control of the code that is emitted.

Note that even though some of the formula may be simple, relevant callsites are more self-documenting when using the intrinsics (should the dev choose to use them).

Scope

Rationale and Usage

The proposed functions are already implemented throughout the stack, often with different algorithms, performance characteristics and test coverage. Existing callsites below: https://github.com/dotnet/corefx/issues/32269#issuecomment-457689128 (There is likely to be more; the initial search was timeboxed to ~1 hour)

Some of the implementation have suboptimal performance or bugs. Something like ExtractBit is trivial to implement, but PopCount is more complex and thus prone to logic and performance issues. Hiding these complex formulae behind friendly signatures makes using them more approachable.

Here's an example of a function (BTC) whose signature is simple but the algebra is easy to get wrong.

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool ComplementBit(ref uint value, int bitOffset)
{
    uint mask = 1u << bitOffset;
    bool btc = (value & mask) != 0;

    value = ~(~mask ^ value);

    return btc;
}

However making a call to it meets our goal of abstraction and performance:

uint value = 123;
bool previouslyTrue = BitOperations.ComplementBit(ref value, 6);

Proposed API

The proposed API is purposefully kept lean. We can add more methods in later design iterations. We should view this as an opportunity to get simple, base functionality out the door and not stray into the dangerous territory of adding every bit twiddling hack that exists.

Assume all methods are decorated with [MethodImpl(MethodImplOptions.AggressiveInlining)]

public static class BitOperations
{
    // BT
    bool ExtractBit(byte value, int bitOffset); // Could name this BitTest or TestBit
    bool ExtractBit(uint value, int bitOffset);
    bool ExtractBit(int value, int bitOffset);

    // BTS (scalar)
    byte InsertBit(byte value, int bitOffset); // BitSet or SetBit
    uint InsertBit(uint value, int bitOffset);
    int InsertBit(int value, int bitOffset);

    // True BTS (returns original value)
    bool InsertBit(ref byte value, int bitOffset);
    bool InsertBit(ref uint value, int bitOffset);
    bool InsertBit(ref int value, int bitOffset);

    // BTR
    byte ClearBit(byte value, int bitOffset); // BitReset or ResetBit
    uint ClearBit(uint value, int bitOffset);
    int ClearBit(int value, int bitOffset);

    bool ClearBit(ref byte value, int bitOffset);
    bool ClearBit(ref uint value, int bitOffset);
    bool ClearBit(ref int value, int bitOffset);

    // BTC
    byte ComplementBit(byte value, int bitOffset);
    uint ComplementBit(uint value, int bitOffset);
    int ComplementBit(int value, int bitOffset);

    bool ComplementBit(ref byte value, int bitOffset);
    bool ComplementBit(ref uint value, int bitOffset);
    bool ComplementBit(ref int value, int bitOffset);

    // on ? BTS : BTR
    byte WriteBit(byte value, int bitOffset, bool on);
    uint WriteBit(uint value, int bitOffset, bool on);
    int WriteBit(int value, int bitOffset, bool on);

    bool WriteBit(ref byte value, int bitOffset, bool on);
    bool WriteBit(ref uint value, int bitOffset, bool on);
    bool WriteBit(ref int value, int bitOffset, bool on);
}

Details

Questions

Decisions

Sample call sites

The following samples are taken from the linked units, from the method BitOps_Samples. The code chooses values that are easy to eyeball for correctness. The real units cover many more boundaries & conditions.

// ExtractBit: Reads whether the specified bit in a mask is set.
Assert.True(BitOps.ExtractBit((byte)0b0001_0000, 4));
Assert.False(BitOps.ExtractBit((byte)0b0001_0000, 7));

// InsertBit: Sets the specified bit in a mask and returns the new value.
byte dest = 0b0000_1001;
Assert.Equal(0b0010_1001, BitOps.InsertBit(dest, 5));

// InsertBit(ref): Sets the specified bit in a mask and returns whether it was originally set.
Assert.False(BitOps.InsertBit(ref dest, 5));
Assert.Equal(0b0010_1001, dest);

// ClearBit: Clears the specified bit in a mask and returns the new value.
dest = 0b0000_1001;
Assert.Equal(0b0000_0001, BitOps.ClearBit(dest, 3));
// ClearBit(ref): Clears the specified bit in a mask and returns whether it was originally set.
Assert.True(BitOps.ClearBit(ref dest, 3)); 
Assert.Equal(0b0000_0001, dest);

// ComplementBit: Complements the specified bit in a mask and returns the new value.
dest = 0b0000_1001;
Assert.Equal(0b0000_0001, BitOps.ComplementBit(dest, 3));
// ComplementBit(ref): Complements the specified bit in a mask and returns whether it was originally set.
Assert.True(BitOps.ComplementBit(ref dest, 3));
Assert.Equal(0b0000_0001, dest);

// WriteBit: Writes the specified bit in a mask and returns the new value. Does not branch.
dest = 0b0000_1001;
Assert.Equal(0b0000_0001, BitOps.WriteBit(dest, 3, on: false));
// WriteBit(ref): Writes the specified bit in a mask and returns whether it was originally set. Does not branch.
Assert.True(BitOps.WriteBit(ref dest, 3, on: false));
Assert.Equal(0b0000_0001, dest);

Updates

tannergooding commented 6 years ago

I would say that we leave the APIs as is (all called Rotate) and that it can be discussed further during the design-review.

Changing the signature of an API (from uint to ulong, for example) can naturally cause downstream breaks and BitOps would be, by far, not the only thing that has a potential for different behavior.

grant-d commented 6 years ago

OK, I will leave the signatures as-is.

Changing the signature of an API (from uint to ulong, for example) can naturally cause downstream breaks and BitOps would be, by far, not the only thing that has a potential for different behavior.

Oops, bad example. My concern is with implicit integer upcasts. Changed my code sample above to use byte instead of uint, since it's less of a problem between uint and ulong (for which I agree with your statement that's it's an established paradigm)

grant-d commented 6 years ago

FYI. Following is the POC implementation of Iff (was called Evaluate). It's already used in internal code; it may be useful to expose as public?

[Edited]

// Converts a boolean to a byte value without branching.
// Assume AggressiveInlining
public static byte AsByte(ref bool condition)
{
    int val = Unsafe.As<bool, byte>(ref condition); // CLR permits 0..255

    // Normalize bool's underlying value to 0|1
    val = -val; // Negation will set sign-bit iff non-zero
    val >>= 31; // Send sign-bit to lsb (all other bits will be thus zero'd)

    return (byte)val; // 0|1
}

public static uint Iff(bool condition, uint trueValue)
{
    uint val = AsByte(ref condition);
    return val * trueValue;

    // FYI this is slower
    //uint mask = AsByte(ref condition) - 1u; // 0x00000000 | 0xFFFFFFFF
    //return ~mask & trueValue;
}

public static uint Iff(bool condition, uint trueValue, uint falseValue)
{
    uint val = AsByte(ref condition);
    return (val * trueValue) + (1 - val) * falseValue;
}

// For brevity, int|long|ulong overloads not shown

Usage:

var foo = BitOps.Iff(n > 0, 33); // Returns 33 if true, else 0

Example of internal usage:

public static ushort WriteBit(ushort value, int bitOffset, bool on)
{
    int shft = bitOffset & 15;
    uint mask = 1U << shft;

    uint onn = AsByte(ref on); // <-- Note, need to guarantee this is 0|1
    onn <<= shft;

    return (ushort)((value & ~mask) | onn);
}

Benchmark results (post bool-normalization change)

      Method |     Mean |     Error |    StdDev |   Median | Scaled |
------------ |---------:|----------:|----------:|---------:|-------:|
      Branch | 4.449 ns | 0.0890 ns | 0.2297 ns | 4.367 ns |   0.98 | x
  UnsafeCode | 5.645 ns | 0.0117 ns | 0.0091 ns | 5.645 ns |   1.24 |
    UnsafeAs | 4.552 ns | 0.0186 ns | 0.0156 ns | 4.552 ns |   1.00 | x Same perf as Branch
 UnsafeAsRef | 2.529 ns | 0.0157 ns | 0.0131 ns | 2.522 ns |   0.56 | xx Looking at this too
 UnionStruct | 5.646 ns | 0.0155 ns | 0.0137 ns | 5.644 ns |   1.24 |

Benchmark results (pre bool-normalization change)

      Method |     Mean |     Error |    StdDev | Scaled |
------------ |---------:|----------:|----------:|-------:|
      Branch | 3.923 ns | 0.0884 ns | 0.1322 ns |   1.50 | !
  UnsafeCode | 3.431 ns | 0.0358 ns | 0.0334 ns |   1.31 |
    UnsafeAs | 2.629 ns | 0.1356 ns | 0.1392 ns |   1.00 | x Much faster than Branch
 UnsafeAsRef | 2.329 ns | 0.0178 ns | 0.0149 ns |   0.89 | xx
 UnionStruct | 3.505 ns | 0.1408 ns | 0.1446 ns |   1.34 |
GSPP commented 6 years ago

@grant-d this If would not support boolean values bigger than 1. Are they simply not supported? They are legal on the CLR. The C# specification ignores their existence.

One could fix it by doing !!condition with unclear performance cost.

If could also be:

var mask = (uint)AsByte(ref condition) - 1; //0x00000000 or 0xFFFFFFFF;
return (mask & falseValue) | (~mask & trueValue);
jhudsoncedaron commented 6 years ago

@GSPP: If you stuff a 2 into a bool, either the C# compiler or the Jitter (not sure which) generates incorrect code for the xor operator or select statement on bools. Ergo bool must be either 0 or 1.

Linky: https://stackoverflow.com/questions/48344107/create-bool-value-which-is-neither-true-or-false

tannergooding commented 6 years ago

The runtime (which includes the JIT) specifies that a bool 1-byte and is true (any non-zero bit pattern) or false (a zero bit pattern). You can see III.1.1.2 in ECMA 335 for more info.

The C# language specifies that a bool is 1-byte and is true (1) or false (0). You can see https://github.com/dotnet/roslyn/issues/24652 for more information.

grant-d commented 6 years ago

OK, I have updated the code sample above. AsByte now normalizes the bool's underlying byte value using the fastest routine I could think of. Maybe there's an intrinsic for that.

Unfortunately that extra work means that it's about the same performance as branching now. Not sure if that still makes it worth doing, or if we should just incur the actual branch.

@tannergooding I am not sure what the C# contractual obligations are w.r.t. interop with other languages/platforms. Meaning the normalization code may or may not be required?

jhudsoncedaron commented 6 years ago

There's a faster way to normalize if you can have a JIT intrinsic.

or al, al
cmovnz al, 1

cmov isn't a branch and won't stall the pipeline.

grant-d commented 6 years ago

@tannergooding, any movement on getting this approved?

tannergooding commented 6 years ago

This was briefly discussed in the last API review meeting and it was determined that we want to have a more in-depth discussion. We also want to have some more real-world scenarios and example usages for these APIs listed.

grant-d commented 6 years ago

Example usages

Sure, would units suffice? My fork is close to 100% covered.

Real world usages

Based on your previous comment (below), and since I picked this up from up-for-grabs, I expected this was already well-known?

This is still a fairly large proposal and covers a number of items that we currently know have use-cases both in and outside the framework.

tannergooding commented 6 years ago

I expected this was already well-known?

Yes, it is just a matter of ensuring the information is collected and presented as part of the API review.

grant-d commented 6 years ago

OK. Here are the implementation & units:

BitOps.cs (in corefx, I expect it will be in coreclr in actual PR) BitOpsTests.cs (in the wrong folder, will fix in actual PR) - see first method BitOps_Samples for TLDR

grant-d commented 6 years ago

Added sample callsites to spec. @tannergooding will you be handling the evidence requirements?

grant-d commented 5 years ago

I noticed a couple of method names were inconsistent in the spec vs discussion. Fixed.

tannergooding commented 5 years ago

will you be handling the evidence requirements?

It is on my backlog, but there are some higher priority items on my plate that need to be handled first.

grant-d commented 5 years ago

@tannergooding / @jhudsoncedaron, I have a design question.

For functions like ExtractBit which ostensibly return a bool telling you whether the bit is set or not, would it be more useful in general to return a byte containing 1|0? The caller can easily convert that to a bool, but not the other way around. And a byte is more easily composable with other twiddling functions, without having to 'de-bool' it.

And for methods like PopCount, LeadingZeros, etc maybe a return type of byte would be better than int, since the caller can then use any receiving int type/width they prefer?

jhudsoncedaron commented 5 years ago

@grant-d : All my examples would be ExtractBit() wanting to return a bool and SetBit() wanting to pass a bool. The only other functions I really care about are RotateLeft() and RotateRight() which I ended up having to write myself as extension methods because they're easy to get wrong.

tannergooding commented 5 years ago

Taking/returning a bool is more desirable and will likely be better handled by the JIT (as, I think, will result ? 1 : 0).

grant-d commented 5 years ago

Updated POC code to call into intrinsics where possible, eg

        public static byte PopCount(uint value)
        {
            if (System.Runtime.Intrinsics.X86.Popcnt.IsSupported)
            {
                return (byte)System.Runtime.Intrinsics.X86.Popcnt.PopCount(value);
            }

            // Software fallback...
        }

By the time (if) this ships, hopefully the instrinsics code will be non-preview. Also see upcoming changes in: https://github.com/dotnet/coreclr/issues/20062

danmoseley commented 5 years ago

@jkotas pointed out that https://github.com/dotnet/coreclr/issues/21583 was a perf scenario that would be improved by having an intrinsic rotate, ie, something as proposed in this issue, plus JIT work to recognize it.

grant-d commented 5 years ago

Note the naming convention used in the related PR, which should be considered in this issue. TrailingZeroCount is used there, vs TrailingZeros proposed here. https://github.com/dotnet/coreclr/pull/22118

namespace System
{
    internal static class BitOps // Note internal
    {
        public static int TrailingZeroCount(int matches)
grant-d commented 5 years ago

@tannergooding is it not possible to stage this PR: just justify/merge the functions that don't need much justification, such as the rotates, and perhaps the balance of the trailing/leading overloads?

grant-d commented 5 years ago

See new use case for LZCNT in https://github.com/dotnet/coreclr/pull/22100 Also https://github.com/dotnet/coreclr/pull/19006

GSPP commented 5 years ago

@grant-d Is that the right issue? It seems unrelated.

grant-d commented 5 years ago

@GSPP fixed, thanks

tannergooding commented 5 years ago

is it not possible to stage this PR: just justify/merge the functions that don't need much justification, such as the rotates, and perhaps the balance of the trailing/leading overloads?

Not really, in all cases the APIs must be reviewed and approved before being made part of the public surface area. There is a chance that some smaller subsection of the APIs could be approved before the rest; but that still requires the same work to be done.

If someone wanted to help push this along, they could find the existing usages of proposed APIs above and move them to the currently internal BitOps class. This would help identify all the places we are currently using these APIs and how they are being used.

grant-d commented 5 years ago

Motivation and duplicate implementations of the proposed methods. Initial search was timeboxed to 1 hour, so there is likely to be more.

OTHER Found quite a few more callsites whilst implementing the PR. See the code for more details.

POPCNT https://github.com/dotnet/corefx/blob/master/src/System.Reflection.Metadata/src/System/Reflection/Internal/Utilities/BitArithmetic.cs https://github.com/dotnet/corert/blob/cf5dc501e870b1efe8cba3d6990752538d174773/src/System.Private.CoreLib/shared/System/Buffers/Text/FormattingHelpers.CountDigits.cs https://github.com/dotnet/roslyn/blob/367e08d8f9af968584d5bab84756eceda1587bd9/src/Workspaces/Core/Portable/Utilities/CompilerUtilities/ImmutableHashMap.cs#L1002 https://github.com/dotnet/coreclr/blob/030a3ea9b8dbeae89c90d34441d4d9a1cf4a7de6/tests/src/JIT/Performance/CodeQuality/V8/Crypto/Crypto.cs#L1277

LZCNT https://github.com/dotnet/corefx/blob/62c70143cfbb08bbf03b5b8aad60c2add84a0d9e/src/Common/src/CoreLib/System/Number.BigInteger.cs#L700 https://github.com/dotnet/coreclr/pull/22100 https://github.com/dotnet/coreclr/pull/19006 https://github.com/dotnet/corefx/blob/151a6065fa8184feb1ac4a55c89752342ab7c3bb/src/Common/src/CoreLib/System/Decimal.DecCalc.cs#L728 https://github.com/dotnet/coreclr/blob/030a3ea9b8dbeae89c90d34441d4d9a1cf4a7de6/tests/src/JIT/Performance/CodeQuality/V8/Crypto/Crypto.cs#L1254 (LeadingOnes)

TZCNT https://github.com/dotnet/coreclr/pull/22118 https://github.com/dotnet/corert/blob/db8723b1787e18f0124e94c387542d0ef3aac128/src/System.Private.CoreLib/shared/System/BitOps.cs#L16

ROTL/ROTR https://github.com/dotnet/coreclr/issues/21583 https://github.com/dotnet/coreclr/pull/1830 https://github.com/dotnet/corert/blob/87e58839d6629b5f90777f886a2f52d7a99c076f/src/System.Private.CoreLib/src/System/Marvin.cs#L120-L124 https://github.com/dotnet/coreclr/issues/1619 https://github.com/dotnet/roslyn/blob/367e08d8f9af968584d5bab84756eceda1587bd9/src/Workspaces/Core/Portable/Utilities/CompilerUtilities/ImmutableHashMap.cs#L972 https://github.com/dotnet/corert/blob/1bb85b989d9b01563df9eb83dce1c6a3415ce182/src/ILCompiler.ReadyToRun/src/Compiler/ReadyToRunHashCode.cs#L214 https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/Common/src/Internal/NativeFormat/TypeHashingAlgorithms.cs#L20 https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/Common/src/Internal/Text/Utf8String.cs#L47 https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.CoreLib/src/Internal/Runtime/CompilerServices/OpenMethodResolver.cs#L178 https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.CoreLib/src/System/RuntimeMethodHandle.cs#L67 https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/ILCompiler.Compiler/src/Compiler/DependencyAnalysis/NodeFactory.NativeLayout.cs#L345 https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.TypeLoader/src/Internal/Runtime/TypeLoader/TypeLoaderEnvironment.NamedTypeLookup.cs#L121 https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.CoreLib/src/Internal/Runtime/CompilerServices/OpenMethodResolver.cs#L178 https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.CoreLib/src/System/RuntimeFieldHandle.cs#L47 https://github.com/dotnet/corert/blob/635cf21aca11265ded9d78d216424bd609c052f5/src/System.Private.TypeLoader/src/Internal/Runtime/TypeLoader/TypeLoaderEnvironment.NamedTypeLookup.cs#L121

InsertBit https://github.com/dotnet/roslyn/blob/367e08d8f9af968584d5bab84756eceda1587bd9/src/Workspaces/Core/Portable/Utilities/CompilerUtilities/ImmutableHashMap.cs#L1013 https://github.com/dotnet/corefx/blob/bd414c68872c4e4c6e8b1a585675a8383b3a9555/src/System.DirectoryServices.AccountManagement/src/System/DirectoryServices/AccountManagement/Utils.cs#L51 https://github.com/dotnet/iot/blob/93f2bd3f2a4d64528ca97a8da09fe0bfe42d648f/src/devices/Mcp23xxx/Mcp23xxx.cs#L51 https://github.com/dotnet/wpf/blob/2cbb1ad9759c32dc575c7537057a29ee7da2e1b2/src/Microsoft.DotNet.Wpf/src/System.Xaml/System/Xaml/Schema/Reflector.cs#L363

ClearBit https://github.com/dotnet/corefx/blob/bd414c68872c4e4c6e8b1a585675a8383b3a9555/src/System.DirectoryServices.AccountManagement/src/System/DirectoryServices/AccountManagement/Utils.cs#L46 https://github.com/dotnet/coreclr/blob/030a3ea9b8dbeae89c90d34441d4d9a1cf4a7de6/tests/src/JIT/Performance/CodeQuality/V8/Crypto/Crypto.cs#L1314 https://github.com/dotnet/iot/blob/93f2bd3f2a4d64528ca97a8da09fe0bfe42d648f/src/devices/Mcp23xxx/Mcp23xxx.cs#L45 https://github.com/dotnet/roslyn/blob/367e08d8f9af968584d5bab84756eceda1587bd9/src/Workspaces/Core/Portable/Utilities/CompilerUtilities/ImmutableHashMap.cs#L1020

ExtractBit https://github.com/dotnet/iot/blob/93f2bd3f2a4d64528ca97a8da09fe0bfe42d648f/src/devices/Mcp23xxx/Mcp23xxx.cs#L57 https://github.com/dotnet/wpf/blob/2cbb1ad9759c32dc575c7537057a29ee7da2e1b2/src/Microsoft.DotNet.Wpf/src/System.Xaml/System/Xaml/Schema/Reflector.cs#L334 https://github.com/dotnet/coreclr/blob/030a3ea9b8dbeae89c90d34441d4d9a1cf4a7de6/tests/src/JIT/Performance/CodeQuality/V8/Crypto/Crypto.cs#L1294

ComplementBit https://github.com/dotnet/coreclr/blob/030a3ea9b8dbeae89c90d34441d4d9a1cf4a7de6/tests/src/JIT/Performance/CodeQuality/V8/Crypto/Crypto.cs#L1317 https://github.com/dotnet/iot/blob/93f2bd3f2a4d64528ca97a8da09fe0bfe42d648f/src/devices/Mcp23xxx/Mcp23xxx.cs#L196

grant-d commented 5 years ago

they could find the existing usages of proposed APIs

@tannergooding I have tried to find several examples of each, searching across all dotnet C# repos, eg: popcount language:C# user:dotnet.

Any other examples you're aware of would be welcome.

and move them to the currently internal BitOps class...

Did you mean I should create a PR (specifically keeping the class internal)? Add the BitOps method, then update the callsite(s)

tannergooding commented 5 years ago

Did you mean I should create a PR (specifically keeping the class internal)? Add the BitOps method, then update the callsite(s)

Yes, I think that would be reasonable. It should not only help centralize the code but should give us a clearer picture of who is using it and how.

grant-d commented 5 years ago

Yes, I think that would be reasonable. It should not only help centralize the code but should give us a clearer picture of who is using it and how.

PR created: https://github.com/dotnet/coreclr/pull/22225

The first commit shows the unedited code copied verbatim from the known call-sites. It has not been changed, optimized or consolidated since it is meant to demonstrate duplication.

Subsequent commits clean up and consolidate the code. Units already written and will be merged later.

grant-d commented 5 years ago

Trimmed down and cleaned-up the spec based on the above analysis. Still some questions to answer (such as signed overloads or not) but it's now a much tighter surface area.

tannergooding commented 5 years ago

@grant-d, the original post gas gotten a bit large. Could you also provide a summary of the changes you made here (removals, additions, etc)

grant-d commented 5 years ago

Per the changelog:

The last item is related to the new // comments through the spec regarding overloads for byte and ushort. For example, see RotateLeft in the spec where I emphasize the need for such overloads, but I removed such overloads from (eg) PopCount. Also added the two new related sections in the spec; Operand Size & Sign.

grant-d commented 5 years ago

Per discussion in https://github.com/dotnet/coreclr/pull/22333, changed return type of count-oriented methods (such as PopCount) to int instead of uint. Also updated the POC code

grant-d commented 5 years ago

@tannergooding I completed a set of coreclr PRs to:

I would like to update downstream callsites in corefx and corert as well as externals such as roslyn. And of course expose them for public consumption.

But I think I need a reference assembly for all those cases, right? Please advise me of the next steps?

grant-d commented 5 years ago

Updated spec to match findings of code analysis discussed above. There were several existing callsites that used byte overloads.

jkotas commented 5 years ago

@grant-d I would scope this issue down to just what we have in CoreLib and that is clearly useful. It will make the API review on this easy and it will allow us to make the most useful APIs public in .NET Core 3.0.

I would split the bit manipulation APIs into a separate issue that will be looked at and discussed separately, on its own schedule.

grant-d commented 5 years ago

That's a great idea Jan. I had already split this into 2 asks in the summary, but I will create two separate issues instead

grant-d commented 5 years ago

Done - see https://github.com/dotnet/corefx/issues/35419

jkotas commented 5 years ago

Lets get the subset from dotnet/corefx#35419 through first.

ghost commented 4 years ago

Tagging subscribers to this area: @tannergooding Notify danmosemsft if you want to be subscribed.

tannergooding commented 4 years ago

I'd like to move forward and mark this as ready-for-review. However, at the very least BitOps needs to become BitOperations, as that is the name of the already approved class. CC @grant-d

It's worth noting that of the variants being exposed, byte doesn't have hardware support. The x86 hardware only accelerates 16, 32, and 64-bit variants. The proposal, as given, isn't exposing short, ushort, long, or ulong (nor sbyte), which I think is based on @jkotas' request to just trim this down to what S.P.Corelib is using, correct?

The native intrinsics return unsigned char (effectively bool) and take the first parameter by pointer, so I think taking by ref as a variant makes sense and fits in with how these are expected to be used (which is likely in a if (InsertBit(ref _value, 4)) like scenario). @CarolEidt, you raised some concerns around taking ref above, is that still the case given the non mutating variants?

tannergooding commented 4 years ago

Meant to link to the native intrinsics above: https://docs.microsoft.com/en-us/cpp/intrinsics/bittest-bittest64?view=vs-2019

jkotas commented 4 years ago

which I think is based on @jkotas' request to just trim this down to what S.P.Corelib is using, correct?

The original proposal had methods like PopCount that were already approved and that we had no intrinsic patterns for.

I do not see value in having method for bit tests, etc. I believe that the existing pattern for bit tests like x & (1 << 5) are very concise and easy to read for anybody deals with bits.

The fact that MSVC C++ introduced an intrinsic for bit tests sounds like over-engineering to me. They could have picked that up as pattern match just fine.

tannergooding commented 4 years ago

They could have picked that up as pattern match just fine.

And I believe they do, but its the same fundamental problem as with RotateLeft and RotateRight, which is that there are a plethora of possible patterns and those may not be obvious, especially to novice/beginner programmers.

On the other hand, a method provides an explicit way to opt into the functionality, know that it will work (there probably isn't some edge case they are forgetting around bit manipulation), and that it will most likely be efficient for the given platform. I believe this is why we have 13 functions across dotnet (which grant-d found, at least) covering these "concise and easy to read" patterns: https://github.com/dotnet/runtime/issues/27382#issuecomment-457718507. That is, they exist because the API is that much more concise, easy to read, and allows logic updates without finding everywhere you wrote the pattern.

It also solves the problem of matching a more complex pattern where you want to simultaneously branch based off the existing value of a bit while simultaneously setting it to a new value.

jkotas commented 4 years ago

RotateLeft and RotateRight are fine with me. The pattern to write that using language operators is not concise, and there are multiple ways to write that.

I believe this is why we have 13 functions across dotnet (which grant-d found, at least) covering these "concise and easy to read" patterns

The most common ones like RotateLeft / RotateRight, PopCount, etc. are addressed already.

The remaining ones are the bit tests: InsertBit, ClearBit, etc. They show a lot less occurences. I have done a few searches for patterns like & (1 <<) or | (1 <<. Based on these searches, I believe that it is about 100x more common for people to use the these direct patterns than to wrap it in the helper method.

grant-d commented 4 years ago

I'd like to move forward and mark this as ready-for-review. However, at the very least BitOps needs to become BitOperations, as that is the name of the already approved class. CC @grant-d

Thanks for the ping Tanner, updated the spec's name to BitOperations

grant-d commented 4 years ago

FYI, here's my previous research on duplicate functions for each operation. A year+ old so may be more (or less) now: https://github.com/dotnet/corefx/issues/32269#issuecomment-457689128

eg, in ImmutableHashMap:

            [Pure]
            private static uint InsertBit(int position, uint bits)
            {
                Debug.Assert(0 == (bits & (1u << position)));
                return bits | (1u << position);
            }
jkotas commented 4 years ago

This is a good example: It asserts and sets the bit. We would not be able to replace it with the proposed method, without losing the debug validation.