Proposal: Expose Bit Manipulation functions

grant-d commented 6 years ago

Bit manipulation routines are common enough that we should expose a subset as platform primitives. While some of them may be simple to write, it's much harder to achieve the requisite performance desired, especially since they tend to be used within tight loops.

The aim of this proposal is to scope & design a minimal set of functions, to be implemented with a bias towards performance. Per @tannergooding

The point of these APIs is to provide a general-purpose API that works on all platforms (which means providing a software fallback) and is generally-usable. Hardware Intrinsics are for performance oriented scenarios where you require hardware acceleration and need more direct control of the code that is emitted.

Note that even though some of the formula may be simple, relevant callsites are more self-documenting when using the intrinsics (should the dev choose to use them).

Scope

[x] Consolidate existing callsites in CoreCLR (https://github.com/dotnet/coreclr/pull/22584)
[x] Units pass in CoreFX(https://github.com/dotnet/corefx/pull/35193)
[x] Expose System.Numerics.BitOperations as a public ref assembly in CoreFX (https://github.com/dotnet/corefx/issues/35419)
[ ] Consolidate existing callsites in CoreFX (https://github.com/dotnet/corefx/pull/34917)
[ ] Implement proposed methods (this issue)

Rationale and Usage

The proposed functions are already implemented throughout the stack, often with different algorithms, performance characteristics and test coverage. Existing callsites below: https://github.com/dotnet/corefx/issues/32269#issuecomment-457689128 (There is likely to be more; the initial search was timeboxed to ~1 hour)

Some of the implementation have suboptimal performance or bugs. Something like ExtractBit is trivial to implement, but PopCount is more complex and thus prone to logic and performance issues. Hiding these complex formulae behind friendly signatures makes using them more approachable.

Here's an example of a function (BTC) whose signature is simple but the algebra is easy to get wrong.

[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static bool ComplementBit(ref uint value, int bitOffset)
{
    uint mask = 1u << bitOffset;
    bool btc = (value & mask) != 0;

    value = ~(~mask ^ value);

    return btc;
}

However making a call to it meets our goal of abstraction and performance:

uint value = 123;
bool previouslyTrue = BitOperations.ComplementBit(ref value, 6);

Proposed API

The proposed API is purposefully kept lean. We can add more methods in later design iterations. We should view this as an opportunity to get simple, base functionality out the door and not stray into the dangerous territory of adding every bit twiddling hack that exists.

Assume all methods are decorated with [MethodImpl(MethodImplOptions.AggressiveInlining)]

public static class BitOperations
{
    // BT
    bool ExtractBit(byte value, int bitOffset); // Could name this BitTest or TestBit
    bool ExtractBit(uint value, int bitOffset);
    bool ExtractBit(int value, int bitOffset);

    // BTS (scalar)
    byte InsertBit(byte value, int bitOffset); // BitSet or SetBit
    uint InsertBit(uint value, int bitOffset);
    int InsertBit(int value, int bitOffset);

    // True BTS (returns original value)
    bool InsertBit(ref byte value, int bitOffset);
    bool InsertBit(ref uint value, int bitOffset);
    bool InsertBit(ref int value, int bitOffset);

    // BTR
    byte ClearBit(byte value, int bitOffset); // BitReset or ResetBit
    uint ClearBit(uint value, int bitOffset);
    int ClearBit(int value, int bitOffset);

    bool ClearBit(ref byte value, int bitOffset);
    bool ClearBit(ref uint value, int bitOffset);
    bool ClearBit(ref int value, int bitOffset);

    // BTC
    byte ComplementBit(byte value, int bitOffset);
    uint ComplementBit(uint value, int bitOffset);
    int ComplementBit(int value, int bitOffset);

    bool ComplementBit(ref byte value, int bitOffset);
    bool ComplementBit(ref uint value, int bitOffset);
    bool ComplementBit(ref int value, int bitOffset);

    // on ? BTS : BTR
    byte WriteBit(byte value, int bitOffset, bool on);
    uint WriteBit(uint value, int bitOffset, bool on);
    int WriteBit(int value, int bitOffset, bool on);

    bool WriteBit(ref byte value, int bitOffset, bool on);
    bool WriteBit(ref uint value, int bitOffset, bool on);
    bool WriteBit(ref int value, int bitOffset, bool on);
}

Details

The focus will be on performance.
Ultimately, most of the code should be branchless and leverage intrinsics where possible.
Somewhat of an overlap in functionality between InsertBit/ClearBit and WriteBit where the latter conditionally executes the equivalent of either former (using twiddling to do so without branching). But there's enough twiddling in Write to avoid any branching that it's maybe worth keeping both variants.
Since these functions are used in performance-sensitive applications, we avoid input checking and exceptions. The current design tries to dispense with the requirement by sharpening input types, contractual assumptions (eg out-of-bounds offset will use mod n in some functions or result in a no-op in others) and specific design choices.

Questions

What additional sizes & signs of integers do we support, and in which methods? It looks like int, uint and byte are commonly used. Anything else?

Decisions

What namespace this should be in. Decision: namespace System.Numerics.
The class name of BitOps is self-documenting, terse (it might be specified frequently in a callsite, if not aliased with using) and both terms are well-known names or abbreviations. Decision: BitOperations
Decision: We favor well-known names over the alternatives. For example, PopCount could be called CountSetBits but that's not the common lingo used by twiddlers. Furthermore, intrinsics already expose the well-known names, eg Popcnt.PopCount.
Likewise, method names should also be as concise as possible, for example, TrailingZeroCount may be more concisely described as TrailingZeros. Decision: TrailingZeroCount already chosen by a previous PR.
Decision: Count-oriented methods such as PopCount return (idiomatic) int, not uint
Should offset or position parameters be int or uint. Latter is preferred since negatives not permitted regardless. Decision: int is an idiomatic input/output type in C#.
Log(0) is mathematically undefined. Should it return 0 or -1? Decision: Returns 0
Do we care about endianness (ie do we need BE and LE variants of relevant methods). The current proposal is only LE, and is shaped in such a way that we are not future-proof wrt supporting BE. Discussion proposes an alternative. For example, LeadingZeros might need a different algorithm on BE/LE platforms. Decision: Endianess only matters if we are reinterpreting integers.

Sample call sites

The following samples are taken from the linked units, from the method BitOps_Samples. The code chooses values that are easy to eyeball for correctness. The real units cover many more boundaries & conditions.

// ExtractBit: Reads whether the specified bit in a mask is set.
Assert.True(BitOps.ExtractBit((byte)0b0001_0000, 4));
Assert.False(BitOps.ExtractBit((byte)0b0001_0000, 7));

// InsertBit: Sets the specified bit in a mask and returns the new value.
byte dest = 0b0000_1001;
Assert.Equal(0b0010_1001, BitOps.InsertBit(dest, 5));

// InsertBit(ref): Sets the specified bit in a mask and returns whether it was originally set.
Assert.False(BitOps.InsertBit(ref dest, 5));
Assert.Equal(0b0010_1001, dest);

// ClearBit: Clears the specified bit in a mask and returns the new value.
dest = 0b0000_1001;
Assert.Equal(0b0000_0001, BitOps.ClearBit(dest, 3));
// ClearBit(ref): Clears the specified bit in a mask and returns whether it was originally set.
Assert.True(BitOps.ClearBit(ref dest, 3)); 
Assert.Equal(0b0000_0001, dest);

// ComplementBit: Complements the specified bit in a mask and returns the new value.
dest = 0b0000_1001;
Assert.Equal(0b0000_0001, BitOps.ComplementBit(dest, 3));
// ComplementBit(ref): Complements the specified bit in a mask and returns whether it was originally set.
Assert.True(BitOps.ComplementBit(ref dest, 3));
Assert.Equal(0b0000_0001, dest);

// WriteBit: Writes the specified bit in a mask and returns the new value. Does not branch.
dest = 0b0000_1001;
Assert.Equal(0b0000_0001, BitOps.WriteBit(dest, 3, on: false));
// WriteBit(ref): Writes the specified bit in a mask and returns whether it was originally set. Does not branch.
Assert.True(BitOps.WriteBit(ref dest, 3, on: false));
Assert.Equal(0b0000_0001, dest);

Updates

Original issue authored by @mburbea here: Proposal: Add a BitManipulation class
Initial proposal submitted
Methods such as InsertBit accepted a bool that determined whether it set or cleared the bit in question. Such methods have now been refactored in two, InsertBit and ClearBit.
Some name changes based on suggestions (eg FlipBit became ComplementBit).
TestAndSet methods have been refactored into simple scalar functions, for performance.
The offset parameter is int instead of byte. All POC units pass.
Added WriteBit(value, offset, bool on) overloads that conditionally set/clear the specified bit.
Added ExtractByte and friends
Added Evaluate(bool) (used internally, so may as well expose it)
Removed all Span<T> overloads per @tannergooding advice. Will maybe submit in separate proposal
Moved TrailingOnes and LeadingOnes into to do later proposal in comments
Added Log2
Removed redundant overloads
Removed doc-comments so spec is easier to read
Added byte overloads to all crud methods based on code analysis of existing callsites. eg eg bool ExtractBit(byte value, int bitOffset)
Added Scope section at top of spec
Separated calls into existing and proposed sections
Added task list, worded spec in a more concise manner
Moved RotateByte, etc into to do later proposal in comments

grant-d commented 6 years ago

cc @mburbea, @tannergooding , @CarolEidt

tannergooding commented 6 years ago

Please drop the in modifier from the parameters. These are all operating on primitives and passing them around by reference is going to be generally less efficient than just passing them by-value (which should happen in register).

tannergooding commented 6 years ago

Additional Comments/Thoughts:

Whether or not these methods take a ref and return the original value of the bit or just return the modified value will likely need more discussion (possibly just in the API design review).
I'm not sure I like the methods taking a bool.
- InsertBit(value, offset, on) might be better as InsertBit(value, offset) and ResetBit(value, offset) (or ClearBit, ZeroBit, etc).
- Likewise, LeadingCount and TrailingCount are likely better split into LeadingZeroCount, TrailingZeroCount, LeadingOneCount, and TrailingOneCount
I think NegateBit may be more "better" than FlipBit (or ComplementBit)
System.BitOps seems appropriate as a namespace (and lets this live next to BitConverter). Especially since several of these may get used in the Runtime/Framework itself
I imagine we will want to just provide overloads for the signed types

grant-d commented 6 years ago

I'm not sure I like the methods taking a bool.

I went back and forth on that one. The bool makes the api tighter. But it means that ultimately the implementation has branching instructions, which means perf suffers. So I am happy to refactor.

Whether or not these methods take a ref and return the original value of the bit or just return the modified value will likely need more discussion (possibly just in the API design review).

I designed this based on a previous suggestion of yours, but I am happy to revert. Perhaps you meant ExchangeBit (BTS/BTR,BTC) to be a separate additional methods? Otherwise we could return a tuple but that has more overhead.

I think NegateBit may be more "better" than FlipBit (or ComplementBit)

I will make that change

omariom commented 6 years ago

@grant-d public static bool InsertBit(ref uint value, byte offset);

Will JIT be able to optimize it properly if the value is an enregistered local? Like in this code:

int M(uint value)
{
    return InsertBit(in value, 3) ? 5 :6; 
}

grant-d commented 6 years ago

It needs to be a ref, ie:

private static int M(uint value) => InsertBit(ref value, 3, true) ? 5 : 6;

I don't have enough knowledge of what the optimizer would do, someone else may be able to advise us.

Clockwork-Muse commented 6 years ago

Um, ExtractBit takes byte and ushort, but neither of those are present for InsertBit/ClearBit (instead there's int and long)?

CarolEidt commented 6 years ago

These need to be passed by value. Passing things around by ref or in will make it more difficult to optimize in the long run. If you need a variant that both modifies the value and returns whether a bit was set, it should be a different variant, and the "normal" form of InsertBit should just return the modified value.

grant-d commented 6 years ago

Sounds good, I have made the change. @tannergooding, let me know if you think the BTS/BTR/BTC variants need to be in the first deliverable.

grant-d commented 6 years ago

Will JIT be able to optimize it properly

It will now - I have changed all ref type methods to be scalar functions; see notes in the spec.

danmoseley commented 6 years ago

API review probably won't like the contraction BitOps. BitOperations is not so bad or possibly Bits even...

grant-d commented 6 years ago

I like Bits, I have added all of these to the naming suggestions.

grant-d commented 6 years ago

If you need a variant that both modifies the value and returns whether a bit was set, it should be a different variant.

Done - both variants exist.

grant-d commented 6 years ago

Note to self:

Optimization of Span overloads: https://github.com/dotnet/corefx/issues/32184

SSE2 Intrinsics (still experimental)

using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

grant-d commented 6 years ago

jhudsoncedaron commented 6 years ago

This is pointless unless you also the attribute for AggressiveInlining as the function call overhead will eat your gains otherwise. Therefore, AggressiveInlining should be contractual on these.

grant-d commented 6 years ago

AggressiveInlining should be contractual

The POC already does so. I will add a note to the spec

tannergooding commented 6 years ago

This is pointless unless you also the attribute for AggressiveInlining as the function call overhead will eat your gains otherwise.

Many of these functions will be treated as intrinsic and will be specially handled by the JIT to emit a single instruction, regardless of what the actual method body contains for the software fallback.

The JIT is also very good at inlining small functions, regardless of whether or not the AggressiveInlining attribute exists.

GSPP commented 6 years ago

InsertBit(value, offset, on) is needed because sometimes the on value is dynamic. There should be separate SetBit and ClearBit methods.

grant-d commented 6 years ago

@GSPP can this not be handled from the callsite. if (on) SetBit(i) else ClearBit(i) I am trying to keep the API surface as concise as possible, let me know if you have a compelling reason

jhudsoncedaron commented 6 years ago

@grant-d: InsertBit can be implemented branchlessly on any reasonably CPU.

grant-d commented 6 years ago

InsertBit is already implemented without branching, see POC code.

grant-d commented 6 years ago

The question was whether there could be a mutator that accepts a bool, then conditionally inserts or extracts. My preference is to externalize branching (at the callsite).

jhudsoncedaron commented 6 years ago

And I'm saying the dynamic set-clear version can be implemented w/o branching.

ulong mask = 1 << i;
ulong bit = (ulong)on << i;
return (value & ~mask) | bit;

grant-d commented 6 years ago

Got you. Thanks for the gist. I still am curious whether the 'dynamic' use case is compelling enough to have a somewhat overlapping API?

tannergooding commented 6 years ago

@jhudsoncedaron, just because you can implement that in software, doesn't necessarily mean it is (necessarily) a good API to expose here or that it will be efficient.

Many of these functions will be treated as intrinsic, which means that they will will compile down to a single CPU instruction (rather than 5+ instructions). As an example, we can use the BSF (Bit Scan Forward), BSR (Bit Scan Reverse), BT (Bit Test), BTC (Bit Test and Complement), BTR (Bit Test and Reset), BTS (Bit Test and Set), LZCNT (Count the Number of Leading Zero Bits), and POPCNT (Return the Count of Number of Bits Set to 1) instructions on x86/x64 to make these super efficient.

Also, I don't think you can convert a bool to ulong. You will get error CS0030: Cannot convert type 'bool' to 'ulong'

jhudsoncedaron commented 6 years ago

I only ever use the dynamic version of InsertBit. That might not be valid C# but the algorithm is correct anyway. bools are stored as either a 1 or a 0 so it's just a matter of convincing the compiler to emit the right code. The harder this is to do, the stronger the argument for putting it in the library.

CarolEidt commented 6 years ago

I tend to agree with @jhudsoncedaron that the dynamic version is useful. For this particular case if we really wanted to minimize API surface area I might vote for just the dynamic case, since I would expect the value to be either clearly a constant or not. And it would be easier for the jit to make that optimization than to recognize a sequence with branches.

grant-d commented 6 years ago

[Edited]

I assumed the gist was pseudocode; in C# we can do one of the following. Micro benchmarking shows better perf (0.6x) than the idiomatic expression.

return condition ? 37 : 0; // 1.0x (idiomatic)

return (*(byte*)&condition) * 37; // 0.95x (cannot be inlined)
return Unsafe.As<bool, byte>(condition) * 37; // 0.60x
return new UnionStruct { Bool = condition }.Value * 37; // 0.76x (ExplicitLayout, etc)

This relies on internal knowledge of bool but I guess that's no different to the rest of twiddling where we rely on internal knowledge of bit layouts? I recall (perhaps wrongly) something about VB (maybe v6, maybe .Net) using -1 for false. But since this is ultimately CLI, I guess that's not a concern.

That being said. I started the spec out with the dynamic signature, and followed a suggestion to move to explicit. I don't mind one or the other (or both) but we need a decision here. Maybe I should just add both and we can see which one gets downvoted?

jhudsoncedaron commented 6 years ago

VB .NET did in fact use -1 for booleans; however this technique would reveal 1 anyway. It really wrote 1 to the underlying type to avoid driving the framework code bonkers. (Go ahead, use this unsafe to make a bool with a value of 3 and watch things fail in amusing ways.)

grant-d commented 6 years ago

[Edited] OK, added conditional overrides to the spec. Here's the POC implementation.

Let's wait for review & voting to see what we keep. There's enough twiddling to avoid branching that I would tend to agree it's a useful overload.

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public static byte WriteBit(byte value, int offset, bool on)
        {
            int shft = offset & 7;
            uint mask = 1U << shft;

            uint onn = Unsafe.As<bool, uint>(on); // true ? 1 : 0
            onn <<= shft;

            return (byte)((value & ~mask) | onn);
        }

grant-d commented 5 years ago

Added ExtractByte and friends

redknightlois commented 5 years ago

@tannergooding

Many of these functions will be treated as intrinsic and will be specially handled by the JIT to emit a single instruction, regardless of what the actual method body contains for the software fallback.

True, but you are assuming a single JIT (Microsoft) and also that the caller function didn't trip on a particular (warranted or not) limit to inlining. There is actually no shortcoming for contractually ensure Bits methods must be inlined; as implementing 90%+ of those methods on a loop without inlining would kill you performance wise.

grant-d commented 5 years ago

There's an ambiguity in some of these signatures that I am trying to get my head around. It results in corrupt data.

TLDR Some methods are not safe under implicit upcast operations. This can lead to ambiguity and corruption. For example foo(123u) != foo((byte)123). Suggestion is to name all such methods explicitly to avoid said issue: foo8(), foo16(), foo32(), foo64()

Detail For example, I want to rotate a byte value left. I have a byte in hand, and pass it to the function:

byte foo = 0b_1000_0001;
var result = RotateLeft(foo, 2); // = 0000_0110

Yay, that worked. A few milestones later, but this time I still have a byte value but I happen to have it stored in a uint. Maybe it was a parameter originally declared as byte and someone else changed it. Or maybe that's how it came off the wire or protobuf. Or maybe I am being careful about alignment. Either way I know it's a byte and thus expect byte semantics.

uint foo = 0b_1000_0001; // Same byte value, upcast to uint
// Implicit upcast - my callsite did not change:
var result = RotateLeft(foo, 2); // = 0000_0010_0000_0100

Different answer. Ultimately silent corruption.

Of course the easy workaround is:

uint foo = 0b_1000_0001;
var result = RotateLeft((byte)foo, 2); // = 0000_0110

But that means I will need to use this pattern at all my callsites. That is untenable.

It seems the shape of this kind of method makes it easy to call the wrong overload, despite the niceties of having a method group. (Note that all 4 overloads are necessary because they behave differently, and furthermore we are trying to avoid throwing exceptions in guard clauses.)

Suggestion: Maybe we should distinguish such methods explicitly, something like:

uint foo = 0b_1000_0001; // Unsigned integer of some bitsize
var result1 = RotateLeft8(foo, 2); // May support signed overloads too, so suffix is `8` not `UInt8`
var result2 = RotateLeft16(foo, 2); // A nice side effect is terser names
var result3 = RotateLeft32(foo, 2); // See next code block
var result4 = RotateLeft64(foo, 2);

This actually woks nicely should we decide to support signed integers:

int foo = 0b_1000_0001; // Signed integer of some bitsize
var result1 = RotateLeft8(foo, 2); // Suffix of '8' works nicely for byte|sbyte
var result2 = RotateLeft16(foo, 2);
var result3 = RotateLeft32(foo, 2);
var result4 = RotateLeft64(foo, 2);

Note that this does not apply to all method groups. For example:

byte foo = 0b_1000_0001;
var result1 = PopCount(foo); // = 2
var result2 = PopCount((ulong)foo); // = 2

Implicit upcast is safe here; we get the same answer both ways.

Here's a different example. Once again I have a byte in hand:

byte foo = 0b_1000_0000;
// By convention, shift operators are mod-bitsize, so 15 is exactly equivalent to 15 % 8 == 7
var result = ExtractBit(foo, 15); // == 1
// In other words this is exactly equivalent to calling:
result = ExtractBit(foo, 7); // == 1

Now the parameter changes or whatever, and I now have a byte by happenstance stored in a uint:

uint foo = 0b_1000_0000;
// Here, 15 is exactly equivalent to 15 % 32 == 15
var result = ExtractBit(foo, 15); // == 0

We get a different answer. Per previous suggestion, we'd need to distinguish these methods too:

uint foo = 0b_1000_0000;
var result1 = ExtractBit8(foo, 15); // May support signed overloads too, so suffix is 8 not UInt8
var result2 = ExtractBit16(foo, 15);
var result3 = ExtractBit32(foo, 15);
var result4 = ExtractBit64(foo, 15);

grant-d commented 5 years ago

Considering the question on Big/Little Endian. If we ship this api shape as-is (which is only LE), we are not future proof in consideration of one day deciding to support BE. An alternative to the proposed design is to follow the pattern employed in classes like Encoding where we have an abstract base class that contains (in this case) two singletons, ostensibly named LE and BE:

public abstract class BitOps
{
    public static BitOps BE { get; }
    public static BitOps LE { get; }

    protected BitOps()
    { }
}

Since the number of overloads is so high (explosion due to byte, ushort, uint, ulong as well as Span<T> combinatorials), perhaps we can extend this idea even further:

public abstract class BitOps<T> where T : unmanaged
{
    public static BitOps<byte> ByteBE { get; }
    public static BitOps<ushort> UInt16BE { get; }
    public static BitOps<uint> UInt32BE { get; }
    public static BitOps<ulong> UInt64BE { get; }

    public static BitOps<sbyte> SByteBE { get; }
    public static BitOps<short> Int16BE { get; }
    public static BitOps<int> Int32BE { get; }
    public static BitOps<long> Int64BE { get; }
}

jhudsoncedaron commented 5 years ago

@grant-d : I don't get it. Aren't all of these operating on integers of the appropriate size? Shouldn't those already be in the correct endian?

grant-d commented 5 years ago

[Edited] My concern (perhaps unwarranted, please correct me) is that when inserting an int into, say, a Span<byte> then LE/BE may be an issue? Also, for functions like LeadingZeros, does BE/LE not affect the algorithm since assumptions about the bit layout are perhaps invalid.

jhudsoncedaron commented 5 years ago

I see it now. Because the JIT can't inline function calls it can't statically resolve, I think we'd be much better off with

public static BitOps<byte> ByteBE { get; }
public static BitOps<ushort> UInt16BE { get; }
public static BitOps<uint> UInt32BE { get; }
public static BitOps<ulong> UInt64BE { get; }

public static BitOps<sbyte> SByteBE { get; }
public static BitOps<short> Int16BE { get; }
public static BitOps<int> Int32BE { get; }
public static BitOps<long> Int64BE { get; }

and use #if at compile time of .NET Core to pick the correct implementation.

(sorry; I copied the wrong code block; this was intended to be the ExtractInt16, InsertInt16, etc. functions)

grant-d commented 5 years ago

I am leaning that way too. Though I am not sure about the #if. Is there a use-case for being on a LE platform but wanting to do BE operations. Perhaps because it's network format, or transpilation, or ...

jhudsoncedaron commented 5 years ago

Correct; when messing with file formats we want the file format's bit order. Lots of file formats are in BE order now because classically we had htons/l/q to ensure BE order but no functions to ensure LE order.

redknightlois commented 5 years ago

Almost no operation out there in the framework is Endian aware AFAIK with the exception of a few HostToNetwork usually in networking classes. Encoding the endianness into the method is unseen on the framework as far as I can remember. (That being said, I wouldn't start now if it would be my decision). IMHO it is the 'implementors' burden to deal with differences in endianess... because as you dont touch the disk or/the network there is really no difference for you. So it is an interoperation thing only.

jhudsoncedaron commented 5 years ago

@redknightlois : I guess you don't remember the BitConverter madness when mono was ported to Alpha. It turned out almost all the calls to BitConverter in application code wanted a specific endian rather than machine endian.

Clockwork-Muse commented 5 years ago

Yes, endianess would be a concern. However, only the Span<T> operations really have to worry about it, which probably means it should be a parameter?

On that note:

Suggestion: Maybe we should distinguish such methods explicitly, something like:
uint foo = 0b_1000_0001;
var result1 = RotateLeft8(foo, 2); // May support signed overloads too, so suffix is 8 not UInt8
var result2 = RotateLeft16(foo, 2); // A nice side effect is terser names
var result3 = RotateLeft32(foo, 2);
var result4 = RotateLeft64(foo, 2);
The problem I have for this is that it doesn't completely solve your problem, it just spins a new one: what happens if there are bits set in the upper registers? There's a discoverability issue - the method may sound like it rotates bits by 8, with an offset. Also, my immediate reaction is, "why is it only ever the lowest n?" - I think I want something like:
uint RotateWindowLeft(uint value, int bits, int start_index = 0, int window_length = 32);
Some of this is a reaction to:

Yay, that worked. A few milestones later, but this time I still have a byte value but I happen to have it stored in a uint. Maybe it was a parameter originally declared as byte and someone else changed it. Or maybe that's how it came off the wire or protobuf. Or maybe I am being careful about alignment. Either way I know it's a byte and thus expect byte semantics.
uint foo = 0b_1000_0001; // Same byte value, upcast to uint
var result = RotateLeft(foo, 2); // = 0000_0010_0000_0100
Different answer. Ultimately silent corruption. (Emphasis mine) ...uh, shouldn't your unit tests catch that kind of thing?

Of your mentioned reasons for the change, only stored alignment strikes me as particularly relevant - and at that, that you're overriding/ignoring the native alignment behavior of the framework. At which point, I'm more likely to want to be casting on each operation anyways, since any values outside the 'real' size should be invalid, and should be truncated.

tannergooding commented 5 years ago

If this is going to go to API review, it needs to stop being modified. The larger the proposal is, the less likely it will get reviewed in normal triage and the more likely it will get pushed back to a special triage session (which can take much longer to schedule/review/approve/etc).

I would say that, the following should be considered out of scope and should be handled in a new/separate proposal:

Evaluate
IsPowerOf2
Parity
Anything that takes a Span<T>, rather than a primitive

This leaves:

ExtractBit
InsertBit
ClearBit
ResetBit
ComplementBit
ReadBit
WriteBit
RotateLeft
RotateRight
PopCount
LeadingZeros
TrailingZeros
LeadingOnes
TrailingOnes

This is still a fairly large proposal and covers a number of items that we currently know have use-cases both in and outside the framework. These, for the most part, are "intrinsic" operations that most (modern) platforms support as a single operation or can be sufficiently accelerated. These do not have to worry about any special semantics, such as whether it is big/little endian or how it operates on larger data inputs

grant-d commented 5 years ago

Thanks @tannergooding , that gives me relief. The surface has been growing and I was not sure how to scope it back down. I will trim the spec. [Edit] Done

grant-d commented 5 years ago

For posterity, here's the trimmed-out spec that we can maybe submit it in a later proposal. Note that Evaluate (Edit: now called If) is used internally, so it's useful to expose for external use too.

public static partial class BitOps // .Span and sundry
{
    /// <summary>
    /// Count the number of leading one bits in a mask.
    /// </summary>
    /// <param name="value">The mask.</param>
    byte LeadingOnes(byte value);
        // For brevity, ushort|uint|ulong overloads not shown

    /// <summary>
    /// Count the number of trailing one bits in a mask.
    /// </summary>
    /// <param name="value">The mask.</param>
    byte TrailingOnes(byte value);
        // For brevity, ushort|uint|ulong overloads not shown

    /// <summary>
    /// Converts a bool to an integer value, without branching.
    /// Returns <paramref name="trueValue"/> if True, else returns <paramref name="falseValue"/>.
    /// </summary>
    /// <param name="condition">The value to convert.</param>
    /// <param name="trueValue">The value to return if True.</param>
    /// <param name="falseValue">The value to return if False.</param>
    uint Iff(bool condition, uint trueValue, uint falseValue);
    uint Iff(bool condition, uint trueValue);
        // For brevity, int|long|ulong overloads not shown
        // byte and ushort overloads are redundant
        // Benchmark: 0.64 cost of idiomatic bool expression (condition ? 1 : 0)
    byte AsByte(bool condition); // Normalized (see notes). Naming TBD
    byte AsByte(ref bool condition);

    /// <summary>
    /// Returns 1 if <paramref name="value"/> is non-zero, else returns 0.
    /// Does not incur branching.
    /// Logically equivalent to CMOVNZ.
    /// </summary>
    /// <param name="value">The value to inspect.</param>
    public static byte Any(ushort value);
        // For brevity, short|int|uint|long|ulong overloads not shown

    /// <summary>
    /// Returns 1 if <paramref name="value"/> is zero, else returns 0.
    /// Does not incur branching.
    /// Logically equivalent to CMOVZ.
    /// </summary>
    /// <param name="value">The value to inspect.</param>
    public static byte NotAny(ushort value);
        // For brevity, short|int|uint|long|ulong overloads not shown

    /// <summary>
    /// Returns 1 if the bit count is odd, else 0.
    /// Logically equivalent to PopCount mod 2.
    /// </summary>
    /// <param name="value">The value.</param>
    int Parity(uint value);
    int Parity(ulong value);
        // byte and ushort overloads are redundant

    /// <summary>
    /// Returns True if the value is a power of 2, else False.
    /// </summary>
    /// <param name="value">The value.</param>
    bool IsPowerOf2(uint value);
    bool IsPowerOf2(ulong value);
        // byte and ushort overloads are redundant

    /// <summary>
    /// Extracts a byte value from a longer integer.
    /// </summary>
    /// <param name="value">The bit mask.</param>
    /// <param name="bitOffset">The ordinal position to read.</param>
    byte ExtractByte(ushort value, int bitOffset);
    byte ExtractByte(uint value, int bitOffset);
    byte ExtractByte(ulong value, int bitOffset);

    /// <summary>
    /// Inserts a byte value into a longer integer.
    /// </summary>
    /// <param name="value">The bit mask.</param>
    /// <param name="bitOffset">The ordinal position to write.</param>
    /// <param name="insert">The value to insert.</param>
    ushort InsertByte(ushort value, int bitOffset, byte insert);
    uint InsertByte(uint value, int bitOffset, byte insert);
    ulong InsertByte(ulong value, int bitOffset, byte insert);

    ushort ExtractUInt16(uint value, int bitOffset);
    ushort ExtractUInt16(ulong value, int bitOffset);

    uint InsertUInt16(uint value, int bitOffset, ushort insert);
    ulong InsertUInt16(ulong value, int bitOffset, ushort insert);

    uint ExtractUInt32(ulong value, int bitOffset);
    ulong InsertUInt32(ulong value, int bitOffset, uint insert);

    /// <summary>
    /// Reads whether the specified bit in a mask is set.
    /// </summary>
    /// <param name="value">The mask.</param>
    /// <param name="offset">The ordinal position of the bit to read.</param>
    bool ExtractBit(ReadOnlySpan<byte> value, uint offset);
        // For brevity, ushort|uint|ulong overloads not shown

    /// <summary>
    /// Sets the specified bit in a mask and returns whether the bit was originally set.
    /// </summary>
    /// <param name="value">The mask.</param>
    /// <param name="bitOffset">The ordinal position of the bit to set.</param>
    bool InsertBit(Span<byte> value, uint bitOffset);
        // For brevity, ushort|uint|ulong overloads not shown

    /// <summary>
    /// Clears the specified bit in a mask and returns whether the bit was originally set.
    /// </summary>
    /// <param name="value">The mask.</param>
    /// <param name="bitOffset">The ordinal position of the bit to clear.</param>
    bool ClearBit(Span<byte> value, uint bitOffset);
        // For brevity, ushort|uint|ulong overloads not shown

    /// <summary>
    /// Complements the specified bit in a mask and returns whether the bit was originally set.
    /// </summary>
    /// <param name="value">The mask.</param>
    /// <param name="bitOffset">The ordinal position of the bit to complement.</param>
    bool ComplementBit(Span<byte> value, uint bitOffset);
        // For brevity, ushort|uint|ulong overloads not shown

    /// <summary>
    /// Reads the specified byte from a span, given the bit offset.
    /// </summary>
    /// <param name="span">The span.</param>
    /// <param name="bitOffset">The ordinal position to read.</param>
    byte ExtractByte(ReadOnlySpan<byte> span, int bitOffset);
    byte ExtractByte(ReadOnlySpan<ushort> span, int bitOffset);
    byte ExtractByte(ReadOnlySpan<uint> span, int bitOffset);
    byte ExtractByte(ReadOnlySpan<ulong> span, int bitOffset);

    ushort ExtractUInt16(ReadOnlySpan<byte> span, int bitOffset);
        // For brevity, ushort|uint|ulong overloads not shown

    uint ExtractUInt32(ReadOnlySpan<byte> span, int bitOffset);
        // For brevity, ushort|uint|ulong overloads not shown

    ulong ExtractUInt64(ReadOnlySpan<byte> span, int bitOffset);
        // For brevity, ushort|uint|ulong overloads not shown

    /// <summary>
    /// Writes the specified value to a span, given the bit offset, and returns the original value.
    /// </summary>
    /// <param name="span">The span.</param>
    /// <param name="bitOffset">The ordinal position to write.</param>
    /// <param name="value">The value to write.</param>
    byte InsertByte(Span<byte> span, int bitOffset, byte value);
    byte InsertByte(Span<ushort> span, int bitOffset, byte value);
    byte InsertByte(Span<uint> span, int bitOffset, byte value);
    byte InsertByte(Span<ulong> span, int bitOffset, byte value);

    byte InsertUInt16(Span<byte> span, int bitOffset, ushort value);
        // For brevity, ushort|uint|ulong overloads not shown

    byte InsertUInt32(Span<byte> span, int bitOffset, uint value);
        // For brevity, ushort|uint|ulong overloads not shown

    byte InsertUInt64(Span<byte> span, int bitOffset, ulong value);
        // For brevity, ushort|uint|ulong overloads not shown

    // Alternative pattern that mitigates throwing OutOfRange exceptions
    bool TryInsertByte(Span<byte> span, int bitOffset, byte value);
    bool TryExtractByte(ReadOnlySpan<byte> span, int bitOffset, out byte value);

    // Since operand size DOES affect result, it would be a breaking change to add these later.
    // eg ROTL32(123u) != ROTL8((byte)123)
    // Unless we (later) add them with a distinguished name, eg RotateLeft8, RotateLeft16

    byte RotateLeft(byte value, int bitOffset);
    ushort RotateLeft(ushort value, int bitOffset);

    byte RotateRight(byte value, int bitOffset);
    ushort RotateRight(ushort value, int bitOffset);

    int LeadingZeroCount(byte value);
    int LeadingZeroCount(ushort value);
}

Questions

Support ReadOnlySpan for getters, and Span for mutators. Not sure if we need to explicitly support Span for getters too; I would hope not.
Should we support the TryRead/Write pattern. Versus the current shape which is Read/Write but then has to throw for OutOfRange conditions. In other words, since this is a perf-sensitive feature, using Try helps avoid exceptions.
For Span overloads, is there a need for overloads that work atomically, eg LeadingZeros(new byte[50]{ }) == 400. Versus overloads that index into a specific byte. Suggestion: No.
See conversation item regarding methods like RotateLeft being ambiguous.

The following style of method group may have redundancy, since the caller can convert the latter 3 to Span<byte> first?

byte ExtractByte(ReadOnlySpan<byte> span, int bitOffset);
// Redundant?
// Caller can use MemoryMarshal.AsBytes(span) first, though they then have
// to take care of the offset math themselves. Unless we provide an offset helper.
byte ExtractByte(ReadOnlySpan<ushort> span, int bitOffset);
byte ExtractByte(ReadOnlySpan<uint> span, int bitOffset);
byte ExtractByte(ReadOnlySpan<ulong> span, int bitOffset);

BitConverter has already been span-ified, and as such has an overlapping api surface. For example, bool TryWriteBytes(Span<byte> destination, long value) is similar to signatures here.

grant-d commented 5 years ago

...shouldn't your unit tests catch that kind of thing?

I agree with the some of your feedback, but the above is a bad developer experience. If I call Math.Max and incur an unexpected implicit upcast, the result typically doesn't change. But with this specific set of functions, it does. Unfortunately, not all consumers of .Net write units. That design is not the pit of success.

Clockwork-Muse commented 5 years ago

...what implicit upcast? Each of the primitive types has its own copy of RotateLeft, so it would return the same type as was passed in. If you're worried about the cast when assigning back to the same variable, Math.Max would run into the same problem.

grant-d commented 5 years ago

No I am not worried about the re-assignment. I am worried that in terms of overload resolution, the result feels a little non-deterministic. I will try another example:

void Foo(byte n) // <-- Note byte param
{
   // a mile of code
   var x = RotateLeft(n, 17); // Produces 123
   var y = Math.Min(n, 17); // Produces 17
}

Sometime later, someone else modifies the signature:

void Foo(ulong n) // <-- Note byte became uint or ulong
{
   // a mile of code 
   var x = RotateLeft(n, 17); // Produces 456 <-- Different result
   var y = Math.Min(n, 17); // Produces 17 <-- Same result
}

Calling RotateLeft32(n, 2) would have produced the same correct result deterministically.

redknightlois commented 5 years ago

Encoding the size of the type on the signature in cases where they lead to inconsistent results is not unheard of. IMHO it is a sensible tradeoff.

dotnet / runtime