Open grant-d opened 6 years ago
cc @mburbea, @tannergooding , @CarolEidt
Please drop the in
modifier from the parameters. These are all operating on primitives and passing them around by reference is going to be generally less efficient than just passing them by-value (which should happen in register
).
Additional Comments/Thoughts:
InsertBit(value, offset, on)
might be better as InsertBit(value, offset)
and ResetBit(value, offset)
(or ClearBit
, ZeroBit
, etc).LeadingCount
and TrailingCount
are likely better split into LeadingZeroCount
, TrailingZeroCount
, LeadingOneCount
, and TrailingOneCount
NegateBit
may be more "better" than FlipBit
(or ComplementBit
)System.BitOps
seems appropriate as a namespace (and lets this live next to BitConverter
). Especially since several of these may get used in the Runtime/Framework itselfI'm not sure I like the methods taking a bool.
I went back and forth on that one. The bool makes the api tighter. But it means that ultimately the implementation has branching instructions, which means perf suffers. So I am happy to refactor.
Whether or not these methods take a ref and return the original value of the bit or just return the modified value will likely need more discussion (possibly just in the API design review).
I designed this based on a previous suggestion of yours, but I am happy to revert. Perhaps you meant ExchangeBit
(BTS/BTR,BTC) to be a separate additional methods? Otherwise we could return a tuple but that has more overhead.
I think NegateBit may be more "better" than FlipBit (or ComplementBit)
I will make that change
@grant-d
public static bool InsertBit(ref uint value, byte offset);
Will JIT be able to optimize it properly if the value is an enregistered local? Like in this code:
int M(uint value)
{
return InsertBit(in value, 3) ? 5 :6;
}
It needs to be a ref
, ie:
private static int M(uint value) => InsertBit(ref value, 3, true) ? 5 : 6;
I don't have enough knowledge of what the optimizer would do, someone else may be able to advise us.
Um, ExtractBit
takes byte
and ushort
, but neither of those are present for InsertBit
/ClearBit
(instead there's int
and long
)?
These need to be passed by value. Passing things around by ref
or in
will make it more difficult to optimize in the long run. If you need a variant that both modifies the value and returns whether a bit was set, it should be a different variant, and the "normal" form of InsertBit
should just return the modified value.
Sounds good, I have made the change. @tannergooding, let me know if you think the BTS
/BTR
/BTC
variants need to be in the first deliverable.
Will JIT be able to optimize it properly
It will now - I have changed all ref
type methods to be scalar functions; see notes in the spec.
API review probably won't like the contraction BitOps
. BitOperations
is not so bad or possibly Bits
even...
I like Bits
, I have added all of these to the naming suggestions.
If you need a variant that both modifies the value and returns whether a bit was set, it should be a different variant.
Done - both variants exist.
Note to self:
Span
overloads: https://github.com/dotnet/corefx/issues/32184using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;
This is pointless unless you also the attribute for AggressiveInlining as the function call overhead will eat your gains otherwise. Therefore, AggressiveInlining should be contractual on these.
AggressiveInlining should be contractual
The POC already does so. I will add a note to the spec
This is pointless unless you also the attribute for AggressiveInlining as the function call overhead will eat your gains otherwise.
Many of these functions will be treated as intrinsic and will be specially handled by the JIT to emit a single instruction, regardless of what the actual method body contains for the software fallback.
The JIT is also very good at inlining small functions, regardless of whether or not the AggressiveInlining
attribute exists.
InsertBit(value, offset, on)
is needed because sometimes the on
value is dynamic. There should be separate SetBit
and ClearBit
methods.
@GSPP can this not be handled from the callsite.
if (on) SetBit(i) else ClearBit(i)
I am trying to keep the API surface as concise as possible, let me know if you have a compelling reason
@grant-d: InsertBit can be implemented branchlessly on any reasonably CPU.
InsertBit is already implemented without branching, see POC code.
The question was whether there could be a mutator that accepts a bool, then conditionally inserts or extracts. My preference is to externalize branching (at the callsite).
And I'm saying the dynamic set-clear version can be implemented w/o branching.
ulong mask = 1 << i;
ulong bit = (ulong)on << i;
return (value & ~mask) | bit;
Got you. Thanks for the gist. I still am curious whether the 'dynamic' use case is compelling enough to have a somewhat overlapping API?
@jhudsoncedaron, just because you can implement that in software, doesn't necessarily mean it is (necessarily) a good API to expose here or that it will be efficient.
Many of these functions will be treated as intrinsic, which means that they will will compile down to a single CPU instruction (rather than 5+ instructions). As an example, we can use the BSF
(Bit Scan Forward), BSR
(Bit Scan Reverse), BT
(Bit Test), BTC
(Bit Test and Complement), BTR
(Bit Test and Reset), BTS
(Bit Test and Set), LZCNT
(Count the Number of Leading Zero Bits), and POPCNT
(Return the Count of Number of Bits Set to 1) instructions on x86/x64 to make these super efficient.
Also, I don't think you can convert a bool
to ulong
. You will get error CS0030: Cannot convert type 'bool' to 'ulong'
I only ever use the dynamic version of InsertBit. That might not be valid C# but the algorithm is correct anyway. bools are stored as either a 1 or a 0 so it's just a matter of convincing the compiler to emit the right code. The harder this is to do, the stronger the argument for putting it in the library.
I tend to agree with @jhudsoncedaron that the dynamic version is useful. For this particular case if we really wanted to minimize API surface area I might vote for just the dynamic case, since I would expect the value to be either clearly a constant or not. And it would be easier for the jit to make that optimization than to recognize a sequence with branches.
[Edited]
I assumed the gist was pseudocode; in C# we can do one of the following. Micro benchmarking shows better perf (0.6x) than the idiomatic expression.
return condition ? 37 : 0; // 1.0x (idiomatic)
return (*(byte*)&condition) * 37; // 0.95x (cannot be inlined)
return Unsafe.As<bool, byte>(condition) * 37; // 0.60x
return new UnionStruct { Bool = condition }.Value * 37; // 0.76x (ExplicitLayout, etc)
This relies on internal knowledge of bool
but I guess that's no different to the rest of twiddling where we rely on internal knowledge of bit layouts? I recall (perhaps wrongly) something about VB (maybe v6, maybe .Net) using -1
for false
. But since this is ultimately CLI, I guess that's not a concern.
That being said. I started the spec out with the dynamic
signature, and followed a suggestion to move to explicit
. I don't mind one or the other (or both) but we need a decision here. Maybe I should just add both and we can see which one gets downvoted?
VB .NET did in fact use -1 for booleans; however this technique would reveal 1 anyway. It really wrote 1 to the underlying type to avoid driving the framework code bonkers. (Go ahead, use this unsafe to make a bool with a value of 3 and watch things fail in amusing ways.)
[Edited] OK, added conditional overrides to the spec. Here's the POC implementation.
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static byte WriteBit(byte value, int offset, bool on)
{
int shft = offset & 7;
uint mask = 1U << shft;
uint onn = Unsafe.As<bool, uint>(on); // true ? 1 : 0
onn <<= shft;
return (byte)((value & ~mask) | onn);
}
Added ExtractByte
and friends
@tannergooding
Many of these functions will be treated as intrinsic and will be specially handled by the JIT to emit a single instruction, regardless of what the actual method body contains for the software fallback.
True, but you are assuming a single JIT (Microsoft) and also that the caller function didn't trip on a particular (warranted or not) limit to inlining. There is actually no shortcoming for contractually ensure Bits
methods must be inlined; as implementing 90%+ of those methods on a loop without inlining would kill you performance wise.
There's an ambiguity in some of these signatures that I am trying to get my head around. It results in corrupt data.
TLDR
Some methods are not safe under implicit upcast operations. This can lead to ambiguity and corruption.
For example foo(123u) != foo((byte)123)
.
Suggestion is to name all such methods explicitly to avoid said issue:
foo8(), foo16(), foo32(), foo64()
Detail
For example, I want to rotate
a byte
value left
. I have a byte
in hand, and pass it to the function:
byte foo = 0b_1000_0001;
var result = RotateLeft(foo, 2); // = 0000_0110
Yay, that worked. A few milestones later, but this time I still have a byte
value but I happen to have it stored in a uint
. Maybe it was a parameter originally declared as byte
and someone else changed it. Or maybe that's how it came off the wire or protobuf. Or maybe I am being careful about alignment. Either way I know it's a byte and thus expect byte semantics.
uint foo = 0b_1000_0001; // Same byte value, upcast to uint
// Implicit upcast - my callsite did not change:
var result = RotateLeft(foo, 2); // = 0000_0010_0000_0100
Different answer. Ultimately silent corruption.
Of course the easy workaround is:
uint foo = 0b_1000_0001;
var result = RotateLeft((byte)foo, 2); // = 0000_0110
But that means I will need to use this pattern at all my callsites. That is untenable.
It seems the shape of this kind of method makes it easy to call the wrong overload, despite the niceties of having a method group. (Note that all 4 overloads are necessary because they behave differently, and furthermore we are trying to avoid throw
ing exceptions in guard clauses.)
Suggestion: Maybe we should distinguish such methods explicitly, something like:
uint foo = 0b_1000_0001; // Unsigned integer of some bitsize
var result1 = RotateLeft8(foo, 2); // May support signed overloads too, so suffix is `8` not `UInt8`
var result2 = RotateLeft16(foo, 2); // A nice side effect is terser names
var result3 = RotateLeft32(foo, 2); // See next code block
var result4 = RotateLeft64(foo, 2);
This actually woks nicely should we decide to support signed integers:
int foo = 0b_1000_0001; // Signed integer of some bitsize
var result1 = RotateLeft8(foo, 2); // Suffix of '8' works nicely for byte|sbyte
var result2 = RotateLeft16(foo, 2);
var result3 = RotateLeft32(foo, 2);
var result4 = RotateLeft64(foo, 2);
Note that this does not apply to all method groups. For example:
byte foo = 0b_1000_0001;
var result1 = PopCount(foo); // = 2
var result2 = PopCount((ulong)foo); // = 2
Implicit upcast is safe here; we get the same answer both ways.
Here's a different example. Once again I have a byte
in hand:
byte foo = 0b_1000_0000;
// By convention, shift operators are mod-bitsize, so 15 is exactly equivalent to 15 % 8 == 7
var result = ExtractBit(foo, 15); // == 1
// In other words this is exactly equivalent to calling:
result = ExtractBit(foo, 7); // == 1
Now the parameter changes or whatever, and I now have a byte
by happenstance stored in a uint
:
uint foo = 0b_1000_0000;
// Here, 15 is exactly equivalent to 15 % 32 == 15
var result = ExtractBit(foo, 15); // == 0
We get a different answer. Per previous suggestion, we'd need to distinguish these methods too:
uint foo = 0b_1000_0000;
var result1 = ExtractBit8(foo, 15); // May support signed overloads too, so suffix is 8 not UInt8
var result2 = ExtractBit16(foo, 15);
var result3 = ExtractBit32(foo, 15);
var result4 = ExtractBit64(foo, 15);
Considering the question on Big/Little Endian. If we ship this api shape as-is (which is only LE), we are not future proof in consideration of one day deciding to support BE. An alternative to the proposed design is to follow the pattern employed in classes like Encoding
where we have an abstract
base class that contains (in this case) two singletons, ostensibly named LE
and BE
:
public abstract class BitOps
{
public static BitOps BE { get; }
public static BitOps LE { get; }
protected BitOps()
{ }
}
Since the number of overloads is so high (explosion due to byte
, ushort
, uint
, ulong
as well as Span<T>
combinatorials), perhaps we can extend this idea even further:
public abstract class BitOps<T> where T : unmanaged
{
public static BitOps<byte> ByteBE { get; }
public static BitOps<ushort> UInt16BE { get; }
public static BitOps<uint> UInt32BE { get; }
public static BitOps<ulong> UInt64BE { get; }
public static BitOps<sbyte> SByteBE { get; }
public static BitOps<short> Int16BE { get; }
public static BitOps<int> Int32BE { get; }
public static BitOps<long> Int64BE { get; }
}
@grant-d : I don't get it. Aren't all of these operating on integers of the appropriate size? Shouldn't those already be in the correct endian?
[Edited]
My concern (perhaps unwarranted, please correct me) is that when inserting an int
into, say, a Span<byte>
then LE/BE may be an issue?
Also, for functions like LeadingZeros
, does BE/LE not affect the algorithm since assumptions about the bit layout are perhaps invalid.
I see it now. Because the JIT can't inline function calls it can't statically resolve, I think we'd be much better off with
public static BitOps<byte> ByteBE { get; }
public static BitOps<ushort> UInt16BE { get; }
public static BitOps<uint> UInt32BE { get; }
public static BitOps<ulong> UInt64BE { get; }
public static BitOps<sbyte> SByteBE { get; }
public static BitOps<short> Int16BE { get; }
public static BitOps<int> Int32BE { get; }
public static BitOps<long> Int64BE { get; }
and use #if
at compile time of .NET Core to pick the correct implementation.
(sorry; I copied the wrong code block; this was intended to be the ExtractInt16, InsertInt16, etc. functions)
I am leaning that way too. Though I am not sure about the #if
.
Is there a use-case for being on a LE platform but wanting to do BE operations. Perhaps because it's network format, or transpilation, or ...
Correct; when messing with file formats we want the file format's bit order. Lots of file formats are in BE order now because classically we had htons/l/q to ensure BE order but no functions to ensure LE order.
Almost no operation out there in the framework is Endian aware AFAIK with the exception of a few HostToNetwork usually in networking classes. Encoding the endianness into the method is unseen on the framework as far as I can remember. (That being said, I wouldn't start now if it would be my decision). IMHO it is the 'implementors' burden to deal with differences in endianess... because as you dont touch the disk or/the network there is really no difference for you. So it is an interoperation thing only.
@redknightlois : I guess you don't remember the BitConverter madness when mono was ported to Alpha. It turned out almost all the calls to BitConverter in application code wanted a specific endian rather than machine endian.
Yes, endianess would be a concern. However, only the Span<T>
operations really have to worry about it, which probably means it should be a parameter?
On that note:
Suggestion: Maybe we should distinguish such methods explicitly, something like:
uint foo = 0b_1000_0001; var result1 = RotateLeft8(foo, 2); // May support signed overloads too, so suffix is 8 not UInt8 var result2 = RotateLeft16(foo, 2); // A nice side effect is terser names var result3 = RotateLeft32(foo, 2); var result4 = RotateLeft64(foo, 2);
The problem I have for this is that it doesn't completely solve your problem, it just spins a new one: what happens if there are bits set in the upper registers? There's a discoverability issue - the method may sound like it rotates bits by 8, with an offset. Also, my immediate reaction is, "why is it only ever the lowest n?" - I think I want something like:
uint RotateWindowLeft(uint value, int bits, int start_index = 0, int window_length = 32);
Some of this is a reaction to:
Yay, that worked. A few milestones later, but this time I still have a byte value but I happen to have it stored in a uint. Maybe it was a parameter originally declared as byte and someone else changed it. Or maybe that's how it came off the wire or protobuf. Or maybe I am being careful about alignment. Either way I know it's a byte and thus expect byte semantics.
uint foo = 0b_1000_0001; // Same byte value, upcast to uint var result = RotateLeft(foo, 2); // = 0000_0010_0000_0100
Different answer. Ultimately silent corruption. (Emphasis mine) ...uh, shouldn't your unit tests catch that kind of thing?
Of your mentioned reasons for the change, only stored alignment strikes me as particularly relevant - and at that, that you're overriding/ignoring the native alignment behavior of the framework. At which point, I'm more likely to want to be casting on each operation anyways, since any values outside the 'real' size should be invalid, and should be truncated.
If this is going to go to API review, it needs to stop being modified. The larger the proposal is, the less likely it will get reviewed in normal triage and the more likely it will get pushed back to a special triage session (which can take much longer to schedule/review/approve/etc).
I would say that, the following should be considered out of scope and should be handled in a new/separate proposal:
Evaluate
IsPowerOf2
Parity
Span<T>
, rather than a primitiveThis leaves:
This is still a fairly large proposal and covers a number of items that we currently know have use-cases both in and outside the framework. These, for the most part, are "intrinsic" operations that most (modern) platforms support as a single operation or can be sufficiently accelerated. These do not have to worry about any special semantics, such as whether it is big/little endian or how it operates on larger data inputs
Thanks @tannergooding , that gives me relief. The surface has been growing and I was not sure how to scope it back down. I will trim the spec. [Edit] Done
For posterity, here's the trimmed-out spec that we can maybe submit it in a later proposal.
Note that Evaluate
(Edit: now called If
) is used internally, so it's useful to expose for external use too.
public static partial class BitOps // .Span and sundry
{
/// <summary>
/// Count the number of leading one bits in a mask.
/// </summary>
/// <param name="value">The mask.</param>
byte LeadingOnes(byte value);
// For brevity, ushort|uint|ulong overloads not shown
/// <summary>
/// Count the number of trailing one bits in a mask.
/// </summary>
/// <param name="value">The mask.</param>
byte TrailingOnes(byte value);
// For brevity, ushort|uint|ulong overloads not shown
/// <summary>
/// Converts a bool to an integer value, without branching.
/// Returns <paramref name="trueValue"/> if True, else returns <paramref name="falseValue"/>.
/// </summary>
/// <param name="condition">The value to convert.</param>
/// <param name="trueValue">The value to return if True.</param>
/// <param name="falseValue">The value to return if False.</param>
uint Iff(bool condition, uint trueValue, uint falseValue);
uint Iff(bool condition, uint trueValue);
// For brevity, int|long|ulong overloads not shown
// byte and ushort overloads are redundant
// Benchmark: 0.64 cost of idiomatic bool expression (condition ? 1 : 0)
byte AsByte(bool condition); // Normalized (see notes). Naming TBD
byte AsByte(ref bool condition);
/// <summary>
/// Returns 1 if <paramref name="value"/> is non-zero, else returns 0.
/// Does not incur branching.
/// Logically equivalent to CMOVNZ.
/// </summary>
/// <param name="value">The value to inspect.</param>
public static byte Any(ushort value);
// For brevity, short|int|uint|long|ulong overloads not shown
/// <summary>
/// Returns 1 if <paramref name="value"/> is zero, else returns 0.
/// Does not incur branching.
/// Logically equivalent to CMOVZ.
/// </summary>
/// <param name="value">The value to inspect.</param>
public static byte NotAny(ushort value);
// For brevity, short|int|uint|long|ulong overloads not shown
/// <summary>
/// Returns 1 if the bit count is odd, else 0.
/// Logically equivalent to PopCount mod 2.
/// </summary>
/// <param name="value">The value.</param>
int Parity(uint value);
int Parity(ulong value);
// byte and ushort overloads are redundant
/// <summary>
/// Returns True if the value is a power of 2, else False.
/// </summary>
/// <param name="value">The value.</param>
bool IsPowerOf2(uint value);
bool IsPowerOf2(ulong value);
// byte and ushort overloads are redundant
/// <summary>
/// Extracts a byte value from a longer integer.
/// </summary>
/// <param name="value">The bit mask.</param>
/// <param name="bitOffset">The ordinal position to read.</param>
byte ExtractByte(ushort value, int bitOffset);
byte ExtractByte(uint value, int bitOffset);
byte ExtractByte(ulong value, int bitOffset);
/// <summary>
/// Inserts a byte value into a longer integer.
/// </summary>
/// <param name="value">The bit mask.</param>
/// <param name="bitOffset">The ordinal position to write.</param>
/// <param name="insert">The value to insert.</param>
ushort InsertByte(ushort value, int bitOffset, byte insert);
uint InsertByte(uint value, int bitOffset, byte insert);
ulong InsertByte(ulong value, int bitOffset, byte insert);
ushort ExtractUInt16(uint value, int bitOffset);
ushort ExtractUInt16(ulong value, int bitOffset);
uint InsertUInt16(uint value, int bitOffset, ushort insert);
ulong InsertUInt16(ulong value, int bitOffset, ushort insert);
uint ExtractUInt32(ulong value, int bitOffset);
ulong InsertUInt32(ulong value, int bitOffset, uint insert);
/// <summary>
/// Reads whether the specified bit in a mask is set.
/// </summary>
/// <param name="value">The mask.</param>
/// <param name="offset">The ordinal position of the bit to read.</param>
bool ExtractBit(ReadOnlySpan<byte> value, uint offset);
// For brevity, ushort|uint|ulong overloads not shown
/// <summary>
/// Sets the specified bit in a mask and returns whether the bit was originally set.
/// </summary>
/// <param name="value">The mask.</param>
/// <param name="bitOffset">The ordinal position of the bit to set.</param>
bool InsertBit(Span<byte> value, uint bitOffset);
// For brevity, ushort|uint|ulong overloads not shown
/// <summary>
/// Clears the specified bit in a mask and returns whether the bit was originally set.
/// </summary>
/// <param name="value">The mask.</param>
/// <param name="bitOffset">The ordinal position of the bit to clear.</param>
bool ClearBit(Span<byte> value, uint bitOffset);
// For brevity, ushort|uint|ulong overloads not shown
/// <summary>
/// Complements the specified bit in a mask and returns whether the bit was originally set.
/// </summary>
/// <param name="value">The mask.</param>
/// <param name="bitOffset">The ordinal position of the bit to complement.</param>
bool ComplementBit(Span<byte> value, uint bitOffset);
// For brevity, ushort|uint|ulong overloads not shown
/// <summary>
/// Reads the specified byte from a span, given the bit offset.
/// </summary>
/// <param name="span">The span.</param>
/// <param name="bitOffset">The ordinal position to read.</param>
byte ExtractByte(ReadOnlySpan<byte> span, int bitOffset);
byte ExtractByte(ReadOnlySpan<ushort> span, int bitOffset);
byte ExtractByte(ReadOnlySpan<uint> span, int bitOffset);
byte ExtractByte(ReadOnlySpan<ulong> span, int bitOffset);
ushort ExtractUInt16(ReadOnlySpan<byte> span, int bitOffset);
// For brevity, ushort|uint|ulong overloads not shown
uint ExtractUInt32(ReadOnlySpan<byte> span, int bitOffset);
// For brevity, ushort|uint|ulong overloads not shown
ulong ExtractUInt64(ReadOnlySpan<byte> span, int bitOffset);
// For brevity, ushort|uint|ulong overloads not shown
/// <summary>
/// Writes the specified value to a span, given the bit offset, and returns the original value.
/// </summary>
/// <param name="span">The span.</param>
/// <param name="bitOffset">The ordinal position to write.</param>
/// <param name="value">The value to write.</param>
byte InsertByte(Span<byte> span, int bitOffset, byte value);
byte InsertByte(Span<ushort> span, int bitOffset, byte value);
byte InsertByte(Span<uint> span, int bitOffset, byte value);
byte InsertByte(Span<ulong> span, int bitOffset, byte value);
byte InsertUInt16(Span<byte> span, int bitOffset, ushort value);
// For brevity, ushort|uint|ulong overloads not shown
byte InsertUInt32(Span<byte> span, int bitOffset, uint value);
// For brevity, ushort|uint|ulong overloads not shown
byte InsertUInt64(Span<byte> span, int bitOffset, ulong value);
// For brevity, ushort|uint|ulong overloads not shown
// Alternative pattern that mitigates throwing OutOfRange exceptions
bool TryInsertByte(Span<byte> span, int bitOffset, byte value);
bool TryExtractByte(ReadOnlySpan<byte> span, int bitOffset, out byte value);
// Since operand size DOES affect result, it would be a breaking change to add these later.
// eg ROTL32(123u) != ROTL8((byte)123)
// Unless we (later) add them with a distinguished name, eg RotateLeft8, RotateLeft16
byte RotateLeft(byte value, int bitOffset);
ushort RotateLeft(ushort value, int bitOffset);
byte RotateRight(byte value, int bitOffset);
ushort RotateRight(ushort value, int bitOffset);
int LeadingZeroCount(byte value);
int LeadingZeroCount(ushort value);
}
ReadOnlySpan
for getters, and Span
for mutators. Not sure if we need to explicitly support Span
for getters too; I would hope not.TryRead/Write
pattern. Versus the current shape which is Read/Write
but then has to throw
for OutOfRange conditions. In other words, since this is a perf-sensitive feature, using Try
helps avoid exception
s.LeadingZeros(new byte[50]{ }) == 400
. Versus overloads that index into a specific byte. Suggestion: No.RotateLeft
being ambiguous.Span<byte>
first?
byte ExtractByte(ReadOnlySpan<byte> span, int bitOffset);
// Redundant?
// Caller can use MemoryMarshal.AsBytes(span) first, though they then have
// to take care of the offset math themselves. Unless we provide an offset helper.
byte ExtractByte(ReadOnlySpan<ushort> span, int bitOffset);
byte ExtractByte(ReadOnlySpan<uint> span, int bitOffset);
byte ExtractByte(ReadOnlySpan<ulong> span, int bitOffset);
BitConverter
has already been span-ified, and as such has an overlapping api surface. For example, bool TryWriteBytes(Span<byte> destination, long value)
is similar to signatures here....shouldn't your unit tests catch that kind of thing?
I agree with the some of your feedback, but the above is a bad developer experience. If I call Math.Max
and incur an unexpected implicit upcast, the result typically doesn't change. But with this specific set of functions, it does. Unfortunately, not all consumers of .Net write units.
That design is not the pit of success.
...what implicit upcast? Each of the primitive types has its own copy of RotateLeft
, so it would return the same type as was passed in. If you're worried about the cast when assigning back to the same variable, Math.Max
would run into the same problem.
No I am not worried about the re-assignment. I am worried that in terms of overload resolution, the result feels a little non-deterministic. I will try another example:
void Foo(byte n) // <-- Note byte param
{
// a mile of code
var x = RotateLeft(n, 17); // Produces 123
var y = Math.Min(n, 17); // Produces 17
}
Sometime later, someone else modifies the signature:
void Foo(ulong n) // <-- Note byte became uint or ulong
{
// a mile of code
var x = RotateLeft(n, 17); // Produces 456 <-- Different result
var y = Math.Min(n, 17); // Produces 17 <-- Same result
}
Calling RotateLeft32(n, 2)
would have produced the same correct result deterministically.
Encoding the size of the type on the signature in cases where they lead to inconsistent results is not unheard of. IMHO it is a sensible tradeoff.
Bit manipulation routines are common enough that we should expose a subset as platform primitives. While some of them may be simple to write, it's much harder to achieve the requisite performance desired, especially since they tend to be used within tight loops.
The aim of this proposal is to scope & design a minimal set of functions, to be implemented with a bias towards performance. Per @tannergooding
Note that even though some of the formula may be simple, relevant callsites are more self-documenting when using the intrinsics (should the dev choose to use them).
Scope
CoreCLR
(https://github.com/dotnet/coreclr/pull/22584)CoreFX
(https://github.com/dotnet/corefx/pull/35193)System.Numerics.BitOperations
as apublic
ref assembly inCoreFX
(https://github.com/dotnet/corefx/issues/35419)CoreFX
(https://github.com/dotnet/corefx/pull/34917)Rationale and Usage
The proposed functions are already implemented throughout the stack, often with different algorithms, performance characteristics and test coverage. Existing callsites below: https://github.com/dotnet/corefx/issues/32269#issuecomment-457689128 (There is likely to be more; the initial search was timeboxed to ~1 hour)
Some of the implementation have suboptimal performance or bugs. Something like
ExtractBit
is trivial to implement, butPopCount
is more complex and thus prone to logic and performance issues. Hiding these complex formulae behind friendly signatures makes using them more approachable.Here's an example of a function (
BTC
) whose signature is simple but the algebra is easy to get wrong.However making a call to it meets our goal of abstraction and performance:
Proposed API
The proposed API is purposefully kept lean. We can add more methods in later design iterations. We should view this as an opportunity to get simple, base functionality out the door and not stray into the dangerous territory of adding every bit twiddling hack that exists.
Assume all methods are decorated with
[MethodImpl(MethodImplOptions.AggressiveInlining)]
Details
InsertBit
/ClearBit
andWriteBit
where the latter conditionally executes the equivalent of either former (using twiddling to do so without branching). But there's enough twiddling inWrite
to avoid any branching that it's maybe worth keeping both variants.exception
s. The current design tries to dispense with the requirement by sharpening input types, contractual assumptions (eg out-of-boundsoffset
will usemod n
in some functions or result in ano-op
in others) and specific design choices.Questions
int
,uint
andbyte
are commonly used. Anything else?Decisions
namespace
this should be in. Decision:namespace System.Numerics
.BitOps
is self-documenting, terse (it might be specified frequently in a callsite, if not aliased withusing
) and both terms are well-known names or abbreviations. Decision:BitOperations
PopCount
could be calledCountSetBits
but that's not the common lingo used by twiddlers. Furthermore, intrinsics already expose the well-known names, egPopcnt.PopCount
.TrailingZeroCount
may be more concisely described asTrailingZeros
. Decision:TrailingZeroCount
already chosen by a previous PR.PopCount
return (idiomatic)int
, notuint
offset
orposition
parameters beint
oruint
. Latter is preferred since negatives not permitted regardless. Decision: int is an idiomatic input/output type in C#.Log(0)
is mathematically undefined. Should it return0
or-1
? Decision: Returns 0LeadingZeros
might need a different algorithm on BE/LE platforms. Decision: Endianess only matters if we are reinterpreting integers.Sample call sites
The following samples are taken from the linked units, from the method
BitOps_Samples
. The code chooses values that are easy to eyeball for correctness. The real units cover many more boundaries & conditions.Updates
InsertBit
accepted a bool that determined whether it set or cleared the bit in question. Such methods have now been refactored in two,InsertBit
andClearBit
.FlipBit
becameComplementBit
).int
instead ofbyte
. All POC units pass.WriteBit(value, offset, bool on)
overloads that conditionally set/clear the specified bit.ExtractByte
and friendsEvaluate(bool)
(used internally, so may as well expose it)Span<T>
overloads per @tannergooding advice. Will maybe submit in separate proposalTrailingOnes
andLeadingOnes
into to do later proposal in commentsLog2
RotateByte
, etc into to do later proposal in comments