dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.04k stars 4.68k forks source link

Proposal: Add a BitManipulation class #18876

Closed mburbea closed 4 years ago

mburbea commented 7 years ago

I propose adding a new class to CoreFx, which will contain many common bit manipulation techniques. Part of having a class like this is that the Jit can target these methods class for intrinsics, similar to what was done for Vector<T>.
There is still some open discussion as to what the shape of this API should be, and even the class name.

Class Name

There are several class names being thrown around and none are particularly winning everyone over.

Two Classes?

@benaadams, correctly notes that there seems to be two different APIs here. A low level view allowing you to manipulate a numeric type to extract or inject a bit (like a bit vector), byte, short, or int value. These methods are the equivalent of the following but safer (and possibly faster)

public static unsafe int ExtractInt(ulong u,int pos)=>((int*)&u)[pos];
public static unsafe ulong InjectInt(ulong u, int pos, int toSet) {
    ((int*)&u)[pos] = toSet;
     return u;
}

And another set of utility exposing methods for treating an integer register type like a unit of data allowing you to manipulate it or introspect it.

Does it make sense to keep these APIs in one class?

Method Names

Another point of contention. What should we call these methods? For the "view" members, there is some dislike of the naming convention of Get/Set even though there is prior art like BitVector32 & SqlDataRecord. Everyone seems to like Read/Write more, but while Read is fine. Write isn't neccessarily the operation being done. I'm still looking for some verbiage to note that this really takes a value, modifies it a bit (no pun intended), and spit out a new one.

PopCount/HammingWeight/ CountSetBits - We can't decide on this name. I personally like PopCount as it is a well known name for the algorithm. However, for someone who does not CPU intrinsics or bit tricks this name might mean nothing to you. the .net api is split where common or well known algorithms are simply called that (e.g. Rijndael) and sometimes descriptive for new users. I think that this class in general is fairly low-level so even a novice should be expected to do a quick google search in the subject area.

And even naming the methods as actions (e.g. CountTrailingZeroes) or properties (e.g. TrailingZeroesCount).

Hardware Acceleration / Detection

@benaadams, suggested adding a simple flag to determine if the class has hardware acceleration. I personally suggest going a step further and adding an enum to describe which methods are accelerated (not in the PoC). Unfortunately, the enum based approach does raise the question if the jit could do branch elimination on a comparison like the following if it could replace AcceleratedOperations as a runtime constant. (unknown)

// 
if (Bits.AcceleratedOperations == BitOps.PopCount){
   // use popcount
}
else{
  // work around it?
}

I admittedly question what you would do differently if it isn't accelerated. This isn't like Vector<T> where you might switch to a different approach and use ulong like smaller vectors. The methods should be pretty good solutions to these problems and outside of switching to some native code I don't see us doing better. Methods that could be accelerated (in AMD64 at least):: PopCount => POPCNT RotateRight => ROR (already accelerated!) RotateLeft => ROL (already accelerated!) CountLeadingZeroes => LZCNT (in ABM compliant hardware) or (BSR followed by xor 63) CountTrailingZeroes => TZCNT / BSF ReadBit => BT WriteBit => BTS/BTC (maybe?) ReadByte/ReadInt16/ReadInt32 => BEXTR (possibly)

Updated spec:

public static class Bits{
       public static int PopCount(ulong value);
       public static int PopCount(long value);
       public static int PopCount(uint value);
       public static int PopCount(int value);

       public static ulong RotateRight(ulong value, int offset);
       public static uint  RotateRight(uint value, int offset);

       public static ulong RotateLeft(ulong value, int offset);
       public static uint  RotateLeft(uint value, int offset);

       public static int CountLeadingZeros(ulong value);
       public static int CountLeadingZeros(long value);
       public static int CountLeadingZeros(uint value);
       public static int CountLeadingZeros(int value);

       public static int CountTrailingZeroes(ulong value);
       public static int CountTrailingZeroes(long value);
       public static int CountTrailingZeroes(uint value);
       public static int CountTrailingZeroes(int value);

       public static  bool ReadBit(ulong value, int offset);
       public static  bool ReadBit(long value, int offset);
       public static  bool ReadBit(uint value, int offset);
       public static  bool ReadBit(int value, int offset);

       public static ulong WriteBit(ulong value, int offset, bool toSet);
       public static long WriteBit(long value, int offset, bool toSet);
       public static uint WriteBit(uint value, int offset, bool toSet);
       public static int WriteBit(int value, int offset, bool toSet);

       public static byte ReadByte(ulong value, int offset);
       public static byte ReadByte(long value, int offset);
       public static byte ReadByte(uint value, int offset);
       public static byte ReadByte(int value, int offset);

       public static short ReadInt16(ulong value, int offset);
       public static short ReadInt16(long value, int offset);
       public static short ReadInt16(uint value, int offset);
       public static short ReadInt16(int value, int offset);

       public static int ReadInt32(ulong value,  int offset);
       public static int ReadInt32(long value,  int offset);

       public static ulong WriteByte(ulong value, int offset, byte toSet);
       public static long WriteByte(long value, int offset, byte toSet);
       public static uint WriteByte(uint value, int offset, byte toSet);
       public static int WriteByte(int value, int offset, byte toSet);

       public static ulong WriteInt16(ulong value, int offset, short toSet);
       public static long WriteInt16(long value, int offset, short toSet);
       public static uint WriteInt16(uint value, int offset, short toSet);
       public static int WriteInt16(int value, int offset, short toSet);

       public static ulong WriteInt32(ulong value, int position, int toSet);
       public static long WriteInt32(long value, int position, int toSet);
}

Edit: POC class https://gist.github.com/mburbea/c9a71ac1b1a25762c38c9fee7de0ddc2

More updates! Removed signed rotate operators.

svick commented 7 years ago
public int IsBitSet(long value, int position);

Should this return bool instead of int?

mellinoe commented 7 years ago

@nguerrera I recall you had mentioned some internal helpers that Roslyn was using that were related to this sort of functionality. Any input here?

nguerrera commented 7 years ago

CountBits is the main one. Search for BitArithmetic in Roslyn and corefx. I've long wanted a nice class to hide the scariness of https://graphics.stanford.edu/~seander/bithacks.html

mburbea commented 7 years ago

Yes that should return bool. If this seems like something that would be liked I'd happily submit a pr for little endian architecture.

Tornhoof commented 7 years ago

This issue, at least partially, overlaps with https://github.com/dotnet/coreclr/issues/6906

SunnyWar commented 7 years ago

Shouldn't the methods be declared as static? And these have variable name conflicts and a missing comma on SetInt16.

public void SetBit(long value, int position, bool value); public void SetByte(long value, int position, byte value); public void SetInt16(long value, int position. short value); public void SetInt32(long value, int position, int value);

corrected public static void SetBit(long value, int position, bool to); public static void SetByte(long value, int position, byte to); public static void SetInt16(long value, int position, short to); public static void SetInt32(long value, int position, int to);

All and all I like the idea of a central location for common bit manipulating functions.

mburbea commented 7 years ago

Yes, I fixed the error thanks @SunnyWar . @Tornhoof ,Yes it does, for some of the intrinsics.

SunnyWar commented 7 years ago

@mburbea All the Set methods need a return value or the value as an out param.

jamesqo commented 7 years ago

Very, very much +1 for this idea. Could contain other useful methods such as

public int CountBits(int value);
public int RoundUpToPowerOfTwo(int value);
public int IsPowerOfTwo(int value);

to prevent people from having their own hand-rolled, inefficient implementations.

IMO, we may also want to name it something like BitUtility instead of BitManipulation. The latter feels a bit verbose, and is more prone to typos.

mburbea commented 7 years ago

Isn't CountBits just pop count?

jamesqo commented 7 years ago

@mburbea Yes, indeed. I believe the JIT does not currently generate that instruction yet though (related issue: dotnet/runtime#14781) because the code pattern is too hard to recognize, so this might be a good entry point to expose it.

mburbea commented 7 years ago

Yeah, @jamesqo, all I was saying that the method was covered. I wouldn't be opposed to calling it something else, I went with the name because it was suggested by @mellinoe . I think these are good additions and once we have some of these intrinsics could be implemented quite easily.

public static long IsPowerOfTwo(long value)=> x == (x&-x) && x!=0;
public static long RoundUpToPowerOfTwo(long value)=> 1 << (64 -CountLeadingZeroes(value - 1));
jamesqo commented 7 years ago

@mburbea

Yeah, @jamesqo, all I was saying that the method was covered. I wouldn't be opposed to calling it something else, I went with the name because it was suggested by @mellinoe .

Ah, ok, I see.

Question-- do you think it's necessary to have methods like IsBitSet, GetByte, SetByte, etc? I mean most people with basic knowledge of bit manipulation would know they could just write (value & (1 << position)) != 0 instead of IsBitSet, (value & 0xFF00) >> 8 instead of GetByte(value, 1), etc. I think the most useful methods in this class would be the ones that aren't very obvious to implement.

Also, there is somewhat of an ambiguity-- people don't know if the position parameter starts from the msb or lsb.

svick commented 7 years ago

@jamesqo I think it's easier to read a single method call with a descriptive name than an expression with three operators and two magic constants (even if they are just 0 and 1). Especially when it comes to negation, I think it's relatively easy to confuse (value & (1 << position)) != 0 and (value & (1 << position)) == 0, but you aren't likely to confuse IsBitSet(value, position) with !IsBitSet(value, position).

You are free to keep writing it the way you're used to, but I think having those methods in the base library as an option would be useful.

mburbea commented 7 years ago

I'm not opposed to removing them if they're considered a hindrance to api acceptance, but I like having convience methods. None of those methods are hard to write, but like most C# developers most of my code is higher level, so I rarely twiddle bits. When I need to, I often end up googling these things, so centralizing them made sense to me.

redknightlois commented 7 years ago

@mburbea @jamesqo I write a lot of bit twiddling code and still would appreciate to be able to get a proper API I could reason about instead of having my entire codebase filled with magic constants, shifts and binary operations. Always as long as the actual code generated is as efficient (guaranteed) as the code generated using them.

jamesqo commented 7 years ago

@redknightlois Hmm, ok. I guess popular vote wins since everyone seems to want those methods :smile: There is still the issue of ambiguity whether position starts from the msb or lsb, though.

mburbea commented 7 years ago

That might be a question of endianness (as in big endian long, the msb is where the 0th bit is stored and the lsb is where the 63rd bit is stored, but in little endian it's reversed),

Intuitively, I feel its implied that its in sequential order for your hardware. e.g. SetBit(0, 0, true) should return the number 1 regardless of endian order.

benaadams commented 7 years ago

@mellinoe @CarolEidt https://arxiv.org/abs/1611.07612 faster popcnt than popcnt using AVX2 Apache License 2.0 source https://github.com/CountOnes/hamming_weight

mburbea commented 7 years ago

@mellinoe , @CarolEidt What else needs to be resolved for this API proposal to move forward?

mellinoe commented 7 years ago

@mburbea I think that this is in a good state and should be ready for our meeting next week.

For our reviewing, I think having a real implementation for all of these methods would actually be very helpful. A big part of the benefit of these API's is that they will hide the "scariness" of complicated bit manipulation from the user. It would be good if we could see just how "scary" the code is on a case-by-case basis. For example, it's not clear that the code for SetBit (above) is scary enough to justify inclusion, but I would bet that the code for PopulationCount definitely is. Having the code ready to look at would be very helpful for our review.

mellinoe commented 7 years ago

Additionally, can you update the original post with your finalized proposal, including any extra methods that were suggested that you believe are valuable?

tannergooding commented 7 years ago

@mellinoe, would these methods have intrinsic support (many of these functions have CPU level instructions that are useable/preferred in some scenarios).

mellinoe commented 7 years ago

@mellinoe, would these methods have intrinsic support (many of these functions have CPU level instructions that are useable/preferred in some scenarios).

It's something to consider as an optimization, but I don't think it's anything we need to block the proposal or an implementation on.

jamesqo commented 7 years ago

Are we still considering different names for these methods? IMO, in their current form they are kinda verbose-- for example, BitManipulation.PopulationCount is pretty long to type, and its meaning is not immediately obvious to non-assembly programmers. (It may not even be implemented with popcnt.)

I think we may want to rename some of the methods, e.g.:

We can also consider another name for the class, such as BitTwiddler. Compare the number of characters:

BitTwiddler.RotateLeft(i, 5);
BitManipulation.CircularShiftLeft(i, 5);

edit: In addition, GetInt32 / SetInt32 and family sound a bit too Java-like for me. (Set also implies that something is being mutated, rather than a new value being returned.) Not sure what would be a good rename, though.

mburbea commented 7 years ago

IMO, C# has always been a bit more on the verbose side. I felt that was intentional as code is more often read than written. The longer names have always been explained away with the fact that most people are writing C# using an IDE which offers Intelisense.

SqlDataRecord has both a GetInt32 and SetInt32, so the names are not exactly unique to Java. As for the mutation question, would you prefer that the method takes a ref parameter and modify their input? Edit: Any suggestion for namespace? System.Numerics?

redknightlois commented 7 years ago

We use Bits and Sparrow.Binary as a namespace to put all that stuff and in practice it reads quite well. https://github.com/ravendb/ravendb/blob/v4.0/src/Sparrow/Binary/Bits.cs (Sparrow is our fast library name, call that System.Binary here instead).

PopCount is a very well known name, the same with RotateLeft32, RotateRight32, RotateLeft64, RotateRight64

svick commented 7 years ago

I wonder if this is one of the types that is likely going to be used with using static. If that's the case, the length of the class name does not matter much, since it's going to appear only once per file.

jamesqo commented 7 years ago

@mburbea

IMO, C# has always been a bit more on the verbose side. I felt that was intentional as code is more often read than written.

Verbose does not always equal more readable-- the meaning that's conveyed is what's important. CircularShiftLeft does not tell the reader more information than RotateLeft.

I am okay with renaming the class to maybe something like BitUtility.

I'll rename CircularShift=> Rotate.

:smile: :tada: We should consider both BitUtility (in tandem with WebUtility) and BitTwiddling for the name.

PopulationCount: hows about PopCount? It is a well known name for this operation, and is shorter than CountBits.

IMO, I think people who don't have a background in assembly deciphering what PopCount means without Googling it. CountBits is longer, but only by 1 character. I think the API reviewers should consider both.

SqlDataRecord has both a GetInt32 and SetInt32, so the names are not exactly unique to Java.

Yes, but the point is Get / Set kind of goes against the grain-- .NET has properties, so get/set-naming isn't very common in code. The example you pointed out is an object which mutates state, which is different from the scenario we're dealing with currently.

As for the mutation question, would you prefer that the method takes a ref parameter and modify their input?

ref parameters are not very flexible and do not work well when you want to do multiple operations on the int. I am OK with the returning-a-new-value-each-time scheme, I'm just trying to think of a better name.

Any suggestion for namespace? System.Numerics?

BitConverter is already in System, and it would be inconvenient if someone had to add another using every time they wanted to do bit twiddling. I think it should be System.

jamesqo commented 7 years ago

@svick

I wonder if this is one of the types that is likely going to be used with using static. If that's the case, the length of the class name does not matter much, since it's going to appear only once per file.

I think we should make it so people don't have to add using static in the first place. I'm not sure what you mean by "one of the types that is likely going to be used with using static;" I rarely use it because it can be very ambiguous to the reader.

jamesqo commented 7 years ago

Maybe some other people would be interested in this discussion? cc @justinvp, @nietras, @JonHanna

mburbea commented 7 years ago

Here is my quick and dirty implementation of the class, I can set up a PR later if this looks good. I'll update the speclet later today. https://gist.github.com/mburbea/c9a71ac1b1a25762c38c9fee7de0ddc2

As you can see pretty much everything is full of magic constants and shift operators. Suprisingly, the Get/Set were actually some of the hardest to write code (curse you sign extension!). Outside of GetBit/SetBit I couldn't really think of a good reason to use the other operations, so if those get axed I wouldn't mind. Questions: Should we do input validation? I think this could be a drag on the performance of these methods. The whole point of this is you know what you're doing and if you ask an invalid question, you get an invalid result. Big endian? I don't know we should handle it, as certainly some of these methods would fail in big endian environments. There might be some implementation agnostic approach but they might be slower and might lead someone writing low level code to avoid using this class. (defeating it's purpose).

benaadams commented 7 years ago

Generally in .NET is ReadX and WriteX isn't it? So ReadInt32 rather than GetInt32

tannergooding commented 7 years ago

I feel as though input validation on these bits defeats the purpose. If validation were to be done, I would personally want them to wrap a version that did no validation (so you would have FastOperation and ValidatedOperation, where ValidatedOperation does the check and then calls FastOperation and both APIs were publicly available).

I would also guess that big endian does not need to be explicitly supported here. If a big-endian platform was ever supported, then we have other APIs that would need big-endian versions, such as BitConverter.Int64BitsToDouble, and it was explicitly decided that the bridge would be crossed when/if it came to that.

I think that, long term, all of these APIs would be good candidates for intrinsic support, as most of them compile down to a single CPU instruction on most modern computers.

I do not personally like BitTwiddler (it is not necessarily understandable by someone who speaks/reads English as a secondary language).

As for PopCount versus CountBits, I would vote for PopCount as it is used almost universally elsewhere (CPU instructions, Compiler Intrinsics, etc). It may be worth noting that it seems BitCount is used in the few places where PopCount is not.

Anyone doing bit manipulation should likely be reading the summary of the methods they are using or would be familiar with the other versions of PopCount available in other languages.

jamesqo commented 7 years ago

@benaadams

Generally in .NET is ReadX and WriteX isn't it? So ReadInt32 rather than GetInt32

I like your idea. Bit-twiddling is all about low-level programming, so "read" / "write" jives with me more here than "get" / "set". In addition, they sound smoother.

@mburbea

Should we do input validation?

I don't think we should. This should be a low-level API that has minimal overhead compared to writing manual bit twiddling code.

Big endian? I don't know we should handle it, as certainly some of these methods would fail in big endian environments.

Would they? I don't see any methods where a pointer is being cast to another pointer of a narrower/wider type. How would endianness be an issue here?

mburbea commented 7 years ago

@jamesqo, CountTrailingZeroes/CountLeadingZeroes relies on bit-isolation tricks that only works correctly in a little endian environment. In a big endian environment it does not isolate the correct bit. See this comment from Kestrel where a simmilar technique was used. https://github.com/aspnet/KestrelHttpServer/pull/1138#issuecomment-255878891

nietras commented 7 years ago

@jamesqo thanks for pinging me.

@mburbea This is definitely interesting and having this functionality at hand would be a great addition to .NET šŸ‘

This got a bit long... just a warning ;)

Naming

Class Name

First, my thoughts on the class name below. Some are nitpicking for sure.

Class Name Notes
BitManipulation Not fond of using "manipulation" in the name, this implies manipulating or in other words changing or modifying something. Many methods do not manipulate anything, instead they inspect/measure bits in a value.
BitTwiddler Twiddling is analoguous to manipulation, and is not a good term since it is rather esoteric (less known perhaps?)
BitUtility I think this has precedence, but personally not that happy about it...
BitHelpers Precedence in RuntimeHelpers, HashHelpers, but I'm not that fond of this, what code isn't "helpful"? At least all code should be.
BitOperations A more general name than manipulation, its a bit shorter and still accurate, operations on bits.
BitOps Shorter, still readable, has a bit of precedence in e.g. OpCodes, RuntimeOps... I think
Unsafe Thought of this just before commenting, disregarding the technical issues here, these could also be added to the Unsafe class directly, I don't like it since it is probably better to group them in a separate type, but keeping the options open.
?? Hope I didn't miss any?

Any of these could and should perhaps be prefixed with Unsafe, as I agree these should have no validation and be as close to metal as possible. In fact, the documentation of these should probably say that if any input parameter is outside an expected range the result will be implementation, machine dependent. E.g. using position 65 for a 64-bit long can do pretty much anything, return a result, crash the entire machine etc.

Method Naming

I definitely think methods should use Read/Write instead of Get/Set, as in the Unsafe class. And I think all methods should start with a verb such as Read, Write, Is, Shift, Rotate, Count, Flip, Toggle, Round etc.

Regarding PopulationCount, I agree this is a well known intrincic, but is the target audience people who know intrinsics? Or any .NET developer in need of a bit operation? I think CountBits is clear in intent, and I am sure a person who knows intrinsics will think this does the same, but perhaps wonder if it is as efficient... so docs should be clear about it.

Grand Scheme

My biggest beef with this is that I have a hope of all intrinsics being handled in a unified way (say System.Intrinsics) that also ensures such methods will be available for larger types (think vector registers), such as 128-bit, 256-bit, 512-bit etc.

With Intrinsics being a fixed size compliment to the variable sized Vector<T> (rant: an annoying and very overloaded name seen from a geometry, mathematics perspective and given C++ vector<T>). We have discussed this many times but unfortunately this is probably a very long term scheme, so adding these bit methods (in System.Runtime.CompilerServices namespace perhaps?) is a more practical step in the right direction at least.

It would be good to consider how larger types 128-bit, 256-bit etc. could be supported?

My Current Suggestions

Call this UnsafeBitOps place it in System.Runtime.CompilerServices. Although, UnsafeBitOperations works for me too.

@mburbea could your update your proposal in beginning with the latest proposal?

benaadams commented 7 years ago

One reason for class splits to consider is the addition of a IsHardwareAccelerated flag and what granularity it works over (and needs to work over)

mburbea commented 7 years ago

@nietras, done sorry was a bit slacking. Included my POC.

Naming

Class Name

I vote for BitUtility, Bits or BitManipulation. I definitely do not like the idea of associating unsafe with this class. Or even declaring the class unsafe. I and other developers, generally avoid unsafe code when I can, and will write slower (but safer) code to avoid it. The whole point of this class is to encourage people to use these methods, which are effecient and well tested version of algorithms instead of naive approaches

// Probably performs horribly, but def works
int PopCount(long u) => Convert.ToString(u, 2).Count(x => x == '1');

With an alterior motive, that in a future version of the runtime, some of these methods can be turned into intrinsics. Furthermore, All of these operations are safe, they don't deal with memory, and using these methods incorrectly will not lead to access violations, or really any exception at all. I think the better concept to associate them with is unchecked.

Method Names

PopCount: I really have never heard of CountBits and naively it doesn't sound like its doing the right thing. A bit can be 0 or 1, why does 1 make something a bit, but 0 does not? I've seen this method called CountOnes and for a naive reader that certainly sounds like the right thing. (How many ones are in this thing?), but at the same point if you need this operation, you'd either code it yourself (hopefully doing a better job than my naive approach) or you'd google it and be told that it's called popcount/hemming weight/horrizontal addition from stack overflow. Verbs for methods? Eh,, I don't know. If that's the argument we could call it GetPopCount. But not every method has to be a verb. Certainly well known algorithms are just called by their names. Read/Write, I don't know. As I said, Get/Set is not exactly completely unfounded in C#.

Extending to Vector<T>? Yes! I'd love that. But the literature on how to write such methods is pretty slim as pretty much those methods are intrinsics.

CarolEidt commented 7 years ago

One reason for class splits to consider is the addition of a IsHardwareAccelerated flag and what granularity it works over (and needs to work over)

It would be great to find a general mechanism that we could use to query, on a method-level basis, whether something is supported as an intrinsic by the JIT. We've had some discussions about this in the past, but I don't think there was a concrete proposal that could be implemented efficiently.

jamesqo commented 7 years ago

@mburbea Regarding your comment earlier

CountTrailingZeroes/CountLeadingZeroes relies on bit-isolation tricks that only works correctly in a little endian environment. In a big endian environment it does not isolate the correct bit. See this comment from Kestrel where a simmilar technique was used.

I didn't know what a De Bruijn sequence was earlier, so I had to spend an hour making sure I had all my facts straight before I replied to your post :smile: However, I still do not think these will have any dependence on endianness.

For example, in CountTrailingZeroes, ((x ^ (x - 1)) * Debruijn64) >> 58 should always have the same result for a given x. Unless I'm misunderstanding you, it would be completely insane if arithmetic/bitshifting operations returned different results for different architectures. The comment you linked to has this code snippet

  uint8_t valueAtIndexZero = ((uint8_t*)&longValue)[0];
  uint8_t valueAtIndexTwo = ((uint8_t*)&longValue)[2];
  uint8_t valueAtIndexFive = ((uint8_t*)&longValue)[5];

which is violating strict aliasing by casting the uint64_t* to a pointer of a smaller type. I don't think it is possible to tell the difference between big/little endian otherwise. (@benaadams can chime in on this)


Regarding naming,

I vote for BitUtility, Bits or BitManipulation.

Bits sounds like a good name too, even if it is a little generic. Most bitshifting code is very terse (not a lot of words) so less characters is valuable.

Read/Write, I don't know. As I said, Get/Set is not exactly completely unfounded in C#.

Whichever you decide, you may want to change IsBitSet to ReadBit / GetBit to make it consistent with the other methods. Also would prevent weirdness like

int value = BitUtility.SetBit(val, 0, false);
BitUtility.IsBitSet(value, 0); // IsBitSet returns false, after we called SetBit
nietras commented 7 years ago

CountBits and naively it doesn't sound like its doing the right thing

@mburbea you are right! CountBits is not clear. CountOnes better, I can live with PopulationCount, although...

whole point of this class is to encourage people to use these methods, which are effecient and well tested version of algorithms

This sounds more like the target is to be safe, correct and developer friendly, rather than optimal and efficient. Perhaps both can be obtained, at least for some methods for sure, but some look like they require input validation if they are to be for the "average" .NET developer. And in that case, I would think PopulationCount to be less developer friendly for people that have never heard of popcnt.

I was hoping for the priority being "optimal and efficient" rather than "safe". If "safe" is not the case it seems in .NET one has to say either "Unsafe" or "Dangerous" to mark the class or method as being well without hands holding. ;) Not ideal, but at least specific.

Why "optimal and efficient", because that is harder to get to than safe... having them as close to "intrinsics" as possible would be best first in my view. :)

benaadams commented 7 years ago

Population count is a bit weird in a high level language as "population" likely means something different to the user; there is a domain change in language with a CPU designer.

Also it is a property of the value rather than an action; so shouldn't start with a verb?

Perhaps more SetBitCount and its opposite UnsetBitCount and maybe an extension method?

Also since popcnt and bitscan will either be hardware or not together; many be group together?

e.g.

public static class BitExtensions
{
       public bool IsHardwareAccelerated {get;}

       public static int SetBitCount(this ulong value);
       public static int SetBitCount(this long value);
       public static int SetBitCount(this uint value);
       public static int SetBitCount(this int value);

       public static int UnsetBitCount(this ulong value);
       public static int UnsetBitCount(this long value);
       public static int UnsetBitCount(this uint value);
       public static int UnsetBitCount(this int value);

       public static int LeadingZeroCount(this ulong value);
       public static int LeadingZeroCount(this long value);
       public static int LeadingZeroCount(this uint value);
       public static int LeadingZeroCount(this int value);

       public static int TrailingZeroCount(this ulong value);
       public static int TrailingZeroCount(this long value);
       public static int TrailingZeroCount(this uint value);
       public static int TrailingZeroCount(this int value);
}

Usage

var hammingDiff = (val1 ^ val2).SetBitCount();
mburbea commented 7 years ago

I've updated the proposal voicing my vote for Bits and renaming Get/Set to Read/Write since it seems more well liked. I also changed from IsBitSet=>ReadBit to be more consistent with simmilar methods.

@jamesqo, you might be right. I've no way to test this code in a big endian environment so it may just work.

@nietras, I somewhat get the point, but unsafe is a rather large disincentive. In a CoreCLR project, Visual Studio doesn't even have a friendly way for you to enable unsafe code, you have to update the project.json manually. If I understood CoreCLR#1830, it seems that rotateRight/RotateLeft can be made input safe (and I think I did so in my code), so really the only methods that cannot be "safe" for any input are the bit/byte/short/int extractions/substitution methods.

@benaadams, I like the idea of a IsHardwareAccelerated method, but some architecture may only have some method accelerated, or poorly accelerated. POPCNT & LZCNT showed up in ABM but TZCNT came latter in BMI1. Before that TZCNT would probably become a bsf followed by xor 63, and would most likely be undefined on 0. 64-bit bsf is known to have been pretty slow on older hardware.

benaadams commented 7 years ago

@mburbea its bigger than CPU type; its also version of Jit - i.e. there is nothing particularly in the api that means it couldn't happily be .NETStandard 1.0 - however it will only be hardware accelerated on a Jit that doesn't currently exist (so at least 4.6.3+ and Core 1.2.0+ though maybe version after...)

benaadams commented 7 years ago

Also to match the action name the write functions should be

public static void WriteInt16(ref ulong value, int offset, short toSet);

Otherwise you aren't writing; you are returning a changed value

mellinoe commented 7 years ago
       public static int SetBitCount(this ulong value);
       public static int UnsetBitCount(this ulong value);

I don't like these names because they imply that something is being modified, since they start with Set and Unset. I'm also not a fan of using extension methods like that; I think these should all be regular static methods.

benaadams commented 7 years ago

they imply that something is being modified, since they start with Set

Ah yes, the ambiguity of English... šŸ˜„ But it does highlight an angle I hadn't thought of as to why PopulationCount is bad; as it would be quite a weird name for a non English speaker - is a name by convention rather than clarity.

public static int BitsSetCount(ulong value);
public static int BitsUnsetCount(ulong value);

Example usage

var hammingDiff = Bits.BitsSetCount(val1 ^ val2);

Double Bits looks a bit weird though

benaadams commented 7 years ago

Two classes?

public static class Bits
{
    public bool IsHardwareAccelerated { get; }

    public static int CountSet(ulong value);
    public static int CountSet(long value);
    public static int CountSet(uint value);
    public static int CountSet(int value);

    public static int CountUnset(ulong value);
    public static int CountUnset(long value);
    public static int CountUnset(uint value);
    public static int CountUnset(int value);

    public static int CountLeadingZeros(ulong value);
    public static int CountLeadingZeros(long value);
    public static int CountLeadingZeros(uint value);
    public static int CountLeadingZeros(int value);

    public static int CountTrailingZeroes(ulong value);
    public static int CountTrailingZeroes(long value);
    public static int CountTrailingZeroes(uint value);
    public static int CountTrailingZeroes(int value);

    public static ulong RotateRight(ulong value, int offset);
    public static uint RotateRight(uint value, int offset);
    public static ulong RotateLeft(ulong value, int offset);
    public static uint RotateLeft(uint value, int offset);
}

Usage

var hammingDiff = Bits.CountSet(val1 ^ val2);
public static class BitView
{
    public static bool ReadBit(ulong value, int offset);
    public static bool ReadBit(long value, int offset);
    public static bool ReadBit(uint value, int offset);
    public static bool ReadBit(int value, int offset);

    public static void WriteBit(ref ulong value, int offset, bool toSet);
    public static void WriteBit(ref long value, int offset, bool toSet);
    public static void WriteBit(ref uint value, int offset, bool toSet);
    public static void WriteBit(ref int value, int offset, bool toSet);

    public static byte ReadByte(ulong value, int offset);
    public static byte ReadByte(long value, int offset);
    public static byte ReadByte(uint value, int offset);
    public static byte ReadByte(int value, int offset);

    public static short ReadInt16(ulong value, int offset);
    public static short ReadInt16(long value, int offset);
    public static short ReadInt16(uint value, int offset);
    public static short ReadInt16(int value, int offset);

    public static int ReadInt32(ulong value, int offset);
    public static int ReadInt32(long value, int offset);

    public static void WriteByte(ref ulong value, int offset, byte toSet);
    public static void WriteByte(ref long value, int offset, byte toSet);
    public static void WriteByte(ref uint value, int offset, byte toSet);
    public static void WriteByte(ref int value, int offset, byte toSet);

    public static void WriteInt16(ref ulong value, int offset, short toSet);
    public static void WriteInt16(ref long value, int offset, short toSet);
    public static void WriteInt16(ref uint value, int offset, short toSet);
    public static void WriteInt16(ref int value, int offset, short toSet);

    public static void WriteInt32(ref ulong value, int position, int toSet);
    public static void WriteInt32(ref long value, int position, int toSet);
}

Usage

BitView.WriteInt32(ref value, 2, 7);
nietras commented 7 years ago

"Unsafe" doesn't mean you have to enable unsafe for the compiler. It is nothing more than a name prefix to signify to a developer that the code you are calling can do unexpected things if you are not aware what you are doing. Just as the "Unsafe" class that can be used in any normal safe context, and like e. g. `DangerousGetHandle' or similar that can be used from normal safe context.

On Dec 9, 2016 7:58 PM, "Ben Adams" notifications@github.com wrote:

Two classes?

public static class Bits { public bool IsHardwareAccelerated { get; }

public static int CountSet(ulong value);
public static int CountSet(long value);
public static int CountSet(uint value);
public static int CountSet(int value);

public static int CountUnset(ulong value);
public static int CountUnset(long value);
public static int CountUnset(uint value);
public static int CountUnset(int value);

public static int CountLeadingZeros(ulong value);
public static int CountLeadingZeros(long value);
public static int CountLeadingZeros(uint value);
public static int CountLeadingZeros(int value);

public static int CountTrailingZeroes(ulong value);
public static int CountTrailingZeroes(long value);
public static int CountTrailingZeroes(uint value);
public static int CountTrailingZeroes(int value);

public static ulong RotateRight(ulong value, int offset);
public static uint RotateRight(uint value, int offset);
public static ulong RotateLeft(ulong value, int offset);
public static uint RotateLeft(uint value, int offset);

}

Usage

var hammingDiff = Bits.CountSet(val1 ^ val2);

public static class BitView { public static bool ReadBit(ulong value, int offset); public static bool ReadBit(long value, int offset); public static bool ReadBit(uint value, int offset); public static bool ReadBit(int value, int offset);

public static void WriteBit(ref ulong value, int offset, bool toSet);
public static void WriteBit(ref long value, int offset, bool toSet);
public static void WriteBit(ref uint value, int offset, bool toSet);
public static void WriteBit(ref int value, int offset, bool toSet);

public static byte ReadByte(ulong value, int offset);
public static byte ReadByte(long value, int offset);
public static byte ReadByte(uint value, int offset);
public static byte ReadByte(int value, int offset);

public static short ReadInt16(ulong value, int offset);
public static short ReadInt16(long value, int offset);
public static short ReadInt16(uint value, int offset);
public static short ReadInt16(int value, int offset);

public static int ReadInt32(ulong value, int offset);
public static int ReadInt32(long value, int offset);

public static void WriteByte(ref ulong value, int offset, byte toSet);
public static void WriteByte(ref long value, int offset, byte toSet);
public static void WriteByte(ref uint value, int offset, byte toSet);
public static void WriteByte(ref int value, int offset, byte toSet);

public static void WriteInt16(ref ulong value, int offset, short toSet);
public static void WriteInt16(ref long value, int offset, short toSet);
public static void WriteInt16(ref uint value, int offset, short toSet);
public static void WriteInt16(ref int value, int offset, short toSet);

public static void WriteInt32(ref ulong value, int position, int toSet);
public static void WriteInt32(ref long value, int position, int toSet);

}

Usage

BitView.WriteInt32(ref value, 2, 7);

ā€” You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/dotnet/corefx/issues/12425#issuecomment-266092754, or mute the thread https://github.com/notifications/unsubscribe-auth/AKTG73UGfQnLzGbf6B9sgYGnXrmm6Xn4ks5rGaTXgaJpZM4KQTp- .