dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.19k stars 4.72k forks source link

API Proposal: BitArray.CopyTo(int value) / BitArray.From(int value) #30821

Open Symbai opened 5 years ago

Symbai commented 5 years ago

At the moment there is no way to create a System.Collections.BitArray from a single integer and no way to return the bits to a single integer. I don't know the exactly reason why the API is missing but considering its possible to achieve with a 1 length integer array it wouldn't be too difficult to add such an API.

namespace System.Collections
{
    public sealed class  BitArray : ICollection, IEnumerable, ICloneable
    {
        public void CopyTo(Span<byte>);
        public BitArray(ReadOnlySpan<byte> bytes)
   }
}

Example::

static void ChangeBit(ref int value,  int position, bool newBit)
{
    Span<int> spanInt = MemoryMarshal.CreateSpan(ref value, 1);
    Span<byte> spanBytes = MemoryMarshal.AsBytes(spanInt);
    BitArray bits = new(spanBytes);
    bits[position] = newBit;
    bits.CopyTo(value);
}
static int ReplaceBitRange(int existingInteger, int newInteger, int position, int length)
{
    Span<byte> existingSpanBytes = MemoryMarshal.AsBytes(MemoryMarshal.CreateSpan(ref existingInteger, 1));
    Span<byte> newSpanBytes = MemoryMarshal.AsBytes(MemoryMarshal.CreateSpan(ref newInteger, 1));
    BitArray oldBits = new(existingSpanBytes);
    BitArray newBits = new(newSpanBytes);
    int counter = 0;
    for (int i = position; i < position + length; i++)
    {
        oldBits[i] = newBits[counter++];
    }
    int result;
    Span<byte> resultSpanBytes = MemoryMarshal.AsBytes(MemoryMarshal.CreateSpan(ref result, 1));
    oldBits.CopyTo(result);
    return result;
}

Without these APIs we would have to create arrays when initializing the bit array and when returning, which adds more performance cost in scenarios where it matters. While changing bits directly from and to an integer is better.

If getting approved perhaps this can be considered to be not only for integer but also for byte.

Clockwork-Muse commented 5 years ago

BitArray(int length) is almost certainly due to <SomeCollectionType>(int length).

Without these APIs we would have to create arrays when initializing the bit array and when returning, which adds more performance cost in scenarios where it matters. While changing bits directly from and to an integer is better.

This implies you have a significant number of pre-initialized integers (or a similar number of destinations), which strikes me as.... unusual. If you don't have a pre-initialized value, you don't need the value consumption, of course.

AlexejLiebenthal commented 5 years ago

I just thought about your problem. And as BitArray is implemented as an integer array field, you need to create the array anyway.

If I understand you correctly, I think, I struggled sometime ago with a similar problem like you. If you like to have some O(1) bit manipulation abstraction on int, I hacked something for you. Especially the ReplaceBitRange method should perform better (O(length) vs. O(1)).

public struct IntBit
{
    private readonly int value;

    public IntBit(int value) => this.value = value;

    public IntBit ChangeBit(int pos, bool bit)
    {
        if (pos < 0 || pos > 31) throw new ArgumentOutOfRangeException(nameof(pos));

        return ReplaceBitRange(bit ? 1 << pos : 0, pos, 1);
    }

    public IntBit ReplaceBitRange(int bits, int start, int length)
    {
        if (start < 0 || start > 31) throw new ArgumentOutOfRangeException(nameof(start));
        if (length < 1 || start + length > 32) throw new ArgumentOutOfRangeException(nameof(length));

        var mask = ((1 << length) - 1) << start;
        return new IntBit((bits & mask) | (value & ~mask));
    }

    public static implicit operator int(IntBit intBit) => intBit.value;
}

public static class IntBitManipulationExtensions
{
    public static int ChangeBit(this int value, int pos, bool bit) => new IntBit(value).ChangeBit(pos, bit);
    public static int ReplaceBitRange(this int value, int bits, int start, int length) => new IntBit(value).ReplaceBitRange(bits, start, length);
}
Symbai commented 5 years ago

@Clockwork-Muse @AlexejLiebenthal Thank you, I didn't know about that. For me I always needed BitArray for single integers, not arrays and it's the only way unless you know how to shift like @AlexejLiebenthal . Would you say it makes sense to have a new API for getting bits only from a single integer? Maybe even in BitConverter instead of BitArray? Or leaving this to external nuget packages?

I'm asking because I think I'm not the only one who would need it for single integers only. On the other hand when it comes to performance and this might be the only reason why not declaring an array matters, shifting is more likely the way to go.

Wraith2 commented 5 years ago

Would you say it makes sense to have a new API for getting bits only from a single integer?

Can't you use BitVector32?

Symbai commented 5 years ago

@Wraith2 Thanks for the tip but unfortunately it generates different bits on signed values.

const int testValue = -1873524097;
var one = new BitVector32(testValue);
var two = new BitArray(new [] {testValue });
Debug.Assert(one[7] == two[7]); // <---- Throws
Wraith2 commented 5 years ago

That is unexpected. The bit values for the signed version aren't the same either. I wasn't suggesting interoperability though i thought you wanted a single int that's addressible via bits and in that case using BitVector32 throughout should do what you want.

AlexejLiebenthal commented 5 years ago

BitVector32 has a different implemenation of this[int] {get;}.

BitArray checks whether the indexed bit is set or not. BitVector32 checks whether all bits are set to get the indexed value.

e.g.: 0b0110 -> 6 BitArray[0] -> false BitArray[1] -> true BitArray[2] -> true BitArray[3] -> false BitArray[4] -> ArgumentOutOfRangeException


BitVector32[0] -> true BitVector32[1] -> false BitVector32[2] -> true BitVector32[3] -> false BitVector32[4] -> true BitVector32[5] -> false BitVector32[6] -> true BitVector32[7] -> false BitVector32[8] -> false

Wraith2 commented 5 years ago

Oh, so you'd have to use bit consts to get the value you wanted, at which point you might as well be just masking yourself.

AlexejLiebenthal commented 5 years ago

Yes, it's more like checking via index, if a mask matches the (inner data) of BitVector32- instance.

I find the naming bit a bit misleading. Instead should this be renamed in my eyes to mask

// Summary:
//     Gets or sets the state of the bit flag indicated by the specified mask.
//
// Parameters:
//   bit:
//     A mask that indicates the bit to get or set.
//
// Returns:
//     true if the specified bit flag is on (1); otherwise, false.
public bool this[int bit] { get; set; }
stephentoub commented 5 years ago

I don't think we should add such an API; it's too specific, and if the goal is to eliminate an array allocation, it still won't help with the fact that the BitArray itself needs to be allocated and an underlying array it manages needs to be allocated.

If we were going to add any APIs here, it should be a ctor that takes a ReadOnlySpan<byte> and a CopyTo method that takes a Span<byte>. If you wanted to use that to then construct a BitArray from an int without allocating an array, you could create a span around the int and reinterpret cast it to a span of bytes.

ghost commented 3 years ago

Due to lack of recent activity, this issue has been marked as a candidate for backlog cleanup. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will undo this process.

This process is part of the experimental issue cleanup initiative we are currently trialing in a limited number of areas. Please share any feedback you might have in the linked issue.

Symbai commented 3 years ago

@stephentoub Since I'm still interested into this one, should I update the API proposal for the Readonlyspan<byte> ctor overload, or have you changed your opinion?

stephentoub commented 2 years ago

@stephentoub Since I'm still interested into this one, should I update the API proposal for the Readonlyspan ctor overload, or have you changed your opinion?

Sorry for my delay in responding; I missed the GH notification for it somehow.

I haven't changed my opinion :)

Symbai commented 2 years ago

Thanks, I've updated the first post.

kamronbatman commented 2 years ago

I don't think we should add such an API; it's too specific, and if the goal is to eliminate an array allocation, it still won't help with the fact that the BitArray itself needs to be allocated and an underlying array it manages needs to be allocated.

If we were going to add any APIs here, it should be a ctor that takes a ReadOnlySpan<byte> and a CopyTo method that takes a Span<byte>. If you wanted to use that to then construct a BitArray from an int without allocating an array, you could create a span around the int and reinterpret cast it to a span of bytes.

@stephentoub I made an API proposal and PR for exactly what you agree with and it was closed as a dupe. Should we pare down this proposal, or make a new one to at least get the ball rolling on something. I have a project that serialize/deserializes bit arrays to/from spans with a specific need for it to be performant.

@Symbai, if I am understanding him correctly, he is not in favor of the public static From(Span<byte>); signature.

Symbai commented 2 years ago

@kamronbatman Your original proposal had only CopyTo, only after I've told you that it duplicates with mine you added the ctor overload from mine to yours. But that still makes it a duplicates. As for the From overload I believe it makes sense IF CopyTo and a ctor overload for Span<byte> will be added. And stephentoub didn't said no to this one specifically. More that he doesn't want an int overload. This is what I read from his reponse, perhaps I'm wrong.

stephentoub commented 2 years ago

As for the From overload I believe it makes sense IF CopyTo and a ctor overload for Span will be added

How is the ctor functionally different from the ctor? Why would you still need From?

Symbai commented 2 years ago

Why would you still need From

It's part of the over 1 year ago request. I believe at the time of creation I thought there would have been a From already for all other types. After your previous comment I simply changed int to Span<bytes> leaving the rest as it is (because there was no explicit comment on From). Now that you are asking me I noticed it's a completely new function. That doesn't make sense, I'll remove it from the proposal.

kamronbatman commented 11 months ago

I still believe a zero-allocation and no additional copy to serialize the BitArray to a buffer (Span) is worth adding. Right now I am contesting with renting a big enough array from a pool, serializing to that, then copying that to the buffer. It's a huge waste. The other option is to wholesale duplicate BitArray and build a custom implementation, then hope developers of the project use the correct one, which has to be maintained/sync'd with dotnet on major releases just for that one extra CopyTo.

Specifically, the current CopyTo for byte[] already gets a span (after some checks of course). The code change is minimal. It's actually weird to me that there were optimizations done to CopyTo for vectorization, but no Span<byte> dest overload option added.

I would be in favor of adding CopyTo(Span<byte>) without the ctor or From.

This is what an example for my binary serialization would look like (in my case, I am doing 2-million of these in a high performance, multi-threaded scenario):

    public void Write(BitArray bitArray)
    {
        var bytesLength = (bitArray.Length - 1 + (1 << 3)) >>> 3;
        var arrayBuffer = ArrayPool<byte>.Shared.Rent(bytesLength);

        WriteEncodedInt(bytesLength);
        bitArray.CopyTo(arrayBuffer, 0);

        Write(arrayBuffer.AsSpan(0, bytesLength));
        ArrayPool<byte>.Shared.Return(arrayBuffer);
    }

vs with the proposal:

    public void Write(BitArray bitArray)
    {
        var bytesLength = (bitArray.Length - 1 + (1 << 3)) >>> 3;

        WriteEncodedInt(bytesLength);
        bitArray.CopyTo(_buffer.Slice(_pos, bytesLength));
        _pos += bytesLength;
    }

I do agree that we could use the ctor as well, but I am willing to make that a separate discussion so we can move forward with something.

Here is what the current deserialization code looks like, note the throw-away byte array:

    public BitArray ReadBitArray()
    {
        var byteArrayLength = ReadEncodedInt();

        // We need an exact array size since the ctor doesn't allow for offset/length, not much we can do at this point.
        var byteArray = new byte[byteArrayLength];

        Read(byteArray);

        return new BitArray(byteArray);
    }

Now that .NET 8 is out, @stephentoub can we get some discussion for .NET 9/10?