Add support for a BitArray/PackedBoolArray data type for storing a large amount of bools in a small amount of space

Kirbedev commented 2 months ago

Describe the project you are working on

A Metroidvania game that stores a lot of flags and map data with save files

Describe the problem or limitation you are having in your project

The type of information I need to store (e.g., uncovered map tiles, unlocked achievements, object on/off states, etc.) requires a very large amount of Boolean variables to store. The problem becomes very apparent here, since each value takes up a whole byte of memory, even though only one bit is actually being utilized. This leads to a lot of wasted memory, and while it may not seem to be a lot since most people have around 8-16GB of memory these days, it can add up very fast. For example, imagine you had 131072 (power of 2) map tiles to account for. Storing these in an array of bools will take up 131KB of memory. If you stored these as individual bits instead, it will only consume ~16KB of memory!

While it's true that you can use a BitMap to do just that, it requires 2D indexing and may not seem as intuitive or efficient as a BitArray, which is much more specialized for this purpose since it simply operates on a string of Bytes rather than a 2D matrix of them.

Describe the feature / enhancement and how it helps to overcome the problem or limitation

Add support for BitArray/PackedBoolArray to allow users to easily store single-bit flags quickly, without the need for additional logic like looping through a set of coordinates to access a bit from a BitMap. This solution will essentially allow you to store bools in an array just like how you can store Vector2 in a PackedVector2Array, among others. The engine will figure out the logic behind it using bytes and bit manipulation.

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

This proposal behave like an array, allowing you to simply store bools in it as you would for any other array, like this:

var bool_array: PackedBoolArray # Or BitArray

# With bools
bool_array.append([true, true, false, true, false, false, false, true])

# With ints
bool_array.append([1, 1, 0, 1, 0, 0, 0, 1])

print(bool_array) # Prints [1, 1, 0, 1, 0, 0, 0, 1], just as seen above. 
# All this occupies only 1 byte, instead of 8.

As you can see, this approach is very simple and easy, and aims to streamline optimization by making BitArray just as easy to use as a normal array.

In my game, I used a similar approach—albeit, a little more complex since you can't make custom array types in GDScript—by utilizing a PackedByteArray and performing bit manipulation on it. It works perfectly for my use case, and has caused no performance issues. Here was my implementation for it:

class BitArray:
    var data: PackedByteArray
    var active_bits: int

    func _init(_num_bits: int = 8):
        resize_bits(_num_bits)

    func _to_string():
        var string: String
        for i in range(active_bits): string += str(int(get_bit(i)))

        return string

    func resize_bits(new_size: int, reset_bits: bool = true):
        data.resize(ceili(float(new_size) / 8))
        active_bits = new_size

        if reset_bits: fill_bits(false)

    func fill_bits(new_value: bool):
        for i in range(active_bits): set_bit(i, new_value)

    func append(other: BitArray):
        var offset = active_bits
        resize_bits(active_bits + other.active_bits, false)

        for i in range(other.active_bits): set_bit(i + offset, other.get_bit(i))

    func get_bit(index: int):
        if index < 0 or index > len(data) * 8 - 1:
            push_error('ERROR: Tried to call `get_bit` with invalid index.')
            return false
        elif len(data) == 0: 
            push_error('ERROR: There are no bits to set.')
            return false

        return bool(data[index / 8] & 1 << (index % 8))

    func set_bit(index: int, value: bool):
        if index < 0 or index > len(data) * 8 - 1:
            push_error('ERROR: Tried to call `set_bit` with invalid index.')
            return
        elif len(data) == 0: 
            push_error('ERROR: There are no bits to set.')
            return

        var mask = 1 << (index % 8) # It's nice that GDScript supports this
        if value: data[index / 8] |= mask
        else: data[index / 8] &= ~mask

    func duplicate():
        var new_array = BitArray.new(active_bits)
        new_array.data = data.duplicate()
        return new_array

This obviously requires a little more work to use, but it is very fast and efficient, and gets the job done.

If this enhancement will not be used often, can it be worked around with a few lines of script?

If this isn't used often, for whatever reason, another approach is to just use some of the bit manipulation logic I used above directly on a PackedByteArray:

# Get bit
func get_bit(data: PackedByteArray, index: int) -> bool: 
    if (index < 0 or index > len(data) * 8 - 1) or len(data) == 0): return

    # Get the value directly
    return bool(data[index / 8] & 1 << (index % 8))

# Set bit
func set_bit(data: PackedByteArray, index: int, new_value: bool) -> void:
    if (index < 0 or index > len(data) * 8 - 1) or len(data) == 0): return

    # Mask out the value and set it
    var mask = 1 << (index % 8)
    if value: data[index / 8] |= mask
    else: data[index / 8] &= ~mask

This approach works fine as well, but it is much less straightforward as an Array since it requires you to manually control the size of data in order to use it properly, which is what by BitArray class automates. A proper implementation of this in the engine code will do the same thing.

Is there a reason why this should be core and not an add-on in the asset library?

Because being able to store a lot of bools/flags efficiently should be just as simple and straightforward to do as with any other PackedArray type, and probably won't be much of a hurdle to implement as more advanced features. bool is a fundamental data type, and it's not unlikely that you will end up having to store a lot of them (>8) at once at some point. Having this in core means that you can easily optimize parts of your game without needing to install an add-on for each project that needs it, and you don't have to do whole implementation of it yourself each time.

Calinou commented 2 months ago

Adding new Variant type[^1] has a performance cost, so it should be carefully considered. It needs to be used often enough within the engine to be worth doing.

[^1]: Every packed array type is its own Variant type, unlike the generic Array type.

While it's true that you can use a BitMap to do just that, it requires 2D indexing and may not seem as intuitive or efficient as a BitArray, which is much more specialized for this purpose since it simply operates on a string of Bytes rather than a 2D matrix of them.

Couldn't you make a BitMap with a very long width and 1 pixel height? There doesn't appear to be an individual limitation on width or height, other than width * height being lower than or equal to INT32_MAX. This means that for a BitMap that is 1 pixel tall, its maximum width would be INT32_MAX (roughly 2.14 billion).

Kirbedev commented 2 months ago

That's a good point. But what I wonder is, does accessing a BitMap have any additional overhead, even with a height of 1, compared to an Array? And especially one that is accessed constantly

Kirbedev commented 2 months ago

Actually, I have another idea. Since BitMaps work well enough, why not add a function to its class that allows you to access it with a single index? This way it can also behave as a flat array regardless of its dimensions. I know C++, but I never really looked at the way Variants are referenced internally, so here is an example of what I mean in GDscript:

func get_bit_flat(bitmap: BitMap, index: int) -> Vector2i:
    var map_size = bitmap.get_size()
    if index >= 0 and index < map_size.x * map_size.y:
        return Vector2i(index / map_size.x, index % map_size.x)
    else: return Vector2i.ZERO

EDIT: Ok, I looked at the source code for BitMap. I think this is how the function would be implemented, but I'm not entirely sure what ERR_FAIL_INDEX_V is--I'm guessing it tests the value and throws an error when it's out of range--so bear with me here.

bool BitMap::get_bit_flat(int index) const {
    ERR_FAIL_INDEX_V(index, width * height, false);
    return get_bit(index / width, index % width); // Not sure if BitMap:: prefix is required here
}

KoBeWi commented 2 months ago

You can also use PackedInt64Array, but it requires some wrapping effort to make it easy to use.

AThousandShips commented 2 months ago

Using a PackedInt64Array would be made easier in combination with:

https://github.com/godotengine/godot-proposals/issues/9480

AThousandShips commented 2 months ago

Actually, I have another idea. Since BitMaps work well enough, why not add a function to its class that allows you to access it with a single index?

This sounds like a great thing for an add-on, but I don't think it's general enough to warrant core, in most cases people aren't needing this level of data tightness for game development I'd say

Kirbedev commented 2 months ago

Alright, I am going to close this issue now. If anyone wants a quick implementation of BitArrays in GDScript, feel free to use the code I provided above.

godotengine / godot-proposals