bits-and-blooms / bitset

Go package implementing bitsets

Simplifies readFrom and WriteTo, and improves their performance. #142

Closed · lemire closed this 11 months ago

lemire commented 11 months ago

On one benchmark, this PR doubles the performance: we reach 1.7 GB/s of round-trip throughput (writing to memory and reading it back). Since each byte is both written and read, the effective memory bandwidth is 3.4 GB/s, which is not great, but... Note that this is in-memory; writing to and reading from a disk would be slower.
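For reference, the benchmark is along these lines. This is only a sketch: the size, setup, and exact shape of BenchmarkBitsetReadWrite in the repository may differ; it simply round-trips a bitset through an in-memory buffer.

package bitset_test

import (
    "bytes"
    "testing"

    "github.com/bits-and-blooms/bitset"
)

// Sketch of an in-memory read/write benchmark (sizes are illustrative only).
func BenchmarkBitsetReadWriteSketch(b *testing.B) {
    b.ReportAllocs()
    s := bitset.New(1 << 16)
    for i := uint(0); i < 1<<16; i += 3 {
        s.Set(i)
    }
    var buf bytes.Buffer
    b.ResetTimer()
    for n := 0; n < b.N; n++ {
        buf.Reset()
        if _, err := s.WriteTo(&buf); err != nil {
            b.Fatal(err)
        }
        var r bitset.BitSet
        if _, err := r.ReadFrom(&buf); err != nil {
            b.Fatal(err)
        }
    }
}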

Before...

$ go test -bench  BenchmarkBitsetReadWrite -benchmem
goos: darwin
goarch: arm64
BenchmarkBitsetReadWrite-8         69720             16075 ns/op              16 B/op          2 allocs/op
PASS
ok      _/Users/dlemire/CVS/github/bitset       1.533s

After...

$ go test -bench  BenchmarkBitsetReadWrite -benchmem
goos: darwin
goarch: arm64
BenchmarkBitsetReadWrite-8        151460              7396 ns/op            2064 B/op          4 allocs/op
PASS
ok      _/Users/dlemire/CVS/github/bitset       1.321s

Fixes https://github.com/bits-and-blooms/bitset/issues/141

lemire commented 11 months ago

@thanhpk @klauspost @omerfirmak @king526 : Please review/comment.

I really don't understand why our code is so convoluted when what I wrote is just... obvious? Is there a catch? Why do we have all this complexity in the first place?

thanhpk commented 11 months ago

@lemire you are trading space for time: it would allocate a big chunk of memory, which could crash some programs if the bitset is big enough.

We already discussed this in https://github.com/bits-and-blooms/bitset/pull/103.
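For context, the pattern being warned about looks roughly like this. This is a sketch, not the code in this PR; writeAllAtOnce is a hypothetical name and binary.BigEndian is assumed for the byte order.

package sketch

import (
    "encoding/binary"
    "io"
)

// writeAllAtOnce (hypothetical) serializes the whole word array into one
// buffer before writing, so the temporary allocation grows with the bitset:
// a multi-gigabyte bitset needs a matching multi-gigabyte []byte.
func writeAllAtOnce(w io.Writer, data []uint64) error {
    buf := make([]byte, 8*len(data)) // O(len(data)) extra memory
    for i, x := range data {
        binary.BigEndian.PutUint64(buf[8*i:], x)
    }
    _, err := w.Write(buf)
    return err
}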

omerfirmak commented 11 months ago

@lemire you are trading space for time

The -benchmem numbers show the GC hit you are taking. We, at Nethermind, don't depend on bitset anymore (due to https://github.com/bits-and-blooms/bitset/issues/134), but I would still rather keep GC pressure to a minimum.

lemire commented 11 months ago

@thanhpk

it would allocate a big chunk of memory, which could crash some programs if the bitset is big enough

I don't think it would. The memory overhead should still be effectively constant:


// readUint64Array fills data from reader in fixed-size chunks, reusing a
// single scratch buffer so the extra memory stays constant regardless of
// the bitset size.
func readUint64Array(reader io.Reader, data []uint64) error {
    length := len(data)
    bufferSize := 128
    buffer := make([]byte, bufferSize*int(wordBytes))
    for i := 0; i < length; i += bufferSize {
        end := i + bufferSize
        if end > length {
            end = length
            // Shrink the buffer for the final, partial chunk.
            buffer = buffer[:wordBytes*uint(end-i)]
        }
        chunk := data[i:end]
        if _, err := io.ReadFull(reader, buffer); err != nil {
            return err
        }
        for j := range chunk {
            chunk[j] = binaryOrder.Uint64(buffer[8*j:])
        }
    }
    return nil
}

// writeUint64Array streams data to writer in fixed-size chunks, again
// reusing one scratch buffer instead of serializing the whole array at once.
func writeUint64Array(writer io.Writer, data []uint64) error {
    bufferSize := 128
    buffer := make([]byte, bufferSize*int(wordBytes))
    for i := 0; i < len(data); i += bufferSize {
        end := i + bufferSize
        if end > len(data) {
            end = len(data)
            // Shrink the buffer for the final, partial chunk.
            buffer = buffer[:wordBytes*uint(end-i)]
        }
        chunk := data[i:end]
        for j, x := range chunk {
            binaryOrder.PutUint64(buffer[8*j:], x)
        }
        if _, err := writer.Write(buffer); err != nil {
            return err
        }
    }
    return nil
}

@omerfirmak

I would rather keep the GC at minimum.

Yeah. It seems that binary.Read and binary.Write allocate on every call, but we can just reuse the buffer; see my code above.
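To make the buffer-reuse point concrete, here is a round-trip usage sketch (assumed, not from the PR). It relies on the two helpers quoted above being in scope, along with the package's wordBytes and binaryOrder constants. Whatever the size of data, each helper allocates one 128*8 = 1024-byte scratch buffer and reuses it for every chunk.

package main

import (
    "bytes"
    "fmt"
    "log"
)

func main() {
    data := make([]uint64, 1<<20) // 8 MiB worth of words
    for i := range data {
        data[i] = uint64(i)
    }

    // Write and read back through an in-memory buffer using the chunked
    // helpers; the scratch buffer inside each helper stays at 1 KiB.
    var buf bytes.Buffer
    if err := writeUint64Array(&buf, data); err != nil {
        log.Fatal(err)
    }
    out := make([]uint64, len(data))
    if err := readUint64Array(&buf, out); err != nil {
        log.Fatal(err)
    }
    fmt.Println(out[12345] == data[12345]) // true
}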

thanhpk commented 11 months ago

@lemire I agree