etcd-io / bbolt

An embedded key/value database for Go.
https://go.etcd.io/bbolt
MIT License
7.87k stars 619 forks source link

Introduce freelist interface #773

Open tjungblu opened 4 days ago

tjungblu commented 4 days ago

bbolt features two types of freelist: "array" and "hashmap". They are currently implemented in one monolithic struct, with their differences in functions separated into two go files.

Testing has been a pain so far, with many of the freelist details (i.e.releasing pages and serde to disk) leaking outside of the freelist struct. This makes reasoning about the implementations very difficult, too. Adding a new type of list is becoming increasingly cumbersome without a proper interface.

Quickly pulling out the existing functions into an interface could look like this:

type Freelist interface {

    // Init initializes this freelist with the given list of pages. 
        // TODO currently is "readIDs", maybe should be called Read(page *common.Page)
    Init(ids common.Pgids) 

    // Allocate returns the starting page id of a contiguous block of a given size in number of pages.
    // If a contiguous block cannot be found then 0 is returned.
    Allocate(txid common.Txid, numPages int) common.Pgid

    // Count returns the number of free pages.
    Count() int

    // FreePageIds returns all free pages.
    FreePageIds() common.Pgids

    // MergeSpans merges the given pages into the existing freelist.
    MergeSpans(ids common.Pgids) // TODO should probably named "Release" or "Free"

    // Write writes the freelist into the given page. The freelist MUST fit into the given page.
    Write(page *common.Page) error
}

Ideally we would move this into its own internal package in order to avoid leaking any more internals in the future. HashMap and Array should be two implementation as its own struct. Some shared functions (ie read/write) could be implemented in a common struct for now. Even though I expect those to become implementation dependent at some point, it seems quite inefficient to write low-digit uint64 integers without vint compression (or as a bitset), for example.

ahrtr commented 4 days ago

High level makes sense. Eventually we should consider to deprecate the Array free list.

tjungblu commented 4 days ago

Admittedly, this interface was too idealistic for such an old codebase :) Turns out we're leaking implementation details everywhere and for some optimization also have introduced bad coupling. The tests are especially fun to decouple :1st_place_medal:

Also the coupling of the transaction to the pages should not go into the freelist (imho) and we should have another mapping in the Tx that contains what pages have been allocated for it.

Below seems like a functional interface, but I think we would need to cut this down a bit further to be practical.

type Freelist interface {

    // Init initializes this freelist with the given list of pages.
    Init(ids common.Pgids)

    // Allocate returns the starting page id of a contiguous block of a given size in number of pages.
    // If a contiguous block cannot be found then 0 is returned.
    Allocate(txid common.Txid, numPages int) common.Pgid

    // Count returns the number of free pages.
    Count() int

    // PendingCount returns the number of pending pages.
    // TODO(thomas): implementation detail?
    PendingCount() int

    // FreePageIds returns all free pages.
    FreePageIds() common.Pgids

    // Release moves all page ids for a transaction id (or older) to the freelist.
    Release(txid common.Txid)

    // ReleaseRange moves pending pages allocated within an extent [begin,end] to the free list.
    ReleaseRange(begin, end common.Txid)

    // Free releases a page and its overflow for a given transaction id.
    // If the page is already free then a panic will occur.
    Free(txid common.Txid, p *common.Page)

    // Rollback removes the pages from a given pending tx.
    Rollback(txid common.Txid)

    // Freed returns whether a given page is in the free list.
    Freed(pgId common.Pgid) bool

    // Reload reads the freelist from a page and filters out pending items.
    // TODO(thomas): could be just Read?
    Reload(p *common.Page)

    // NoSyncReload reads the freelist from Pgids and filters out pending items.
    // TODO(thomas): could be just Init?
    NoSyncReload(Pgids []common.Pgid)

    /*
     * SerDe Section
     */

    // Read calls Init with the page ids stored in te given page.
    Read(page *common.Page)

    // EstimatedWritePageSize returns the size of the page after serialization in Write.
    // TODO(thomas): implementation detail?
    EstimatedWritePageSize() int

    // Write writes the freelist into the given page.
    Write(page *common.Page)
}

source (not everything is implemented/updated yet): https://github.com/tjungblu/bbolt/commit/d3831cbb52476bf939a4cd1d95f11280ca03b08f

Do you want to continue through a PR? Happy to adjust and refactor the remaining pieces.

tjungblu commented 3 days ago

opened #775, let's focus on the interface first. I think we also need @ivanvc's benchmark for it :)

ahrtr commented 2 days ago

I suggest to move all array related methods into a separate file freelist_array.go as the very first step, so as to simplify the freelist.go. Please feel free to deliver a PR for that, thx.

ahrtr commented 2 days ago

After that we can continue to discuss how to refactor the freelist management in your PR https://github.com/etcd-io/bbolt/pull/775.

Also as your pointed out above, we need the benchmark (https://github.com/etcd-io/bbolt/pull/750) workflow check ready beforehand. @ivanvc can you take that as a higher priority task? thx

tjungblu commented 2 days ago

there you go @ahrtr https://github.com/etcd-io/bbolt/pull/777

ivanvc commented 2 days ago

@ivanvc can you take that as a higher priority task? thx

Yes, I was able to work on it yesterday. I'll try to have something by the EOD.