Open ThomasWaldmann opened 7 years ago
One important distinction for the hashtable used in Borg is that the keys are already hashes with ~perfect distribution. Thus tree or trie schemes make no sense at all, and tables optimized to deal with poor distributions expend those optimizations needlessly.
Also no need for variable length keys or values.
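A hypothetical illustration of that point: because the keys are already uniformly distributed hashes (e.g. 256-bit chunk IDs), a table needs no extra hash function and can derive the bucket index straight from the first key bytes. The 4-byte prefix and modulo scheme below are illustrative, not Borg's actual code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sketch: keys are already ~perfectly distributed hashes, so the
 * bucket index can be read directly out of the key's first bytes
 * instead of hashing the key again. */
static uint32_t bucket_index(const unsigned char *key, uint32_t num_buckets)
{
    uint32_t h;
    memcpy(&h, key, sizeof h);   /* first 4 bytes of the (hash) key */
    return h % num_buckets;
}
```

This is also why tree/trie schemes and anti-clustering tricks buy nothing here: the input distribution is already as good as a hash function could make it.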
I'm not familiar with the internals of borg, but based on the requirements listed above, another option might be a cuckoo hash table. You can find a ready-to-use implementation in C++ here: libcuckoo. Disclaimer: I'm involved in the project, so I'm a bit partial. :)
It's fast: tens to hundreds of millions of operations per second on a modern multi-core machine with small keys and values (a lookup requires only 1-2 memory references). It's also memory efficient: the load factor is typically > 0.9, and per-entry overhead is around one byte.
That said, libcuckoo was designed as a concurrent cuckoo hash table, for applications that want to access the table from multiple threads. If borg is going to use libcuckoo from a single thread, it should still be lean and reasonably fast, but might not get all of libcuckoo's potential performance. Also, I don't know how the cython integration works, but I suspect if sparsehash is a viable option, libcuckoo would also work.
Thought I would throw it out there in case it's useful. Thanks for the nice work on borg.
@mkaminsky thanks for the hint. I didn't succeed with writing the C wrapper (that implements our _hashindex.c api), mostly because it gave me lots of strange compiler errors when trying to adapt it and I don't have much clue / practice with C++.
The idea was to have such a C wrapper, so we can just swap it in instead / as an alternative option to what we have now. Would it be easy for you to help with that (see _hashindex.c)?
LF > 0.9 without much slowdown and low memory overhead sounds good. Currently, we do not need concurrency (single-threaded borg 1.0 / 1.1), but in 1.2 we will have multithreading, so it might be useful.
Another thing we completely forgot to mention here is that the regular performance of the table isn't an issue at this time. The problem is the degenerate cases around tombstoning, which can happen even while bounded by the load factor and massively degrade performance.
In the past I considered that a plain-dumb way to solve it might be to track the probe length of lookups and, if it passes a certain threshold, just rebuild the table.
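A minimal sketch of that probe-length idea. The threshold value, slot states, and layout here are illustrative, not taken from Borg's `_hashindex.c`:

```c
#include <assert.h>

/* Sketch: count probes during lookup and flag a table rebuild once
 * any probe chain exceeds a threshold. EMPTY stops a probe chain,
 * TOMBSTONE does not -- which is exactly what lets chains degenerate
 * even while the load factor stays bounded. */
enum { EMPTY = 0, TOMBSTONE = 1, USED = 2 };

#define PROBE_LIMIT 8  /* illustrative threshold, not Borg's value */

/* Returns the index of key's slot, or -1 if absent; sets
 * *needs_rebuild when the probe chain got too long. */
static int probe_lookup(const int *state, const int *keys, int nbuckets,
                        int key, int *needs_rebuild)
{
    int idx = key % nbuckets;            /* keys stand in for hashes here */
    for (int probes = 0; probes < nbuckets; probes++) {
        int i = (idx + probes) % nbuckets;
        if (probes >= PROBE_LIMIT)
            *needs_rebuild = 1;          /* chain too long: flag a rebuild */
        if (state[i] == EMPTY)
            return -1;                   /* an empty slot ends the chain */
        if (state[i] == USED && keys[i] == key)
            return i;
        /* TOMBSTONE: keep probing past it */
    }
    return -1;
}
```

Rebuilding on that flag compacts all the tombstones away, restoring short probe chains at the cost of one O(n) pass.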
I spoke to @manugoyal, who is the main author of the current libcuckoo code. We think (not having done it yet) that adding a C wrapper should be doable. I believe he's going to take a closer look and maybe give it a shot; it could be something that's also useful for other C-based projects.
Regarding the performance: in cuckoo hashing, lookup performance is constant, regardless of the table size, the sequence of inserts/deletes, etc. All of the work happens on insert (which gets slower as the load factor increases). Lookups, though, only ever need to look in a fixed number of buckets. In most implementations (including libcuckoo) it's two buckets. Often you'll even find the key in the first bucket, but worst case, you only have to look in two places. For read-heavy workloads, this is particularly nice.
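To illustrate the two-bucket lookup described above, here is a hypothetical, stripped-down sketch. It uses single-slot buckets and a made-up second hash function; real libcuckoo uses multi-slot buckets and its own hashing, and also handles the insert-time displacement ("kicking") that is omitted here:

```c
#include <assert.h>
#include <stdint.h>

#define NBUCKETS 8

typedef struct { int used; uint32_t key; int value; } slot_t;

static uint32_t h1(uint32_t k) { return k % NBUCKETS; }
/* second hash: Knuth-style multiplicative hash, purely illustrative */
static uint32_t h2(uint32_t k) { return ((k * 2654435761u) >> 16) % NBUCKETS; }

/* The point of cuckoo hashing: a lookup touches at most the two
 * candidate buckets, regardless of table size or insert history. */
static int cuckoo_lookup(const slot_t *t, uint32_t key, int *value)
{
    uint32_t b1 = h1(key), b2 = h2(key);
    if (t[b1].used && t[b1].key == key) { *value = t[b1].value; return 1; }
    if (t[b2].used && t[b2].key == key) { *value = t[b2].value; return 1; }
    return 0;
}
```

All the variable-cost work (evicting entries to their alternate bucket to make room) happens on insert, which is why read-heavy workloads benefit most.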
Hi,
Michael and I have been iterating on a C port of libcuckoo, and we think it is looking fairly doable. The issue is that I don't think we can easily support the exact interface laid out in hashindex.pyx. There are a few functions that don't map well to the table design:
hashindex_init: It's hard to initialize the table with variable-sized key and value types, because the C++ interface expects you to use templated Key and Value types with compile-time-defined sizes.
hashindex_next_key: It's possible to implement, but not really efficient, because you essentially have to do a search to find the location of the given key in the hashtable, so this would require a search every single time. Using iterator objects is the more efficient solution.
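A toy illustration of that cost difference. The flat-array table and function names below are made up for the example; they are not libcuckoo's or Borg's API:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: next_key(key)-style traversal must re-locate `key` on
 * every call -- O(n) per step, O(n^2) for a full walk -- while an
 * iterator simply remembers its position. */
#define N 8
static const int used[N] = {0, 1, 0, 1, 1, 0, 0, 1};
static const int keys[N] = {0, 11, 0, 22, 33, 0, 0, 44};

/* next_key style: search for prev_key's slot, then scan onward */
static const int *next_key(const int *prev_key)
{
    int i = 0;
    if (prev_key != NULL) {
        while (i < N && !(used[i] && keys[i] == *prev_key))
            i++;                 /* re-locate the previous key: a search per step */
        i++;
    }
    for (; i < N; i++)
        if (used[i])
            return &keys[i];
    return NULL;
}

/* iterator style: O(1) amortized per step, no re-search */
typedef struct { int pos; } iter_t;

static const int *iter_next(iter_t *it)
{
    while (it->pos < N && !used[it->pos])
        it->pos++;
    return it->pos < N ? &keys[it->pos++] : NULL;
}
```

In an open-addressing table the re-search in `next_key` is cheap because the key's slot is found by probing, but in a cuckoo table the key gives no hint of traversal order, which is why the iterator form is the natural fit there.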
So the question is how important it is to match the exact interface of hashindex.pyx, since an exact match will not be possible without a non-trivial reworking of the C++ implementation.
Alternatively, we were thinking of porting the existing C++ interface (which isn't exactly the borg interface, but all of the same functionality will be available) to C, and allowing you to easily define multiple versions of the table with different key/value pair sizes. Matching the existing C++ interface would be ideal, so that we could also put this port into the official libcuckoo repo. If you instead need a port as close as possible to the hashindex interface, we can do that too, but some refactoring will still be necessary.
Any thoughts on the matter?
hashindex_init: It's hard to initialize the table with variable-sized key and value types, because the C++ interface expects you to use templated Key and Value types with compile-time-defined sizes.
Not a problem, there are only two or three fixed variants that can be compiled in.
hashindex_next_key: It's possible to implement, but not really efficient, because you essentially have to do a search to find the location of the given key in the hashtable, so this would require a search every single time. Using iterator objects is the more efficient solution.
One level up we use an iterator object so needing some other state / an iterator here would not be a problem.
Since there is a wrapper (hashindex.pyx) that abstracts the C interface away, the exact shape of the C interface doesn't matter that much, I think.
We just added a C port of libcuckoo to the repo. In the interest of making it a public feature, we ported the exact C++ interface to C, so it looks somewhat different than the borg interface. As mentioned before, the port requires you to instantiate different table libraries for the different key-value types you're interested in using, but of course each instantiation has the same interface, and multiple table libraries can be used simultaneously.
If I understand correctly, borg is using the hashtable in a single-threaded context, which means the locked_table interface of libcuckoo should provide all the functionality required by the borg interface (the locked_table interface exactly emulates the STL unordered_map interface). It will probably be necessary to make a wrapper around the interface, but roughly it could work as follows:
static HashIndex *hashindex_read(const char *path)
=> cuckoo_table_read
static int hashindex_write(HashIndex *index, const char *path)
=> cuckoo_table_locked_table_write
static HashIndex *hashindex_init(int capacity, int key_size, int value_size)
=> cuckoo_table_init (no key or value size since each cuckoo_table has pre-defined key and value types)
static const void *hashindex_get(HashIndex *index, const void *key)
=> cuckoo_table_locked_table_find (this uses a special iterator object to point to the kvpair)
static int hashindex_set(HashIndex *index, const void *key, const void *value)
=> cuckoo_table_locked_table_insert (the return value tells you whether a new element was inserted or not)
static int hashindex_delete(HashIndex *index, const void *key)
=> cuckoo_table_locked_table_erase
static void *hashindex_next_key(HashIndex *index, const void *key)
=> after running cuckoo_table_locked_table_find, the iterator can move forwards or backwards
The examples directory in the libcuckoo repo contains an example program called c_hash which demonstrates a lot of these features and explains how to create instances of the table.
Would love if somebody could give the port a spin, any feedback is much appreciated!
borg uses a big (custom made, in C) in-memory hashtable for the chunk index and repo index.
current implementation: open addressing == closed hashing (one big bucket table with all entries inside, no separately allocated chains) with linear probing when collisions happen. load factor max. 0.75.
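A minimal sketch of the scheme just described (one flat bucket table, linear probing, 0.75 max load factor). The fixed capacity and integer key/value types are illustrative; the real `_hashindex.c` grows the table, persists it to disk, and uses 256-bit chunk-ID keys:

```c
#include <assert.h>
#include <stdint.h>

/* Open addressing == closed hashing: all entries live in one flat
 * bucket array, no separately allocated chains. */
#define CAPACITY 16
#define MAX_LOAD_NUM 3   /* 3/4 == 0.75 max load factor */
#define MAX_LOAD_DEN 4

typedef struct { int used; uint32_t key; uint32_t value; } bucket_t;
typedef struct { bucket_t buckets[CAPACITY]; int num_entries; } table_t;

/* Keys are assumed to be pre-hashed, so key % CAPACITY is the index. */
static int table_set(table_t *t, uint32_t key, uint32_t value)
{
    uint32_t i = key % CAPACITY;
    while (t->buckets[i].used) {
        if (t->buckets[i].key == key) {       /* update in place */
            t->buckets[i].value = value;
            return 1;
        }
        i = (i + 1) % CAPACITY;               /* linear probing */
    }
    if ((t->num_entries + 1) * MAX_LOAD_DEN > CAPACITY * MAX_LOAD_NUM)
        return 0;                             /* real table would grow here */
    t->buckets[i] = (bucket_t){1, key, value};
    t->num_entries++;
    return 1;
}

static int table_get(const table_t *t, uint32_t key, uint32_t *value)
{
    uint32_t i = key % CAPACITY;
    while (t->buckets[i].used) {
        if (t->buckets[i].key == key) { *value = t->buckets[i].value; return 1; }
        i = (i + 1) % CAPACITY;
    }
    return 0;
}
```

Deletion (not shown) is where the tombstone issue discussed above comes in: a deleted slot cannot simply be marked EMPTY without breaking probe chains that run through it.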
alternatives:
ready-to-use implementations:
general problems:
we need something that is memory efficient:
Other notes: