JacksonAllan / Verstable

A versatile, performance-oriented generic hash table library for C.
MIT License
77 stars 9 forks source link
c generics hashmaps hashtables maps sets
Verstable

Verstable is a versatile generic hash table intended to bring the speed and memory efficiency of state-of-the-art C++ hash tables such as Abseil/Swiss, Boost, and Bytell to C.

Its features include:

API: - Type safety. - Customizable hash, comparison, and destructor functions. - Single header. - C99 compatibility. - Generic API in C11 and later. Performance: - High speed mostly impervious to load factor. - Only two bytes of overhead per bucket. - Tombstone-free deletion.

Extensive benchmarks comparing Verstable to a range of other C and C++ hash tables, including Robin Hood tables and SIMD-accelerated tables, are available here.

Verstable is distributed under the MIT license.

Try it online here.

A variation of Verstable is also available as part of the broader generic data-structure library Convenient Containers.

Installation

Just download verstable.h and place it in your project's directory or your common header directory.

Example

Using the generic macro API (C11 and later): ```c #include // Instantiating a set template. #define NAME int_set #define KEY_TY int #include "verstable.h" // Instantiating a map template. #define NAME int_int_map #define KEY_TY int #define VAL_TY int #include "verstable.h" int main( void ) { // Set. int_set our_set; vt_init( &our_set ); // Inserting keys. for( int i = 0; i < 10; ++i ) { int_set_itr itr = vt_insert( &our_set, i ); if( vt_is_end( itr ) ) { // Out of memory, so abort. vt_cleanup( &our_set ); return 1; } } // Erasing keys. for( int i = 0; i < 10; i += 3 ) vt_erase( &our_set, i ); // Retrieving keys. for( int i = 0; i < 10; ++i ) { int_set_itr itr = vt_get( &our_set, i ); if( !vt_is_end( itr ) ) printf( "%d ", itr.data->key ); } // Printed: 1 2 4 5 7 8 // Iteration. for( int_set_itr itr = vt_first( &our_set ); !vt_is_end( itr ); itr = vt_next( itr ) ) printf( "%d ", itr.data->key ); // Printed: 2 4 7 1 5 8 vt_cleanup( &our_set ); // Map. int_int_map our_map; vt_init( &our_map ); // Inserting keys and values. for( int i = 0; i < 10; ++i ) { int_int_map_itr itr = vt_insert( &our_map, i, i + 1 ); if( vt_is_end( itr ) ) { // Out of memory, so abort. vt_cleanup( &our_map ); return 1; } } // Erasing keys and values. for( int i = 0; i < 10; i += 3 ) vt_erase( &our_map, i ); // Retrieving keys and values. for( int i = 0; i < 10; ++i ) { int_int_map_itr itr = vt_get( &our_map, i ); if( !vt_is_end( itr ) ) printf( "%d:%d ", itr.data->key, itr.data->val ); } // Printed: 1:2 2:3 4:5 5:6 7:8 8:9 // Iteration. for( int_int_map_itr itr = vt_first( &our_map ); !vt_is_end( itr ); itr = vt_next( itr ) ) printf( "%d:%d ", itr.data->key, itr.data->val ); // Printed: 2:3 4:5 7:8 1:2 5:6 8:9 vt_cleanup( &our_map ); } ``` Using the prefixed functions API (C99 and later): ```c #include // Instantiating a set template. #define NAME int_set #define KEY_TY int #define HASH_FN vt_hash_integer #define CMPR_FN vt_cmpr_integer #include "verstable.h" // Instantiating a map template. #define NAME int_int_map #define KEY_TY int #define VAL_TY int #define HASH_FN vt_hash_integer #define CMPR_FN vt_cmpr_integer #include "verstable.h" int main( void ) { // Set. int_set our_set; int_set_init( &our_set ); // Inserting keys. for( int i = 0; i < 10; ++i ) { int_set_itr itr = int_set_insert( &our_set, i ); if( int_set_is_end( itr ) ) { // Out of memory, so abort. int_set_cleanup( &our_set ); return 1; } } // Erasing keys. for( int i = 0; i < 10; i += 3 ) int_set_erase( &our_set, i ); // Retrieving keys. for( int i = 0; i < 10; ++i ) { int_set_itr itr = int_set_get( &our_set, i ); if( !int_set_is_end( itr ) ) printf( "%d ", itr.data->key ); } // Printed: 1 2 4 5 7 8 // Iteration. for( int_set_itr itr = int_set_first( &our_set ); !int_set_is_end( itr ); itr = int_set_next( itr ) ) printf( "%d ", itr.data->key ); // Printed: 2 4 7 1 5 8 int_set_cleanup( &our_set ); // Map. int_int_map our_map; int_int_map_init( &our_map ); // Inserting keys and values. for( int i = 0; i < 10; ++i ) { int_int_map_itr itr = int_int_map_insert( &our_map, i, i + 1 ); if( int_int_map_is_end( itr ) ) { // Out of memory, so abort. int_int_map_cleanup( &our_map ); return 1; } } // Erasing keys and values. for( int i = 0; i < 10; i += 3 ) int_int_map_erase( &our_map, i ); // Retrieving keys and values. for( int i = 0; i < 10; ++i ) { int_int_map_itr itr = int_int_map_get( &our_map, i ); if( !int_int_map_is_end( itr ) ) printf( "%d:%d ", itr.data->key, itr.data->val ); } // Printed: 1:2 2:3 4:5 5:6 7:8 8:9 // Iteration. for( int_int_map_itr itr = int_int_map_first( &our_map ); !int_int_map_is_end( itr ); itr = int_int_map_next( itr ) ) printf( "%d:%d ", itr.data->key, itr.data->val ); // Printed: 2:3 4:5 7:8 1:2 5:6 8:9 int_int_map_cleanup( &our_map ); } ```

API

Full API documentation is available here.

FAQ

How does it work?

Verstable is an open-addressing hash table using quadratic probing and the following additions:

One way to conceptualize this scheme is as a chained hash table in which overflowing keys are stored not in separate memory allocations but in otherwise unused buckets. In this regard, it shares similarities with Malte Skarupke’s Bytell hash table and traditional "coalesced hashing".

Advantages of this scheme include:

The generic macro API available in C11 is based on the extendible-_Generic mechanism detailed here.

How is it tested?

Verstable has been tested under GCC, Clang, MinGW, and MSVC. tests/unit_tests.c includes unit tests for sets and maps, with an emphasis on corner cases. tests/tests_against_stl.cpp includes randomized tests that perform the same operations on Verstable sets and maps, on one hand, and C++'s std::unordered_set and std::unordered_map, on the other, and then check that they remain in sync. Both test suites use a tracking and randomly failing memory allocator in order to detect memory leaks and test out-of-memory conditions.

Why the name?

The name is a contraction of "versatile table". Verstable handles various conditions that strain other hash table schemes—such as large keys or values that are expensive to move, high load factors, expensive hash or comparison functions, and frequent deletions, iteration, and unsuccessful lookups—without significant performance loss. In other words, it is designed to be a good default choice of hash table for most use cases.