cojen / TuplDB

TuplDB is a high-performance, concurrent, transactional, scalable, low-level embedded database.
GNU Affero General Public License v3.0

Support cached columns #120

Open broneill opened 1 year ago

broneill commented 1 year ago

Constructing strings from UTF-8 is expensive. Create an annotation which allows a column's decoded values to be cached, using either "soft" or "weak" references, where soft is the default. Document that caching is best suited for columns with low cardinality, due to potential GC overhead.
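A rough sketch of what such an annotation could look like, assuming columns are declared as accessor methods on a row interface. The names `Cached`, `CacheMode`, and `Order` are hypothetical placeholders, not an existing TuplDB API:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical: the two reference strengths proposed in this issue.
enum CacheMode { SOFT, WEAK }

// Hypothetical annotation; soft is the default, as proposed above.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Cached {
    CacheMode value() default CacheMode.SOFT;
}

// Hypothetical row definition with two low-cardinality string columns.
interface Order {
    @Cached                    // soft caching by default
    String status();

    @Cached(CacheMode.WEAK)    // explicitly weak
    String region();
}
```

Low-cardinality columns such as a status or region code are the intended use; a column with mostly unique values would only add cache entries that are never reused.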

The cache itself can be simple -- it has no max capacity and it doesn't perform any LRU reordering. A single global cache should work fine, and it needs to support high concurrency.
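To illustrate how little machinery this needs, here is a minimal sketch built on `ConcurrentHashMap`, with no capacity limit and no LRU ordering. The class name `SimpleRefCache` and its `get` method are assumptions made for illustration, and a real implementation would also need to purge entries whose references have been cleared (for example with a `ReferenceQueue`):

```java
import java.lang.ref.Reference;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: unbounded, unordered, and safe for concurrent use.
final class SimpleRefCache<K, V> {
    private final ConcurrentHashMap<K, Reference<V>> map = new ConcurrentHashMap<>();
    private final boolean weak;

    SimpleRefCache(boolean weak) {
        this.weak = weak;
    }

    V get(K key, Function<? super K, ? extends V> loader) {
        while (true) {
            Reference<V> ref = map.get(key);            // lock-free read path
            V value = (ref == null) ? null : ref.get();
            if (value != null) {
                return value;
            }
            // Missing or cleared by the GC: build the value and try to install it.
            V created = loader.apply(key);
            Reference<V> fresh = weak ? new WeakReference<>(created)
                                      : new SoftReference<>(created);
            boolean installed = (ref == null)
                ? map.putIfAbsent(key, fresh) == null
                : map.replace(key, ref, fresh);         // identity comparison on the old ref
            if (installed) {
                return created;
            }
            // Another thread won the race; loop and use its entry instead.
        }
    }
}
```

Under contention the loader may run more than once for the same key, which is harmless when the value is just a decoded string. A single static instance would serve as the global cache.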

broneill commented 1 year ago

In addition to referring to strings, the cache entries also need to refer to the UTF-8 encoded bytes. This is necessary for making quick comparisons, but it also means that the cache occupies much more memory than might be expected. All the more reason to document that the caching feature should only be used for columns with low cardinality.
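A sketch of why the encoded bytes need to be kept: with them, a lookup can compare a slice of the encoded row directly against each entry and only decode on a miss. Everything here (`Utf8StringCache`, `Entry`, the fixed table size) is an assumption for illustration; the sketch never resizes the table or unlinks cleared entries, and it requires Java 9+ for the ranged `Arrays.equals`:

```java
import java.lang.ref.SoftReference;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Hypothetical sketch: a cache looked up by raw UTF-8 bytes.
final class Utf8StringCache {
    // Each entry softly references the String and strongly holds a copy of its
    // encoded bytes, so every cached value costs roughly twice what the String does.
    private static final class Entry extends SoftReference<String> {
        final byte[] utf8;
        final int hash;
        final Entry next;

        Entry(String value, byte[] utf8, int hash, Entry next) {
            super(value);
            this.utf8 = utf8;
            this.hash = hash;
            this.next = next;
        }
    }

    // Fixed power-of-two table; bucket heads are replaced with compare-and-set.
    private final AtomicReferenceArray<Entry> buckets = new AtomicReferenceArray<>(1 << 10);

    String decode(byte[] src, int off, int len) {
        int hash = 1;
        for (int i = off; i < off + len; i++) {
            hash = 31 * hash + src[i];
        }
        int index = (hash ^ (hash >>> 16)) & (buckets.length() - 1);
        String created = null;
        while (true) {
            Entry head = buckets.get(index);
            for (Entry e = head; e != null; e = e.next) {
                // Byte comparison only; no String is constructed on a hit.
                if (e.hash == hash
                    && Arrays.equals(e.utf8, 0, e.utf8.length, src, off, off + len)) {
                    String s = e.get();
                    if (s != null) {
                        return s;
                    }
                }
            }
            if (created == null) {
                created = new String(src, off, len, StandardCharsets.UTF_8);
            }
            Entry fresh = new Entry(created, Arrays.copyOfRange(src, off, off + len), hash, head);
            if (buckets.compareAndSet(index, head, fresh)) {
                return created;
            }
            // Lost a race on this bucket; rescan in case another thread added the same value.
        }
    }
}
```

A hit costs a hash over the slice plus one array comparison, with no allocation. The price is the extra `byte[]` per entry, which is the memory overhead described above.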