Open broneill opened 1 year ago
In addition to referring to strings, the cache entries also need to refer to the UTF-8 encoded bytes. This is necessary for making quick comparisons, but it also means that the cache occupies much more memory than might be expected. All the more reason to document that the caching feature should only be used for columns with low cardinality.
Constructing strings from UTF-8 is expensive. Create an annotation which allows a column's decoded strings to be cached, using either "soft" or "weak" references, where soft is the default. Document that caching is best suited for columns with low cardinality due to potential GC overhead.
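For illustration only, here is a rough sketch of what such an annotation might look like. The name `CachedString`, its nested `Mode` enum, the choice to target accessor methods, and the `Customer` interface are all assumptions made for this example, not a settled design.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation: marks a string column whose decoded values may be cached.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD) // assumption: columns are declared as accessor methods
@interface CachedString {
    enum Mode { SOFT, WEAK }

    // Soft references are the default, as proposed above.
    Mode value() default Mode.SOFT;
}

// Hypothetical row definition showing both forms.
interface Customer {
    @CachedString // low-cardinality column, cached with soft references
    String country();

    @CachedString(CachedString.Mode.WEAK) // explicitly request weak references
    String status();
}
```

Making soft the default keeps the common case to a bare annotation, and weak has to be asked for explicitly.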
The cache itself can be simple -- it has no max capacity and it doesn't perform any LRU reordering. A single global cache should work fine, and it needs to support high concurrency.
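A minimal sketch of such a cache, under stated assumptions: `StringCache`, `Mode`, and `get` are hypothetical names, and a `ConcurrentHashMap` keyed by the UTF-8 bytes stands in for whatever concurrent structure is actually chosen. There is no capacity limit and no LRU ordering; entries go away only after the garbage collector clears their references.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical global cache: maps UTF-8 bytes to an already decoded String.
final class StringCache {
    enum Mode { SOFT, WEAK }

    // The key holds the encoded bytes, so a lookup compares bytes and never decodes.
    private static final class Key {
        final byte[] utf8;
        final int hash;
        Key(byte[] utf8) { this.utf8 = utf8; this.hash = Arrays.hashCode(utf8); }
        @Override public int hashCode() { return hash; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && Arrays.equals(utf8, ((Key) o).utf8);
        }
    }

    // Each reference remembers its key so a cleared entry can be removed from the map.
    private interface Entry { Key key(); }

    private static final class SoftEntry extends SoftReference<String> implements Entry {
        private final Key key;
        SoftEntry(Key key, String s, ReferenceQueue<String> q) { super(s, q); this.key = key; }
        public Key key() { return key; }
    }

    private static final class WeakEntry extends WeakReference<String> implements Entry {
        private final Key key;
        WeakEntry(Key key, String s, ReferenceQueue<String> q) { super(s, q); this.key = key; }
        public Key key() { return key; }
    }

    private static final ConcurrentHashMap<Key, Reference<String>> MAP = new ConcurrentHashMap<>();
    private static final ReferenceQueue<String> QUEUE = new ReferenceQueue<>();

    private StringCache() {}

    static String get(byte[] utf8, Mode mode) {
        expungeStale();
        // Look up with the caller's array; no copy and no decoding on a hit.
        Reference<String> ref = MAP.get(new Key(utf8));
        String cached = (ref == null) ? null : ref.get();
        if (cached != null) return cached;

        // Miss: decode once, and store a private copy of the bytes as the key.
        String decoded = new String(utf8, StandardCharsets.UTF_8);
        Key owned = new Key(utf8.clone());
        Reference<String> fresh = (mode == Mode.WEAK)
            ? new WeakEntry(owned, decoded, QUEUE)
            : new SoftEntry(owned, decoded, QUEUE);
        while (true) {
            Reference<String> prev = MAP.putIfAbsent(owned, fresh);
            if (prev == null) return decoded;          // this thread's entry was stored
            String existing = prev.get();
            if (existing != null) return existing;     // another thread stored it first
            if (MAP.replace(owned, prev, fresh)) return decoded; // swap out a cleared entry
        }
    }

    // Remove map entries whose strings have been garbage collected.
    private static void expungeStale() {
        Reference<? extends String> r;
        while ((r = QUEUE.poll()) != null) {
            MAP.remove(((Entry) r).key(), r);
        }
    }
}
```

Every entry retains both the byte array and the decoded string, which is the extra memory cost noted earlier in the thread and another reason to limit caching to low-cardinality columns.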