ClickHouse / ClickHouse

ClickHouse® is a real-time analytics DBMS
https://clickhouse.com
Apache License 2.0
36.77k stars 6.8k forks source link

Aerospike as dictionary source/layout #2718

Closed filimonov closed 3 years ago

filimonov commented 6 years ago

The problem

Currently the scenarios of using dictionaries are quite limited by requirements: 1) should fit in memory 2) should be updated rare (otherwise will be not actual most of the time) 3) don't support point updates (i.e. only full reload of dictionary is acceptable). 4) hard/impossible to archive the consistent dictionary state across the cluster when data in dictionary changes.

cache dictionary can have more keys than memory but hit ratio should be VERY high to give acceptable performance, because updates in cache dictionary go in one thread and slow.

Possible solutions

There are only two hard things in Computer Science: cache invalidation and naming things. -- Phil Karlton

Of course current cache solution can be improved, but probably instead of reinventing the wheel and fighting the second 'hard thing' in Computer Science - it would be better to use some mature databases specially designed for caching, which give fast key-value access, can deal with updates etc.

There are a lot of databases which are used for caching:

The most well-known and widely popular are Memcached and Redis.

Some of them (like RocksDB) are can be embedded into Clickhouse. Sounds interesting, but in that case it will be hard to access/fill/update the dictionary from outside.

My favorite is Redis.

But I think the ideal companion for Clickhouse can be Aerospike.

Motivation: 1) opensource (AGPL for server, Apache v2 for client) & mature (since 2010) 2) written in C 3) real data storage, not 'caching layer' only (so you don't need to warm the cache after server restart) 4) nice official C client - should be easy to integrate; sources are on GitHub. Thread safe. 5) High performance (can give about 1M reads per sec on single server) 6) Easy to scale (clustering 'out of the box') (Redis / Memcached clustering requite a lot of extras) 7) No requirement for data to fit the memory (but indexes still should fit). Flash-optimized, and use SSD as permanent storage with low latency access (can't work properly with HDD) 8) uses strong types (as Clickhouse), so no overhead of useless transformations (Redis / Memcached usually textish values, so JSON is commonly used). 9) has official docker images - should not be a problem to create integration tests. 10) good documentation 11) a lot of nice features, including cli with SQL-like language, stored procedures (written in Lua), batch operations etc. 12) also aerospike is quite popular in web advertisement / analytics industry (i.e. the same sector as Clickhouse). Commercial licences are available if somebody need that.

Proposal

So the idea is to give direct access from Clickhouse to Aerospike via dictionaries API. I.e. source & layout will be 'direct access to aerospike', without any extra pre/post processing. So each read from dictionary in Clickhouse will be directly tranformed to read of Aerospike key, most probably the best idea would be to use batch mode API.

Later some API for updating dictionaries directly from Clickhouse can be introduced.

Of course Redis / Memcached support in same way can be also introduced, as they are more popular and more common.

But I think that Clickhouse+Aerospike can became extra 'killer feature' for Clickhouse.

alexey-milovidov commented 6 years ago

So, we have two tasks:

  1. Introduce direct dictionary layout (and maybe complex_key_direct for string keys).
  2. Add support for Memcached, Redis, Aerospike.
alexey-milovidov commented 5 years ago

These tasks are assigned to @comunodi @Gleb-Tretyakov @favstovol @FawnD2

4ertus2 commented 5 years ago

link #5421

alexey-milovidov commented 4 years ago

Aerospike: #5629 Cassandra: #4978

filimonov commented 4 years ago

Redis is delivered, but "Introduce direct dictionary layout" - not yet (w/o that ClickHouse still copy everything to own memory).

filimonov commented 4 years ago

https://github.com/ClickHouse/ClickHouse/pull/10622

alexey-milovidov commented 3 years ago

Implementation of Aerospike dictionary source has been cancelled due to low demand. Please add a reaction :+1: to this comment if you want us to continue on this task and :tada: if you don't care.

alexey-milovidov commented 3 years ago

Closing due to low demand.