ClickHouse / ClickHouse

ClickHouse® is a real-time analytics DBMS
https://clickhouse.com
Apache License 2.0
37.74k stars 6.93k forks source link

Ability to Disable and Enable Partitions for Read/Write operations #66408

Closed tarun-annapareddy closed 4 months ago

tarun-annapareddy commented 4 months ago

Use case

In ClickHouse users re-balance the existing shards using FETCH, ATTACH in destination, and DETACH in source commands. But the issue is the writes, reads, and merges that happen during the phase of the ATTACH, DETACH process. This will lead to data inconsistencies which is not acceptable. Users also use this process in many use cases of data movement. In these scenarios, users are happy to see an error while accessing that part rather than receiving duplicated data.

Proposed Solution

If we allow a simple command to Disable Partition, it would be helpful to maintain data consistency. If any query hits the partition for reads or writes we can throw an error. This will allow the users to migrate in the background by ensuring data consistency. We can also ignore these partitions in our MergeSelection Algorithm. We will not need any heavy code changes. Each shard just needs to maintain the partitions that are disabled. We can create a system table so that users get visibility of disabled partitions. Both the ENABLE/DISABLE partition commands are replicated across the replicas of the shard.

a sample command can look like

ALTER TABLE table_name [ON CLUSTER cluster] DISABLE|ENABLE PARTITION|PART partition_expr

Describe alternatives you've considered

There is no alternative to do this in clickhouse as of today. This solution can be used in many use cases, like re-balancing, data movement to S3/Azure/HDFS, and others

I can pick up the implementation. I would like to check community interest in supporting these commands. We can first support this in the ReplicatedMergeTree and extend it

UnamedRus commented 4 months ago

writes we can throw an error.

BTW, for now i can imagine somewhat strange hack with materialized column with value throwIf(_partition_id = 'xxxxx')

ALTER TABLE table_name [ON CLUSTER cluster] DISABLE|ENABLE PARTITION|PART partition_expr

But not sure about ENABLE/DISABLE wording, unfortunately we use FREEZE PARTITION keyword for other feature already.

tarun-annapareddy commented 4 months ago

@UnamedRus yes I also do not have better wordage. Let me come up with something. But I want to ask about the feature itself. Can I create a PR for this? How do I Know If the community is interested in having this feature merged?

UnamedRus commented 4 months ago

How do I Know If the community is interested in having this feature merged?

I think, ability to pause merges on particular partition will be beneficial, However, afaik core team had mixed feeling about similar feature, which works on disk level. (but those concerns doesn't apply to your idea) https://github.com/ClickHouse/ClickHouse/pull/13956#pullrequestreview-486686665

Need to check other DBMS, but oracle have similar feature

ALTER TABLE toto1 MODIFY PARTITION ID 'xxxx' READ WRITE;
ALTER TABLE toto1 MODIFY PARTITION ID 'xxxx' READ ONLY;

Oracle 12c Release 2 introduces greater granularity for read-only objects and supports read-only table partitions. Any attempt to perform a DML operation on a partition, or sub-partition, set to READ ONLY results in an error.

https://blogs.oracle.com/datawarehousing/post/the-latest-in-oracle-partitioning-part-1-read-only-partitions

alexey-milovidov commented 4 months ago

Won't fix because of a narrow use case (implementation complexity isn't worth it).