Closed tarun-annapareddy closed 4 months ago
writes we can throw an error.
BTW, for now i can imagine somewhat strange hack with materialized column with value throwIf(_partition_id = 'xxxxx')
ALTER TABLE table_name [ON CLUSTER cluster] DISABLE|ENABLE PARTITION|PART partition_expr
But not sure about ENABLE/DISABLE wording, unfortunately we use FREEZE PARTITION keyword for other feature already.
@UnamedRus yes I also do not have better wordage. Let me come up with something. But I want to ask about the feature itself. Can I create a PR for this? How do I Know If the community is interested in having this feature merged?
How do I Know If the community is interested in having this feature merged?
I think, ability to pause merges on particular partition will be beneficial, However, afaik core team had mixed feeling about similar feature, which works on disk level. (but those concerns doesn't apply to your idea) https://github.com/ClickHouse/ClickHouse/pull/13956#pullrequestreview-486686665
Need to check other DBMS, but oracle have similar feature
ALTER TABLE toto1 MODIFY PARTITION ID 'xxxx' READ WRITE;
ALTER TABLE toto1 MODIFY PARTITION ID 'xxxx' READ ONLY;
Oracle 12c Release 2 introduces greater granularity for read-only objects and supports read-only table partitions. Any attempt to perform a DML operation on a partition, or sub-partition, set to READ ONLY results in an error.
Won't fix because of a narrow use case (implementation complexity isn't worth it).
Use case
In ClickHouse users re-balance the existing shards using FETCH, ATTACH in destination, and DETACH in source commands. But the issue is the writes, reads, and merges that happen during the phase of the ATTACH, DETACH process. This will lead to data inconsistencies which is not acceptable. Users also use this process in many use cases of data movement. In these scenarios, users are happy to see an error while accessing that part rather than receiving duplicated data.
Proposed Solution
If we allow a simple command to Disable Partition, it would be helpful to maintain data consistency. If any query hits the partition for reads or writes we can throw an error. This will allow the users to migrate in the background by ensuring data consistency. We can also ignore these partitions in our MergeSelection Algorithm. We will not need any heavy code changes. Each shard just needs to maintain the partitions that are disabled. We can create a system table so that users get visibility of disabled partitions. Both the ENABLE/DISABLE partition commands are replicated across the replicas of the shard.
a sample command can look like
ALTER TABLE table_name [ON CLUSTER cluster] DISABLE|ENABLE PARTITION|PART partition_expr
Describe alternatives you've considered
There is no alternative to do this in clickhouse as of today. This solution can be used in many use cases, like re-balancing, data movement to S3/Azure/HDFS, and others
I can pick up the implementation. I would like to check community interest in supporting these commands. We can first support this in the ReplicatedMergeTree and extend it