Partitioning doesn't work with statement based replication

pykello commented 6 years ago

This came in a customer conversation. What is required to implement this?

cc: @onderkalaci

marcocitus commented 6 years ago

Shard resource locks need to ensure that updates on a child table and updates on a parent table are serialised.

If a child placement becomes inactive, the parent should also become inactive and vice versa.

ozgune commented 6 years ago

@onderkalaci / @marcocitus -- How long would it take to fix this issue?

I'm also noting #1459 as a related issue.

ozgune commented 6 years ago

I just chatted with @marcocitus on this. We expect this fix to take about 3-4 weeks of time.

ozgune commented 6 years ago

I wanted to share a quick update on this issue.

@marcocitus looked into this in more detail. He noted that if we disallowed direct writes to the partitioned tables (and only allow you to write to the parent table), this fix would take only 1-2 weeks.

We need to check if this requirement works for our customers.

jnicholls commented 6 years ago

We utilize the run_command_on_colocated_placements function for colocated rollups. There are reasons that insert into...select... does not work for our use case, including but not limited to on conflict being unsupported by pg10 for partitioned tables and order by use in aggregate functions (similar to a window function) not being supported by Citus.

If direct writes to the placements are still allowed, then this will be fine. We have no desire to write directly to partitions (neither on the coordinator nor the partitions of each placement on the workers).

samay-sharma commented 6 years ago

Ran into this with 2 customers who requested this. @ozgune : We can chat in more detail about the context in person.

hlakshmi commented 6 years ago

We ran into similar issue where we are not able to shard a partitioned table with replication factor > 1.

xxxxxx=# select create_distributed_table('table_name', 'column_name');
ERROR:  distributing partitioned tables with replication factor greater than 1 is not supported

But after changing the replication_factor to 1, I am able to shard the table. It would be great if its able to shard the partitioned table with replication_factor > 1 as we are looking for HA.

samay-sharma commented 6 years ago

@hlakshmi : Would allowing writes to the parent table (and not to specific partitions) suffice for your use case? The reason I ask its that's an easier fix and would be easy to get to you sooner.

jnicholls commented 6 years ago

What is a good ETA on this feature, assuming the easy path is taken (no direct writes to partitions at coordinator level, and assuming commands like run_command_on_placements still function)?

Edit: I'm noting that the original estimation was about 1-2 weeks, many weeks back. I assume this still has not been started, but it's very much desired :)

jnicholls commented 6 years ago

@samay-sharma Do you have an ETA for this feature? Has it been started? Would you need assistance in implementing it? I’d like to bump the urgency of this. Thanks!

onderkalaci commented 6 years ago

We're currently working on this. So, the ETA is probably whenever 8.0 is out, which shouldn't be too far away. But, obviously, I cannot give an exact date for this since it involves many other things as well.

So, some notes and questions on the requirements:

All kinds of modifications need to be done on the parent table. See the items that come to my mind now:
- INSERT/UPDATE/DELETE/UPSERT/INSERT...SELECT
- DDLs (such as create index as well)
- DROPping partition would involve ALTER TABLE parent DETACH PARTITION partition followed by DROP TABLE partition
- TRUNCATE
Shard repair functionaly would only work on the parent table. This implies that in case of invalid shard placements, it'd require dropping the invalid shard placement all together with its partitions and re-creating them on the target node.
We may not be able to quickly implement the replicate_table_shards(). @jnicholls how important is that for you? Do you use that rebalancer functionality?
Direct writes to placements and use of run_command_on_colocated_placements () is still allowed

samay-sharma commented 6 years ago

@onderkalaci : Could you elaborate a bit on the "shard repair functionality would only work on the parent table". Would all failed inserts mark the parent table as invalid and if so, do we just need to repair it?

Are there cases where the child tables may be marked as invalid and the parent table is not even though we allow writes only on the parent tables. If so, how do we address that case?

marcocitus commented 6 years ago

Are there cases where the child tables may be marked as invalid and the parent table is not even though we allow writes only on the parent tables.

No, the reason we're disallowing writes to partitions is largely to prevent that (and some locking/ordering issues).

What Önder means is that 8.0 is also not going to be smart enough to only repair a single partition after s failure. It will always repair all.

jnicholls commented 6 years ago

We may not be able to quickly implement the replicate_table_shards(). @jnicholls how important is that for you? Do you use that rebalancer functionality?

@onderkalaci Yes I do use this functionality. But repair is more important than rebalancing in the short term. If you must release that functionality in 8.1 or a subsequent patch build of shard_rebalancer for me specifically that is fine :)

jnicholls commented 6 years ago

My obligatory check-in. My company is going through a 6 month stability and reliability process and this is one of the primary issues we have in Citus today that must be addressed. Some folks are pushing to move away from Citus completely, which I am fighting b/c it is not a good idea, but it is blemishes like this that fuel their argument.

onderkalaci commented 6 years ago

Just FYI, we've implemented this feature. But given that we've some limitations (as discussed above), I preferred to keep the issue open.

Once we package Citus 8.0 (which should be very close), you should be able to use it.

jnicholls commented 6 years ago

great thank you!

On Sat, Oct 13, 2018 at 2:15 PM Önder Kalacı notifications@github.com wrote:

Just FYI, we've implemented this feature. But given that we've some limitations (as discussed above), I preferred to keep the issue open.

Once we package Citus 8.0 (which should be very close), you should be able to use it.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/citusdata/citus/issues/2233#issuecomment-429563623, or mute the thread https://github.com/notifications/unsubscribe-auth/AABPqyFHDXlZzDJemMU-KXVo7Xnp-wXGks5uki3fgaJpZM4UyuQi .

-- Sent from Gmail Mobile

onderkalaci commented 3 years ago

Adding this discussion as a reference: https://github.com/citusdata/citus/pull/5381#discussion_r733638858

citusdata / citus

Partitioning doesn't work with statement based replication #2233