Unsafe to run shard copy concurrently with various functions and commands

marcocitus commented 7 years ago

We have a number of code paths that use placement metadata unsafely: they do not first obtain a shard metadata lock, so they can run concurrently with a shard repair/copy/move and may use stale metadata. Even when we do obtain a lock, we also need to make sure that the shard metadata changes made by repair/copy/move are visible once the lock is obtained, which may require a new snapshot. Otherwise, exercising these code paths concurrently with a shard placement change may produce incorrect results, inconsistent replication, or data loss.
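To make the "lock first, then re-read" requirement concrete, here is a minimal sketch in Python. It is a toy in-memory model, not Citus code: `ShardMetadata`, `run_on_placements`, the worker names, and the shard id are all hypothetical and chosen only for illustration. The point is that a placement list read before taking the lock can go stale, while a list read after acquiring the lock reflects a completed move.

```python
import threading


class ShardMetadata:
    """Toy stand-in for placement metadata (hypothetical, not the Citus catalog)."""

    def __init__(self):
        self._guard = threading.Lock()   # protects the lock table itself
        self._locks = {}                 # shard_id -> per-shard metadata lock
        self._placements = {}            # shard_id -> list of node names

    def shard_lock(self, shard_id):
        # Return the per-shard lock, creating it on first use.
        with self._guard:
            return self._locks.setdefault(shard_id, threading.Lock())

    def set_placements(self, shard_id, nodes):
        self._placements[shard_id] = list(nodes)

    def placements(self, shard_id):
        # Returns a copy, so callers hold a point-in-time view.
        return list(self._placements.get(shard_id, []))

    def move_placement(self, shard_id, source, target):
        # A move changes placement metadata while holding the shard lock.
        with self.shard_lock(shard_id):
            nodes = self._placements[shard_id]
            nodes[nodes.index(source)] = target


def run_on_placements(meta, shard_id, action):
    # Acquire the lock first, then read the placements, so that a move
    # that finished before we got the lock is visible to us.
    with meta.shard_lock(shard_id):
        nodes = meta.placements(shard_id)
        for node in nodes:
            action(node)
        return nodes


meta = ShardMetadata()
meta.set_placements(102008, ["worker-1", "worker-2"])  # made-up shard id

stale = meta.placements(102008)            # read without the lock
meta.move_placement(102008, "worker-2", "worker-3")
visited = run_on_placements(meta, 102008, lambda node: None)
```

In this model `stale` still names `worker-2`, while `visited` names `worker-3`. In a real system the re-read after locking would correspond to taking a fresh snapshot; the sketch only models that as reading the list again.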

ozgune commented 7 years ago

I'm adding the v6.1 milestone with the intent to spend a day on this and to document scenarios where we could have safety issues.

onderkalaci commented 7 years ago

@ozgune @marcocitus

> document scenarios where we could have safety issues.

If I had tried to document scenarios, it would not have been possible to find all the UDFs that could lead to problems in half a day. Instead, I chose to list the dangerous UDFs so that we can prioritize (either documenting scenarios or fixing them). Do we want to fix some of the top items in the list for 6.1?

UDFs that we should consider making work safely with the shard rebalancer (in order of importance(?)):

start_metadata_sync_to_node(): Iterates through all shards/placements for MX. Could send missing or wrong information to a worker while it is being initialized as an MX node.
master_apply_delete_command(): Dangerous because the rebalancer might move some placements while the command is being applied. Those placements might not get the delete command.
mark_tables_colocated(): Checks shard/shard placements of distributed relations. Could lead to marking non-colocated tables as being co-located.
master_drop_distributed_table_metadata(): It uses worker_drop_distributed_table(), so fixing the item below should fix this as well.
- worker_drop_distributed_table(): Iterates over shard/shard placements to drop on the workers for MX.
master_drop_all_shards(): Iterates over shards to drop them on a DROP TABLE command. Might fail to drop some of the placements (replicate) or leave some orphaned shards (rebalance).
master_get_table_metadata(): Affected because the output includes the shard replication factor.
master_update_shard_statistics(): It uses FinalizedShardPlacementList().
master_expire_table_cache(): Iterates over shards/shard placements.
master_stage_shard_row(): There is a comment saying that it is only used for csql, so we should probably delete the UDF itself.
master_stage_shard_placement_row(): There is a comment saying that it is only used for csql, so we should probably delete the UDF itself.
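As an illustration of the hazard described for master_apply_delete_command(), here is a small deterministic sketch in Python. It is a toy model under stated assumptions, not the actual Citus implementation: the two-step move (`copy_shard`, then `finalize_move`), the dictionaries, and the worker names are all hypothetical. It shows how a delete that iterates over a placement list read before the move can leave the moved placement untouched.

```python
def apply_delete(placement_nodes, worker_data, shard_id, predicate):
    """Delete matching rows on every node in the given placement list."""
    for node in placement_nodes:
        rows = worker_data[node][shard_id]
        worker_data[node][shard_id] = [r for r in rows if not predicate(r)]


def copy_shard(worker_data, shard_id, source, target):
    """First half of a move: copy the shard's rows to the target node."""
    worker_data.setdefault(target, {})[shard_id] = list(worker_data[source][shard_id])


def finalize_move(metadata, worker_data, shard_id, source, target):
    """Second half of a move: repoint the metadata and drop the source copy."""
    nodes = metadata[shard_id]
    nodes[nodes.index(source)] = target
    del worker_data[source][shard_id]


# Shard 1 is replicated on w1 and w2; both hold the same rows.
metadata = {1: ["w1", "w2"]}
worker_data = {"w1": {1: [1, 2, 3]}, "w2": {1: [1, 2, 3]}}

# Unsafe interleaving: the delete reads placements without a lock...
stale_nodes = list(metadata[1])
# ...the rebalancer copies the shard from w2 to w3...
copy_shard(worker_data, 1, "w2", "w3")
# ...the delete runs against the stale list (w1 and w2 only)...
apply_delete(stale_nodes, worker_data, 1, lambda r: r >= 2)
# ...and the move completes, making w3 the live placement.
finalize_move(metadata, worker_data, 1, "w2", "w3")

replicas = {node: worker_data[node][1] for node in metadata[1]}
```

After this interleaving, `replicas` holds `[1]` for `w1` but `[1, 2, 3]` for `w3`: the moved placement never received the delete, so the two replicas disagree.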

The `UDFs` that are protected by shard metadata and/or shard resource locks (i.e., already work fine with the rebalancer):

master_modify_multiple_shards()
master_append_table_to_shard()
upgrade_to_reference_table()
master_add_node()

The `UDFs` that do not seem to require any locks:

master_create_empty_shard()

Some observations / notes:

UDFs that start with worker_ are mostly OK.
Newly implemented UDFs mostly acquire shard resource and shard metadata locks.

onderkalaci commented 7 years ago

Since the scope for 6.1 is to investigate and we already did, the 6.1 tag should be removed from the issue. Any objections?

metdos commented 7 years ago

> Since the scope for 6.1 is to investigate and we already did, the 6.1 tag should be removed from the issue. Any objections?

I think we should create a 6.2 release milestone and move issues like this to there? @ozgune, @sumedhpathak?

sumedhpathak commented 7 years ago

@metdos that works for me. @ozgune do we intend to work on some of these in 6.2? Else we can just remove the 6.1 milestone to mark it as closed?
