citusdata / citus_docs

Documentation for Citus. Distributed PostgreSQL as an extension.
Creative Commons Attribution 4.0 International
58 stars 58 forks source link

Doc changes for Metadata sync transactional/non-transactional mode #1084

Closed pinodeca closed 1 year ago

pinodeca commented 1 year ago

Why are we implementing it? (sales eng)

Customers with very large clusters have problems adding new nodes to their clusters and upgrading above Citus 11.0, which introduced query from any node, due to memory problems during metadata sync.

What are the typical use cases?

Communication goals (e.g. detailed howto vs orientation)

Good locations for content in docs structure

How does this work? (devs)

We added an alternative non-transactional mode to the current metadata sync which performs inside a single transaction. Single transaction mode causes issues since PG has a hard memory limit related to cache invalidations. But now, we can switch into non-transactional mode, which syncs the metadata via many transactions, if we have such a memory error.

Example sql

To add a new node:

SET citus.metadata_sync_mode TO 'nontransactional';
SELECT citus_add_node(<ip>, <port>);

To sync all the nodes:

SET citus.metadata_sync_mode TO 'nontransactional';
SELECT start_metadata_sync_to_all_nodes();

Corner cases, gotchas

Are there relevant blog posts or outside documentation about the concept/feature?

Not yet. But I plan to publish a blog post.

Link to relevant commits and regression tests if applicable

You can see the related PR https://github.com/citusdata/citus/pull/6728 Regression tests:

pinodeca commented 1 year ago

@aykut-bozkurt Can you please fill in each of the template sections in the description above?

jonels-msft commented 1 year ago

@aykut-bozkurt thanks for the details, I have a few follow-up questions.

  1. It sounds like non-transactional sync can fail but can be re-run safely. Is it the user's responsibility to check for failure? On failure does the user need to manually retry their calls to functions like citus_add_node() and start_metadata_sync_to_all_nodes()?
  2. Should we advise users to try transactional mode first and see if they get an error, then switch to non-transactional and try again? Or is the OOM problem pretty severe, so that we should advise people with clusters greater than a certain size to always go non-transactional? If so, what size?
  3. If transactional sync can die and non-transactional can't, why is our default transactional? I want to present the pros and cons of both options clearly in the docs.
aykut-bozkurt commented 1 year ago
  1. It is user who should manually rerun after any failure,
  2. Default mode is transactional which is safer in case of a failure since we perform inside single transaction, which means either commit or rollback all the changes to end up at a consistent state, (They only should switch into non-transactional mode when memory failure hits)
  3. As I said transactional mode performs atomically, but non-transactional sync can fail and end up at inconsistent state even if user can manually rerun it safely later.