citusdata / citus_docs

Documentation for Citus. Distributed PostgreSQL as an extension.
Creative Commons Attribution 4.0 International

Evaluate removing the "create distributed function" section from the quick start guide #1033

Open ozgune opened 2 years ago

ozgune commented 2 years ago

Communication goals (e.g. detailed howto vs orientation)

Our Quick Start guide is an opportunity to introduce simple concepts to our users.

https://docs.citusdata.com/en/v10.2/get_started/tutorial_multi_tenant.html

In the multi-tenant quick start guide, we introduce the following concept. I feel that the notion of additional roundtrips, creating a new UDF, and then declaring that UDF as a distributed function goes beyond a quick start.

Could we evaluate removing the following section from our Quick Start Guide?

I'm asking because I haven't used create_distributed_function() in this way before. Although I'm not a power user, I also feel that this goes beyond what's needed to get started on Citus.

"Each statement in a transaction causes roundtrips between the coordinator and workers in multi-node Citus. For multi-tenant workloads, it's more efficient to run transactions in distributed functions. The efficiency gains become more apparent for larger transactions, but we can use the small transaction above as an example.

First create a function that does the deletions:

CREATE OR REPLACE FUNCTION delete_campaign(company_id int, campaign_id int)
RETURNS void LANGUAGE plpgsql AS $fn$
BEGIN
  DELETE FROM campaigns
   WHERE id = $2 AND campaigns.company_id = $1;
  DELETE FROM ads
   WHERE ads.campaign_id = $2 AND ads.company_id = $1;
END;
$fn$;

Next, use create_distributed_function to instruct Citus to run the function directly on workers rather than on the coordinator (except on a single-node Citus installation, which runs everything on the coordinator). Citus will run the function on whichever worker holds the shards of the ads and campaigns tables that correspond to the value of company_id.

SELECT create_distributed_function(
  'delete_campaign(int, int)', 'company_id',
  colocate_with := 'campaigns'
);

-- you can run the function as usual
SELECT delete_campaign(5, 46);"
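
For comparison, the quoted passage refers to "the small transaction above" from the tutorial, which this issue does not reproduce. The sketch below assumes that transaction deletes the same rows that delete_campaign(5, 46) deletes. It puts the multi-statement transaction next to the single function call so the roundtrip argument is easier to see.

```sql
-- Sketch only: assumes the tutorial's earlier transaction deletes the
-- same rows as delete_campaign(5, 46). Per the quoted docs, on
-- multi-node Citus each statement causes coordinator/worker roundtrips.
BEGIN;
DELETE FROM campaigns WHERE id = 46 AND company_id = 5;
DELETE FROM ads WHERE campaign_id = 46 AND company_id = 5;
COMMIT;

-- The same work as one call, which Citus can run directly on the
-- worker holding the shards for company_id = 5.
SELECT delete_campaign(5, 46);
```

With only two small DELETEs the difference is minor. That matches the quoted note that the gains show up mainly for larger transactions.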

onderkalaci commented 2 years ago

related to https://github.com/citusdata/citus_docs/issues/1024.

"Distributed functions" is an advanced topic, so it makes sense not to have it on the quick start.

Users typically create a distributed function and expect the function to speed up, by analogy with what create_distributed_table does for tables. In reality, the schema and functions must be set up properly to benefit from distributed functions. Hence, users are confused by the concept of distributed functions.
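
As a rough illustration of "set up properly": a function distributed by company_id can only keep its work on one worker when the tables it touches are distributed on that column and colocated with each other. The following is a minimal check under two assumptions: the tutorial's campaigns and ads tables, and Citus 10 or later, where the citus_tables view is available.

```sql
-- Assumes Citus 10+ (citus_tables view) and the tutorial's tables.
-- Both rows should list company_id as the distribution column and share
-- the same colocation_id; if they do not, the delegated function's
-- queries may not be able to stay on a single worker.
SELECT table_name, distribution_column, colocation_id
FROM citus_tables
WHERE table_name IN ('campaigns'::regclass, 'ads'::regclass);
```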

In fact, Marco thinks we could rename create_distributed_function to something more explicit like delegate_procedure_to_nodes or such.

jonels-msft commented 2 years ago

"Distributed functions" is an advanced topic, so it makes sense not to have it on the quick start.

IIRC there was a push to advertise that feature when it was released, but I agree that it's a distraction in an early tutorial.

In fact, Marco thinks we could rename create_distributed_function to something more explicit like delegate_procedure_to_nodes or such.

Sounds like a good idea. "Distributed function" suggests a false analogy with "distributed table."