ash-project / ash_postgres

The PostgreSQL data layer for Ash Framework
https://hexdocs.pm/ash_postgres
MIT License
133 stars 71 forks source link

Proxy support for the multitenancy context strategy (pgbouncer and Supavisor) #141

Open dev-guy opened 1 year ago

dev-guy commented 1 year ago

Is your feature request related to a problem? Please describe. The Supavisor proxy for Postgres offers several benefits, the main one being the elimination of long-lived idle database connections. It can also route requests to different Postgres instances, satisfying both scalability and regulatory needs. Unfortunately, Ash's multitenancy apparently doesn't work with this feature because, while Ash's multitenancy context strategy can change the database schema name at runtime, it is apparently unable to similarly alter the Postgres username.

Describe the solution you'd like Allow Postgres multitenancy to dynamically modify the Postgres username with a tenant slug, similar to the way that Ash Postgres' context multitenancy strategy uses a different schema name based on the tenant at runtime. Supabase requires the user name format "tenant.usename". See the Supavisor docs for more information.

Describe alternatives you've considered

I've heard that "dynamic Ecto repositories" exists but have not researched this.

Express the feature either with a change to resource syntax, or with a change to the resource interface

To avoid hard-coding a solution, perhaps a function could be provided to the 'multitenancy' macro. I would want this to be more DRY than the following but this is the general idea. Note in the following example, the schema name wouldn't need to change, but IT DEPENDS, and thus this is an opportunity to customize behavior that is currently hard-coded. 'username' and 'schema' are optional.

  multitenancy do
    strategy :context # This feature might also be applicable to attribute multitenancy.. requires more thought
    username: fn (org_id, username)->"#{org_id}.#{username}" end
    schema: fn(org_id, schema)->schema end
  end

Being able to provide a function to map a tenant to all database credentials (host, user, password, schema) would be powerful in its own right and would eliminate the need for supabase and pgbouncer for many use cases.

Additional context I have not researched what would be needed to support the Postgres proxy pgbouncer. Ideally, one feature would address the needs of both pgbouncer and Supavisor.

There would be a table of tenants in a master Postgres database instance in the table organizations. This would be used to generate and track unique "org_id" values. This represents a single point of failure which seems unavoidable.

There would not be a foreign key from user to organization because these tables could be located on different physical Postgres instances. Or perhaps in the tenant databases there would also be an organization table but containing only one row.

zachdaniel commented 1 year ago

Definitely worth exploring :)

dev-guy commented 1 year ago

@zachdaniel I can work on a PR but need some more advice from you

zachdaniel commented 1 year ago

So the guide on setting up a dynamic repository is here: https://hexdocs.pm/ecto/replicas-and-dynamic-repositories.html

The main issue with us doing it this way in ash_postgres (i.e every query maybe getting a dynamic repo) in place of changing the schema is that we'd be constantly establishing new connections for each query for actions that may call many queries.

What we would need to do is something like this:

Add a hook for data layers to wrap the entire execution of an action for a given tenant, returning context to be set on all queries/changesets for that tenant, storing that context (keyed by the tenant) in the process dictionary context(Ash.set_context() which deep-merges). Whenever we execute a query where the tenant is different than the one before, we run this hook and carry its contents down to all other queries. This has to be implemented at the action invocation level, not using something like Ash.Query.around_action which just wraps the individual query call.

Whenever we run an action, we check for context for the tenant of that action, and set it on to the current query. We already have the ability to configure the repo dynamically like so:

`Ash.Query.set_context(query, %{data_layer: %{:repo, RepoName}})`. 

So this new hook for ash_postgres would just be setting a new repo. And then we'd configure it like this:

postgres do
  multitenant_context fn config, tenant -> 
    Keyword.put(config, :username, "#{tenant}.#{config[:username]}")
  end
end

The default implementation of that function would be Keyword.put(config, :prefix, tenant).

And if all that is changed in that keyword is the prefix option then we won't set a dynamic repo. If anything else changes, then we will.

There are a fair amount of moving parts here, but it can be done. We will also likely find a few places where we aren't carrying down parent context from one query to the next. We'll just have to iterate on it.