`current_setting` can lead to bad performance when used on RLS

steve-chavez commented 1 year ago

Edit: solution below.

Having:

-- $ postgrest-with-postgresql-15 psql -- on nix-shell

create table chat_messages(
  id text PRIMARY KEY
, content text
, workspace uuid
);

create or replace function app_metadata()
returns jsonb as $$
  select ((current_setting('request.jwt.claims', true)::jsonb)->'app_metadata');
$$ language sql stable;

alter table chat_messages enable row level security;

drop policy if exists sel on chat_messages;
create policy sel on chat_messages for select
using(
  workspace = (app_metadata() ->> 'workspace')::uuid
);

grant select on chat_messages to postgrest_test_anonymous;

insert into chat_messages(id, content, workspace)
select 
  'id-' || x, 
  'content-' || x, 
  case when x % 2 = 0 
    then 'b080c0f7-8c22-429c-9d1a-ade3e43fb109'::uuid 
    else '2156a8d9-d06f-40cd-adf8-411ae8c0bc76'::uuid 
  end
from generate_series(1, 3000000) x;

Doing a count is fast without RLS:

explain analyze SELECT COUNT(*) FROM chat_messages;
Planning Time: 0.105 ms
Execution Time: 161.549 ms

When RLS is enabled, the count is awfully slow:

begin;
set local role to postgrest_test_anonymous;
select set_config('request.jwt.claims', '{"aal":"aal1","amr":[{"method":"password","timestamp":1670987059}],"app_metadata":{"provider":"email","team":"T01TQ1XH8RW","workspace":"2156a8d9-d06f-40cd-adf8-411ae8c0bc76","workspace_int":195,"workspace_small_text":"195"},"aud":"authenticated","email":"user@email.com","exp":1671591861,"phone":"","role":"authenticated","session_id":"9033fca4-07e0-42ff-9e2f-0b0aa738e3d5","sub":"ef79d643-550e-4e65-a963-ad6f50e67dfe","user_metadata":{}}', true);

SELECT COUNT(*) FROM chat_messages;
  count
---------
 1500000

explain analyze SELECT COUNT(*) FROM chat_messages;
                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=90821.44..90821.45 rows=1 width=8) (actual time=18597.007..18597.008 rows=1 loops=1)
   ->  Seq Scan on chat_messages  (cost=0.00..90795.25 rows=10476 width=0) (actual time=18597.002..18597.002 rows=0 loops=1)
         Filter: (workspace = ((((current_setting('request.jwt.claims'::text, true))::jsonb -> 'app_metadata'::text) ->> 'workspace'::text))::uuid)
         Rows Removed by Filter: 3000000
 Planning Time: 0.783 ms
 Execution Time: 18597.062 ms

This happens because:

jsonb casting(::jsonb) is not LEAKPROOF , this means that the casting will be done for each row on the query, PostgreSQL doesn't cache this static value.
The json value inside the request.jwt.claims is somewhat lengthy. We have no control over this as the JWT can come from an external system. A shorter jsonclaims will lead to less time:
```
begin;
set local role to postgrest_test_anonymous;
select set_config('request.jwt.claims', '{"app_metadata":{"workspace":"2156a8d9-d06f-40cd-adf8-411ae8c0bc76"}}', true);
```

explain analyze SELECT COUNT(*) FROM chat_messages;

                                                                 QUERY PLAN

Aggregate (cost=117937.00..117937.01 rows=1 width=8) (actual time=3449.217..3449.218 rows=1 loops=1) -> Seq Scan on chat_messages (cost=0.00..117937.00 rows=1 width=0) (actual time=3449.213..3449.213 rows=0 loops=1) Filter: (workspace = ((((current_setting('request.jwt.claims'::text, true))::jsonb -> 'app_metadata'::text) ->> 'workspace'::text))::uuid) Rows Removed by Filter: 3000000 Planning Time: 0.259 ms Execution Time: 3449.257 ms



The `count` is a contrived example(for which there are workarounds) but more real queries will lead to the same problem.

References:

- https://postgrest.org/en/stable/api.html#accessing-request-headers-cookies-and-jwt-claims

dwagin commented 1 year ago

Hello Steve,Whan about index on uuid column?

wolfgangwalther commented 1 year ago

This is certainly not because of LEAKPROOF or RLS. You can see it easily, by manually executing the constructed query:

SELECT COUNT(*) FROM chat_messages WHERE workspace = (app_metadata() ->> 'workspace')::uuid;

This is equally slow - even without RLS.

LEAKPROOF is about pushing conditions through RLS, i.e. when you do a query SELECT ... FROM rls_enabled_table WHERE some_func_used_here(col). In this case some_func_used_here must be defined LEAKPROOF to make it run on the base query directly - and not on the result after applying RLS.

But in this case, your function is inside the RLS policy. If you're leaking data there, that's all your own responsibility.

Whan about index on uuid column?

Yes, this is the right solution.

steve-chavez commented 1 year ago

But in this case, your function is inside the RLS policy. If you're leaking data there, that's all your own responsibility.

We encourage using current_setting on RLS on the docs , that has been there for quite a while. All the HTTP context GUCs could be used inside RLS, and we can't control the wrapper functions users put over them.

Hello Steve,Whan about index on uuid column?

Thanks dwagin, yeah that works. But the thing is, we didn't have this issue with the legacy GUCs that were available before v9.0.

So in the above example, by doing this:

create or replace function workspace()
returns uuid as $$
  select (current_setting('request.jwt.claims.app_metadata.workspace', true))::uuid;
$$ language sql stable;

drop policy if exists sel on chat_messages;
create policy sel on chat_messages for select
using(
  workspace = workspace()
);

begin;
set local role postgrest_test_anonymous;
select set_config('request.jwt.claims.app_metadata.workspace', '2156a8d9-d06f-40cd-adf8-411ae8c0bc76', true);
explain analyze SELECT COUNT(*) FROM chat_messages;

                                                      QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=87937.00..87937.01 rows=1 width=8) (actual time=1252.412..1252.413 rows=1 loops=1)
   ->  Seq Scan on chat_messages  (cost=0.00..87937.00 rows=1 width=0) (actual time=1252.409..1252.409 rows=0 loops=1)
         Filter: (workspace = (current_setting('request.jwt.claims.app_metadata.workspace'::text, true))::uuid)
         Rows Removed by Filter: 3000000
 Planning Time: 0.292 ms
 Execution Time: 1252.451 ms

The query is now x14 faster, without adding an index. So the JSON GUC parsing is indeed a problem.

steve-chavez commented 1 year ago

My proposal would be to have an allowlist config that brings back the legacy GUCs for certain keys to avoid the JSON parsing problem. Could be something like

http-context-shortcuts = "request.jwt.claims.app_metadata.workspace, request.headers.user_agent, request.cookies.session_id"

This seems related to https://github.com/PostgREST/postgrest/issues/1941, maybe the config proposed there could be merged with this one.

wolfgangwalther commented 1 year ago

The query is now x14 faster, without adding an index. So the JSON GUC parsing is indeed a problem.

I still think it's only a problem when you parse it 3 million times like in your example on a seq scan. But that's not the GUCs fault.

My proposal would be to have an allowlist config that brings back the legacy GUCs for certain keys to avoid the JSON parsing problem.

You can easily do this in a pre-request function. Just set your custom GUC to the extracted value from the JSON.

I'd rather wait for schema variables to finally come to PG - because those will solve this nicely.

steve-chavez commented 1 year ago

You can easily do this in a pre-request function. Just set your custom GUC to the extracted value from the JSON.

Yeah, that would work for avoiding the JSON guc but it will also come with a perf decrease because we don't have pipeline mode. Last time I checked adding a pre-request was like 15% reduction in throughput.

I'd rather wait for schema variables to finally come to PG - because those will solve this nicely.

True. Have you reviewed that feature by any chance? I wonder if it will land on pg16 and if indeed we can use it as we intend to.

steve-chavez commented 1 year ago

I wonder if it will land on pg16

That's a downside I guess, it will be almost a year for pg16.

I still think it's only a problem when you parse it 3 million times like in your example on a seq scan. But that's not the GUCs fault.

As I mentioned above the example is contrived. But there are some real world queries with big tables where filtering on some millions of rows is legitimate. I do think the JSON GUC is a problem here since the perf problem doesn't happen on the legacy gucs.

dwagin commented 1 year ago

The query is now x14 faster, without adding an index. So the JSON GUC parsing is indeed a problem.

In my environment only x3 (PG 13).

steve-chavez commented 1 year ago

Hm, I've tried on pg13(with nix-shell and postgrest-with-postgresql-13 psql) and I get a x9 perf increase.

dwagin commented 1 year ago

And speed up when PARALLEL RESTRICTED

steve-chavez commented 1 year ago

Circling back on this. How about this config:

# this is the default 
http-context = "request.method, request.path, request.jwt.claims, request.headers, request.cookies"

# we also set `search_path` and `role` for every request, but those cannot be modified as they're core to our security model

GUCs can be removed and shortcuts can be added:

http-context = "request.jwt.claims.app_metadata.workspace, request.headers"

For every sub level on request.jwt.claims.sub-level/request.headers.sub-level we'll get the value of the sub-level json key.

This will solve this issue and https://github.com/PostgREST/postgrest/issues/1941.

I think it will be useful even if we can do it with pre-request later.

steve-chavez commented 1 year ago

Just found out a plain SQL solution to the above issue, we can force computing the current_setting once with a MATERIALIZED CTE. So we have to change the above policy to:

drop policy if exists sel on chat_messages;
create policy sel on chat_messages for select
using(
  (
    with cached as materialized(
      select (app_metadata() ->> 'workspace')::uuid as val
    )
    select workspace = val from cached
  )
);

Now the policy is still correct and the COUNT is fast:

begin;
set local role to postgrest_test_anonymous;
select set_config('request.jwt.claims', '{"aal":"aal1","amr":[{"method":"password","timestamp":1670987059}],"app_metadata":{"provider":"email","team":"T01TQ1XH8RW","workspace":"2156a8d9-d06f-40cd-adf8-411ae8c0bc76","workspace_int":195,"workspace_small_text":"195"},"aud":"authenticated","email":"user@email.com","exp":1671591861,"phone":"","role":"authenticated","session_id":"9033fca4-07e0-42ff-9e2f-0b0aa738e3d5","sub":"ef79d643-550e-4e65-a963-ad6f50e67dfe","user_metadata":{}}', true);

SELECT COUNT(*) FROM chat_messages;
  count
---------
 1500000

explain analyze SELECT COUNT(*) FROM chat_messages;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=241687.00..241687.01 rows=1 width=8) (actual time=775.210..775.211 rows=1 loops=1)
   ->  Seq Scan on chat_messages  (cost=0.00..237937.00 rows=1500000 width=0) (actual time=0.145..728.419 rows=1500000 loops=1)
         Filter: (SubPlan 3)
         Rows Removed by Filter: 1500000
         SubPlan 3
           ->  Result  (cost=0.05..0.06 rows=1 width=1) (actual time=0.000..0.000 rows=1 loops=3000000)
                 CTE cached
                   ->  Result  (cost=0.00..0.03 rows=1 width=16) (actual time=0.054..0.056 rows=1 loops=1)
                 InitPlan 2 (returns $1)
                   ->  CTE Scan on cached  (cost=0.00..0.02 rows=1 width=16) (actual time=0.058..0.060 rows=1 loops=1)
 Planning Time: 0.238 ms
 Execution Time: 775.264 ms

This should be added to the docs

tonyxiao commented 1 year ago

This is epic! Also works wonders for things like postgres realtime too!

steve-chavez commented 1 year ago

drop policy if exists sel on chat_messages; create policy sel on chat_messages for select using( ( with cached as materialized( select (app_metadata() ->> 'workspace')::uuid as val ) select workspace = val from cached ) );

Note the above policy with the materialized CTE is not always a win. If the column is indexed, the above policy results in a noticeable slowdown(almost x6 slower) decrease:

create index on chat_messages(uuid);

drop policy if exists sel on chat_messages;
create policy sel on chat_messages for select
using(
  (
    with cached as materialized(
      select (app_metadata() ->> 'workspace')::uuid as val
    )
    select workspace = val from cached
  )
);

begin;
set local role to postgrest_test_anonymous;
select set_config('request.jwt.claims', '{"aal":"aal1","amr":[{"method":"password","timestamp":1670987059}],"app_metadata":{"provider":"email","team":"T01TQ1XH8RW","workspace":"2156a8d9-d06f-40cd-adf8-411ae8c0bc76","workspace_int":195,"workspace_small_text":"195"},"aud":"authenticated","email":"user@email.com","exp":1671591861,"phone":"","role":"authenticated","session_id":"9033fca4-07e0-42ff-9e2f-0b0aa738e3d5","sub":"ef79d643-550e-4e65-a963-ad6f50e67dfe","user_metadata":{}}', true);
explain analyze SELECT COUNT(*) FROM chat_messages;
                                                                               QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=209086.43..209086.44 rows=1 width=8) (actual time=817.665..817.666 rows=1 loops=1)
   ->  Index Only Scan using chat_messages_workspace_idx on chat_messages  (cost=0.43..205336.43 rows=1500000 width=0) (actual time=0.098..776.944 rows=1500000 loops=1)
         Filter: (SubPlan 2)
         Rows Removed by Filter: 1500000
         Heap Fetches: 0
         SubPlan 2
           ->  CTE Scan on cached  (cost=0.03..0.05 rows=1 width=1) (actual time=0.000..0.000 rows=1 loops=3000000)
                 CTE cached
                   ->  Result  (cost=0.00..0.03 rows=1 width=16) (actual time=0.055..0.057 rows=1 loops=1)
 Planning Time: 0.240 ms
 Execution Time: 817.720 ms

While the original policy without the materialized CTE from above:

drop policy if exists sel on chat_messages;
create policy sel on chat_messages for select
using(
  workspace = (app_metadata() ->> 'workspace')::uuid
);

begin;
set local role to postgrest_test_anonymous;
select set_config('request.jwt.claims', '{"aal":"aal1","amr":[{"method":"password","timestamp":1670987059}],"app_metadata":{"provider":"email","team":"T01TQ1XH8RW","workspace":"2156a8d9-d06f-4
0cd-adf8-411ae8c0bc76","workspace_int":195,"workspace_small_text":"195"},"aud":"authenticated","email":"user@email.com","exp":1671591861,"phone":"","role":"authenticated","session_id":"9033fca
4-07e0-42ff-9e2f-0b0aa738e3d5","sub":"ef79d643-550e-4e65-a963-ad6f50e67dfe","user_metadata":{}}', true);
explain analyze SELECT COUNT(*) FROM chat_messages;
BEGIN
SET
                                                                              QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=35164.45..35164.46 rows=1 width=8) (actual time=124.649..124.650 rows=1 loops=1)
   ->  Index Only Scan using chat_messages_workspace_idx on chat_messages  (cost=0.45..31414.95 rows=1499800 width=0) (actual time=0.082..78.337 rows=1500000 loops=1)
         Index Cond: (workspace = ((((current_setting('request.jwt.claims'::text, true))::jsonb -> 'app_metadata'::text) ->> 'workspace'::text))::uuid)
         Heap Fetches: 0
 Planning Time: 0.353 ms
 Execution Time: 124.694 ms
(6 rows)

So maybe we can just boil down to "always use an index when comparing against current_setting() values, if not possible make sure to cache them in a materialized CTE".

wolfgangwalther commented 1 year ago

There is a better way to force pg to execute the app_metadata function only once for the query: Make it use an InitPlan.

Instead of this:

create policy sel on chat_messages for select
using(
  workspace = (app_metadata() ->> 'workspace')::uuid
);

You'll need to wrap the right side in a scalar subquery:

create policy sel on chat_messages for select
using(
  workspace = (select (app_metadata() ->> 'workspace')::uuid)
);

Since the subquery does not depend on the outer query, it will result in an InitPlan. This will run it exactly once, and then put the result into the main query which can use an index or not, it doesn't matter - it will be fast in both cases.

First policy explained:

Aggregate  (cost=121660.00..121660.01 rows=1 width=8) (actual time=15673.547..15673.548 rows=1 loops=1)
  ->  Seq Scan on chat_messages  (cost=0.00..117937.00 rows=1489200 width=0) (actual time=0.026..15614.352 rows=1500000 loops=1)
        Filter: (workspace = ((((current_setting('request.jwt.claims'::text, true))::jsonb -> 'app_metadata'::text) ->> 'workspace'::text))::uuid)
        Rows Removed by Filter: 1500000
Planning Time: 0.281 ms
Execution Time: 15673.592 ms

Second policy explained:

Aggregate  (cost=69187.03..69187.04 rows=1 width=8) (actual time=264.203..264.204 rows=1 loops=1)
  InitPlan 1 (returns $0)
    ->  Result  (cost=0.00..0.03 rows=1 width=16) (actual time=0.028..0.029 rows=1 loops=1)
  ->  Seq Scan on chat_messages  (cost=0.00..65437.00 rows=1500000 width=0) (actual time=0.052..214.629 rows=1500000 loops=1)
        Filter: (workspace = $0)
        Rows Removed by Filter: 1500000
Planning Time: 0.138 ms
Execution Time: 264.229 ms

Second policy + index:

Aggregate  (cost=35168.46..35168.47 rows=1 width=8) (actual time=151.675..151.677 rows=1 loops=1)
  InitPlan 1 (returns $0)
    ->  Result  (cost=0.00..0.03 rows=1 width=16) (actual time=0.027..0.027 rows=1 loops=1)
  ->  Index Only Scan using chat_messages_workspace on chat_messages  (cost=0.43..31418.43 rows=1500000 width=0) (actual time=0.049..98.173 rows=1500000 loops=1)
        Index Cond: (workspace = $0)
        Heap Fetches: 0
Planning Time: 0.144 ms
Execution Time: 151.703 ms

tonyxiao commented 1 year ago

interesting, so should be the same performance as materialized CTE but a bit less verbose to write?

wolfgangwalther commented 1 year ago

interesting, so should be the same performance as materialized CTE but a bit less verbose to write?

No, it has better performance than a materialized CTE.

GaryAustin1 commented 1 year ago

I did a quick test in https://github.com/orgs/supabase/discussions/9311#discussioncomment-5732511 and get the same result with either method on a quick test, I'll do more. It makes sense they would be the same as they set a constant value to compare rows against and get run once.

BUT I believe this method is the one scattered thru out my original table https://github.com/orgs/supabase/discussions/9311#discussioncomment-3815869 which did things like:

GaryAustin1 commented 1 year ago

I did just run a 2nd test of the original auth.uid() = user_id 25K rows, and get 43, 13, 3 for original CTE, init with non_indexed user_id .07, 13, .13 for orginal, CTE, init with indexed user_id

So the initplan way is faster in this case. If you have an indexed column it seems to take advantage of it, but still slower than just the indexed column test.

wolfgangwalther commented 1 year ago

The difference you see between initplan and without for an indexed query is negligible. In theory, it's a very tiny overhead of running an additional initplan node in the query. But doing a few more runs gives me the following results:

Bare query using index, but no initplan:

152ms
123ms
126ms
125ms
122ms

Scalar subquery using index and initplan:

152ms
124ms
129ms
124ms
126ms

There's a lot more noise here than any noticeable difference between the two.

GaryAustin1 commented 1 year ago

Added some more tests a realistic two part RLS policy here with code: https://github.com/orgs/supabase/discussions/9311#discussioncomment-5734798 is_admin() OR auth.uid() = user_id 100K selects, no difference between indexed and not. Offset is adding limit 10 offset 90000 but only for admin as user gets 1 row. wrapped is initplan

role	normal	wrapped	CTE
user	1900	10.3	53
admin	1900	18.5	18.5
offset 90k	1900	48	48

Edit... Adding where user_id = 'UUID' to the query takes the time down to... 0.2msec from 10 and 53 for a user query.

PostgREST / postgrest-docs

`current_setting` can lead to bad performance when used on RLS #609