I want to add some feature option for fixing possible insecure shared cache behavior with shared_with_all_users option enabled.
User who haven't grants for tables in query can get cached result if another user who have sufficient grants already executed same query (like SELECT * FROM table) because only user authorization checked before cache read but not grants (without shared_with_all_users enabled request will be redirected to click and then failed if user have insufficient grants).
Example
For example, if we have ClickHouse database ny_taxi (from dataset examples) with user_A and user_B:
CREATE DATABASE ny_taxi;
CREATE TABLE ny_taxi.trips (
trip_id UInt32,
pickup_datetime DateTime,
dropoff_datetime DateTime,
pickup_longitude Nullable(Float64),
pickup_latitude Nullable(Float64),
dropoff_longitude Nullable(Float64),
dropoff_latitude Nullable(Float64),
passenger_count UInt8,
trip_distance Float32,
fare_amount Float32,
extra Float32,
tip_amount Float32,
tolls_amount Float32,
total_amount Float32,
payment_type Enum('CSH' = 1, 'CRE' = 2, 'NOC' = 3, 'DIS' = 4, 'UNK' = 5),
pickup_ntaname LowCardinality(String),
dropoff_ntaname LowCardinality(String)
)
ENGINE = MergeTree
PRIMARY KEY (pickup_datetime, dropoff_datetime);
INSERT INTO ny_taxi.trips
SELECT
trip_id,
pickup_datetime,
dropoff_datetime,
pickup_longitude,
pickup_latitude,
dropoff_longitude,
dropoff_latitude,
passenger_count,
trip_distance,
fare_amount,
extra,
tip_amount,
tolls_amount,
total_amount,
payment_type,
pickup_ntaname,
dropoff_ntaname
FROM gcs(
'https://storage.googleapis.com/clickhouse-public-datasets/nyc-taxi/trips_{0..2}.gz',
'TabSeparatedWithNames'
);
CREATE USER user_A IDENTIFIED WITH plaintext_password BY 'user_A';
CREATE USER user_B IDENTIFIED WITH plaintext_password BY 'user_B';
GRANT SHOW, SELECT ON ny_taxi.* TO user_A;
Another user_C trying to execute query through chproxy. He will have 403 Forbidden status code with response content:
Code: 516. DB::Exception: user_C: Authentication failed: password is incorrect, or there is no user with such name. (AUTHENTICATION_FAILED) (version 24.6.2.17 (official build))
And this is OK.
If user_B who is present in ClickHouse will try to execute query SELECT * FROM ny_taxi.trips LIMIT 10 for example. He will receive 500 Internal Server Error with response content:
Code: 497. DB::Exception: user_B: Not enough privileges. To execute this query, it's necessary to have the grant SELECT(trip_id, pickup_datetime, dropoff_datetime, pickup_longitude, pickup_latitude, dropoff_longitude, dropoff_latitude, passenger_count, trip_distance, fare_amount, extra, tip_amount, tolls_amount, total_amount, payment_type, pickup_ntaname, dropoff_ntaname) ON ny_taxi.trips. (ACCESS_DENIED) (version 24.6.2.17 (official build))
And this is OK too.
If user_A who is present in ClickHouse and have grants for ny_taxi.trips will execute query SELECT * FROM ny_taxi.trips LIMIT 10 he will receive 200 Ok and response data, which will be also cached in file_system_cache. And finally, until cached result expires user_B (who haven't sufficient grants for ny_taxi.trips) and any other user can execute this exact query and get cached response data even if they have no grants for SELECT on ny_taxi.trips. It works because of shared_with_all_users.
And this is not so good, but, as far as I understand, was designed this way.
Solution
My proposition is:
Add config option check_grants_for_shared_cache in caches section which enables or disables additional grants check for cached request.
Add additional grant check only for cached queries if shared_with_all_users: true and check_grants_for_shared_cache: true. This check will be performed only when user executes query and before load shared cache result. This check will create new request to ClickHouse (with same request scope) with wrapped original query into DESC ({{original_query}}). Along with query parsed and analyzed for return types (which is fast) ClickHouse checks permissions for the user to execute the query.
This solution I implemented in the PR.
Pull request type
Please check the type of change your PR introduces:
[ ] Bugfix
[x] Feature
[ ] Code style update (formatting, renaming)
[ ] Refactoring (no functional changes, no api changes)
[ ] Build related changes
[ ] Documentation content changes
[ ] Other (please describe):
Checklist
[x] Linter passes correctly
[ ] Add tests which fail without the change (if possible)
[x] All tests passing
[x] Extended the README / documentation, if necessary
Does this introduce a breaking change?
[ ] Yes
[x] No
Further comments
In my cases of usage chproxy, shared cache between users is very helpful with some BI instruments, because:
BI requests like pagination, table structure and other queries are cached good.
BI othen have many users and it's good to share cache between them.
Security issue like this, when user can access to data (cached) with insufficient grants prevent from using shared cache in this cases.
This is the first draft and I will refactor it later if this is useful feature.
I will be glad, if someone can give me a hint to implement 3 cases from Example in tests.
Description
Summary
I want to add some feature option for fixing possible insecure shared cache behavior with
shared_with_all_users
option enabled. User who haven't grants for tables in query can get cached result if another user who have sufficient grants already executed same query (likeSELECT * FROM table
) because only user authorization checked before cache read but not grants (withoutshared_with_all_users
enabled request will be redirected to click and then failed if user have insufficient grants).Example
For example, if we have ClickHouse database
ny_taxi
(from dataset examples) withuser_A
anduser_B
:and
chproxy
config:I want to review 3 cases:
user_C
trying to execute query throughchproxy
. He will have403 Forbidden
status code with response content:And this is OK.
user_B
who is present in ClickHouse will try to execute querySELECT * FROM ny_taxi.trips LIMIT 10
for example. He will receive500 Internal Server Error
with response content:And this is OK too.
user_A
who is present in ClickHouse and have grants forny_taxi.trips
will execute querySELECT * FROM ny_taxi.trips LIMIT 10
he will receive200 Ok
and response data, which will be also cached infile_system_cache
. And finally, until cached result expiresuser_B
(who haven't sufficient grants forny_taxi.trips
) and any other user can execute this exact query and get cached response data even if they have no grants forSELECT
onny_taxi.trips
. It works because ofshared_with_all_users
. And this is not so good, but, as far as I understand, was designed this way.Solution
My proposition is:
check_grants_for_shared_cache
incaches
section which enables or disables additional grants check for cached request.shared_with_all_users: true
andcheck_grants_for_shared_cache: true
. This check will be performed only when user executes query and before load shared cache result. This check will create new request to ClickHouse (with same request scope) with wrapped original query intoDESC ({{original_query}})
. Along with query parsed and analyzed for return types (which is fast) ClickHouse checks permissions for the user to execute the query.This solution I implemented in the PR.
Pull request type
Please check the type of change your PR introduces:
Checklist
Does this introduce a breaking change?
Further comments
In my cases of usage
chproxy
, shared cache between users is very helpful with some BI instruments, because:Security issue like this, when user can access to data (cached) with insufficient grants prevent from using shared cache in this cases.
This is the first draft and I will refactor it later if this is useful feature.
I will be glad, if someone can give me a hint to implement 3 cases from Example in tests.