ClickHouse / ClickHouse

ClickHouse® is a real-time analytics DBMS
https://clickhouse.com
Apache License 2.0
37.01k stars 6.83k forks source link

What kinds of issues system.session_log table has? #51760

Open Demilivor opened 1 year ago

Demilivor commented 1 year ago

What kinds of issues does the system.session_log table has?

in programs/server/config.xml I see this comment: Note: session log has known security issues and should not be used in production.

If I enable session log in config.xml:

     <session_log>
        <database>system</database>
        <table>session_log</table>
        <partition_by>toYYYYMM(event_date)</partition_by>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </session_log>

And then If I launch ClickHouse client, I see the warning message:

ClickHouse client version 23.6.1.1.
Connecting to localhost:9000 as user default.
Connected to ClickHouse server version 23.6.1 revision 54464.

Warnings:
 * Table system.session_log is enabled. It's unreliable and may contain garbage. Do not use it for any kind of security monitoring.

Why session_log may contain garbage? Is this issue actual? I can't reproduce it.

rvasin commented 1 year ago

@alexey-milovidov , @den-crane, @UnamedRus Which are remaining issues with session_log? We may try to fix the issues to make session_log production ready. Do you have any ideas how to re-produce the issues?

UnamedRus commented 1 year ago

One missing feature for session log, is ClickHouse should be able to write it to some file on disk, as potential intruder can access clear session_log table

UnamedRus commented 1 year ago

@tavplubix should have more insides about ClickHouse Inc folks complains about session_log

rvasin commented 1 year ago

One missing feature for session log, is ClickHouse should be able to write it to some file on disk, as potential intruder can access clear session_log table

Yes. It's a good idea. I see you already requested it in #44128. Right now we could start with fixing the remaining issues with session_log. Then in the next PRs we could start to add more features to it (like new columns) and add new things like writing it to file as you requested.

tavplubix commented 1 year ago

Remaining issues:

Demilivor commented 1 year ago

@tavplubix Thanks for the clarification,

Can you please tell me how I can reproduce the problem: Sometimes writes 0 instead of the client port for the TCP interface ?

tavplubix commented 1 year ago

Can you please tell me how I can reproduce the problem: Sometimes writes 0 instead of the client port for the TCP interface ?

@Demilivor, see https://github.com/ClickHouse/ClickHouse/pull/37303#issuecomment-1130185913 and https://github.com/ClickHouse/ClickHouse/blob/dbde63d275926461a88ffab3eb5c7232f8b7eb71/tests/queries/0_stateless/01119_session_log.sql#L23

Demilivor commented 1 year ago

Thanks @tavplubix,

Unfortunately, I was not able to reproduce the issue with port = 0 or address = default, using 01119_session_log.sql. What was the reproducibility rate?

I created my own test and tried to reproduce the problem: https://github.com/arenadata/ClickHouse/tree/ADQM-980/tests/integration/test_session_log

Test runs queries similar to 01119_session_log.sql, and checks that query: select count(*) FROM system.session_log where client_port = 0 OR toString(client_address)='::ffff:0.0.0.0' returns 0 elements.

All test cases from test_session_log pass always.

Do you have any more ideas on how I can reproduce the issue?

tavplubix commented 1 year ago

What was the reproducibility rate? Do you have any more ideas on how I can reproduce the issue?

I don't know, it just failed in the CI. I guess it's important that 01119_session_log reads everything from the session_log, including rows that were written by other functional tests that were executed concurrently or during the 5 minutes time window. So an isolated integration test will not reproduce it.

Demilivor commented 1 year ago
/// FIXME it may produce different info for LoginSuccess and the corresponding Logout entries in the session log 

This behavior seems to be expected in this scenario: 1) Client logined into user 'some_user' 2) Session logged user 'some_user' from here:

void SessionLog::addLoginSuccess(const UUID & auth_id, std::optional<String> session_id, const Context & login_context, const UserPtr & login_user)

3) While the long query was executing someone called

ALTER USER 'some_user' RENAME TO 'renamed_user'

4) long query from (3) is stopped after altering user. Then SessionLog::addLogOut will log new user name 'renamed_user'

auth_id should be the same in loginSuccess and logout.

Demilivor commented 1 year ago

Figured out the code better, it does not work like I described in previous message, when we request to alter the user name, we create a copy of the data and modify it, and the previous user data with old name still remains in the session object.

clickhouse-client -q "DROP USER IF EXISTS 'my_user'" clickhouse-client -q "DROP USER IF EXISTS 'my_renamed_user'" clickhouse-client -q "CREATE USER 'my_user'" clickhouse-client -q "GRANT ALL ON . TO my_user" clickhouse-client -q "DELETE FROM system.session_log WHERE TRUE"

# TEST 1:
# Generates 2 entries:
# Login: my_user
# Logout: my_user
clickhouse-client --user 'my_user' -q "ALTER USER 'my_user' RENAME TO 'my_renamed_user'"
clickhouse-client -q "select * from system.session_log"

#test 
#Recreate my_user again
...
clickhouse-client -q "DELETE FROM system.session_log WHERE TRUE"|
#terminal 1 (sleep(15) does not work without code changes (sleep max time hardcoded to 3 seconds))
clickhouse-client -q "SELECT sleep(15), 1"
#terminal 2
clickhouse-client --user 'my_user' -q "ALTER USER 'my_user' RENAME TO 'my_renamed_user'"
# terminal 1 session generates 2 entries:
# Login: my_user
# Logout: my_user
clickhouse-client -q "select * from system.session_log"
Demilivor commented 1 year ago

@tavplubix Good day Alexander!

Do you have any clue how can I reproduce that? Also it looks like it may produce different info for LoginSuccess and the corresponding Logout entries (it's hard to understand from where it takes client info

I have not yet found a place where data from client info that is written to the table could be modified before log out.

Demilivor commented 1 year ago

Created a pull request: #52174 It should code reliability for this: Buffering makes it less reliable (some entries may be lost if the server crashes or gets OOM) This pull request gives the possibility of reserving memory for the log entries at the start. Logs max size, reserved size, and flush threshold are server-configurable values. And adds protection from logs being lost if an unexpected crash happens. Which logs should be saved is configurable in server configuration.

I'm going to investigate test 01747_system_session_log_long.sh issues and fix them.

tavplubix commented 1 year ago

Do you have any clue how can I reproduce that?

Also it looks like it may produce different info for LoginSuccess and the corresponding Logout entries (it's hard to understand from where it takes client info

Not really. My concern is that getClientInfo() may take client info from session_context or from prepared_client_info, but the initialization/destruction order is absolutely unclear. Are session_context->getClientInfo() and prepared_client_info consistent? Is it possible that we take client info from prepared_client_info on login (because session_context is not initialized yet), but then we take it from session_context on logout? Why not to take it from one place? (all these questions indicate poor quality of the code)

Demilivor commented 1 year ago

Thanks for the answer,

I also thought about destroying the session object and how unitialized prepared_client_info can move to the query context,

Also I was thinking about this:

session_log->addLoginSuccess(
        auth_id,
        named_session ? std::optional<std::string>(named_session->key.second) : std::nullopt,
        *query_context, //  <-- loginSuscces event takes client_info from query_context
        user);

session_log->addLogOut(auth_id, user, getClientInfo()); // <-- addLogOut takes client_info from session_context or prepared_client_info 
// Maybe it would be better to always take client_info from query_context, or even better
// call addLoginSuccess in 'makeSessionContext' and use session_context->getClientInfo()

but I haven't found a way to get wrong logs in behaviour.

Demilivor commented 1 year ago

I partially fixed ClickHouse/tests/queries/bugs/01747_system_session_log_long.sh in  #52640  (Now test passes on my PC and the results look quite valid).

Now I want to add more tests related to the session log. @tavplubix Please share test ideas if you have any.

alexey-milovidov commented 1 year ago

One question - is everything in the session_log covered by the text_log? If not - can we add it to the text_log?

Demilivor commented 1 year ago

One question - is everything in the session_log covered by the text_log?

Text log messages do not match exactly with session log. see https://pastila.nl/?002d6077/03c9d8f7f198ebc2aa443dc078b320e3

If not - can we add it to the text_log?

It should be pretty easy to add corresponding messages to the session log.

alexey-milovidov commented 1 year ago

Yes, we need the same events (connection, authentication success/failure) represented in the text log. Not necessarily with the same fields, but with the important info. The goal - to make text_log (and the usual logs in the filesystem) be usable for audit as well.

Demilivor commented 1 year ago

Created https://github.com/ClickHouse/ClickHouse/pull/52958 potentially it should fix missmatching between Login and Logout entries. Still I did not reproduced the issue with port 0 and empty address ::

Demilivor commented 1 year ago

Created minor PR adding missed logout and authentication failure (for MySQL and PostgreSQL) logs to the text_log https://github.com/ClickHouse/ClickHouse/pull/53142

alexey-milovidov commented 1 month ago

Not fixed!

alexey-milovidov commented 1 month ago

https://s3.amazonaws.com/clickhouse-test-reports/0/5e836bc20e1b1da2629e412170f5bb6265936b60/stateless_tests__asan__[1_2].html

alexey-milovidov commented 1 month ago

@Demilivor, @rvasin, @Enmk, test has failed!