azat / chdig

Dig into ClickHouse with TUI interface.
MIT License
123 stars 13 forks source link

chdig 0.6.1 (mac m1) does not work for clickhouse versions 22.8.9.24 and 21.2.9.41 #33

Closed Dolso closed 8 months ago

Dolso commented 8 months ago

Good afternoon!

I have a question, for which minimum version of clickhouse is the tool supported? Actually version 22.8.9.24 is more interesting than 21.2.9.41

If we look at these two versions separately:

22.8.9.24

The tool manages to work only for a few seconds and then closes (Attached screenshot). Judging by system.query_log, chdig has time to make only one query (successful) (added query below the screenshot) and its query_duration_ms is about 29 ms

Screenshot 2024-01-04 at 16 07 59

WITH
    -- memory detalization
    (SELECT sum(value::UInt64) FROM system.metrics WHERE metric = 'MemoryTracking')               AS memory_tracked_,
    (SELECT sum(total_bytes) FROM system.tables WHERE engine IN ('Join','Memory','Buffer','Set')) AS memory_tables_,
    (SELECT sum(value::UInt64) FROM system.asynchronous_metrics WHERE metric LIKE '%CacheBytes')  AS memory_caches_,
    (SELECT sum(memory_usage::UInt64) FROM system.processes)                                      AS memory_processes_,
    (SELECT count() FROM system.processes)                                                        AS processes_,
    (SELECT sum(memory_usage::UInt64) FROM system.merges)                                         AS memory_merges_,
    (SELECT sum(bytes_allocated) FROM system.dictionaries)                                        AS memory_dictionaries_,
    (SELECT sum(primary_key_bytes_in_memory_allocated) FROM system.parts)                         AS memory_primary_keys_,
    (SELECT count() FROM system.one)                                                              AS servers_,
    (SELECT count() FROM system.merges)                                                           AS merges_,
    (SELECT count() FROM system.mutations WHERE NOT is_done)                                      AS mutations_,
    (SELECT count() FROM system.replication_queue)                                                AS replication_queue_,
    (SELECT sum(num_tries) FROM system.replication_queue)                                         AS replication_queue_tries_,
    (SELECT count() FROM system.replicated_fetches)                                                          AS fetches_
SELECT
    assumeNotNull(servers_)                                  AS servers,
    assumeNotNull(memory_tracked_)                           AS memory_tracked,
    assumeNotNull(memory_tables_)                            AS memory_tables,
    assumeNotNull(memory_caches_)                            AS memory_caches,
    assumeNotNull(memory_processes_)                         AS memory_processes,
    assumeNotNull(processes_)                                AS processes,
    assumeNotNull(memory_merges_)                            AS memory_merges,
    assumeNotNull(merges_)                                   AS merges,
    assumeNotNull(mutations_)                                AS mutations,
    assumeNotNull(replication_queue_)                        AS replication_queue,
    assumeNotNull(replication_queue_tries_)                  AS replication_queue_tries,
    assumeNotNull(fetches_)                                  AS fetches,
    assumeNotNull(memory_dictionaries_)                      AS memory_dictionaries,
    assumeNotNull(memory_primary_keys_)                      AS memory_primary_keys,

    asynchronous_metrics.*,
    metrics.*
FROM
(
    WITH
        -- exclude MD/LVM
        metric LIKE '%_sd%' OR metric LIKE '%_nvme%' AS is_disk,
        metric NOT LIKE '%vlan%' AS is_vlan
    -- NOTE: cast should be after aggregation function since the type is Float64
    SELECT
        minIf(value, metric == 'OSUptime')::UInt64               AS os_uptime,
        min(uptime())::UInt64                                    AS uptime,
        -- memory
        sumIf(value, metric == 'OSMemoryTotal')::UInt64          AS os_memory_total,
        sumIf(value, metric == 'MemoryResident')::UInt64         AS memory_resident,
        -- cpu
        countIf(metric LIKE 'OSUserTimeCPU%')::UInt64            AS cpu_count,
        sumIf(value, metric LIKE 'OSUserTimeCPU%')::UInt64       AS cpu_user,
        sumIf(value, metric LIKE 'OSSystemTimeCPU%')::UInt64     AS cpu_system,
        -- threads detalization
        sumIf(value, metric = 'HTTPThreads')::UInt64             AS threads_http,
        sumIf(value, metric = 'TCPThreads')::UInt64              AS threads_tcp,
        sumIf(value, metric = 'OSThreadsTotal')::UInt64          AS threads_os_total,
        sumIf(value, metric = 'OSThreadsRunnable')::UInt64       AS threads_os_runnable,
        sumIf(value, metric = 'InterserverThreads')::UInt64      AS threads_interserver,
        -- network
        sumIf(value, metric LIKE 'NetworkSendBytes%' AND NOT is_vlan)::UInt64    AS net_send_bytes,
        sumIf(value, metric LIKE 'NetworkReceiveBytes%' AND NOT is_vlan)::UInt64 AS net_receive_bytes,
        -- block devices
        sumIf(value, metric LIKE 'BlockReadBytes%' AND is_disk)::UInt64      AS block_read_bytes,
        sumIf(value, metric LIKE 'BlockWriteBytes%' AND is_disk)::UInt64     AS block_write_bytes,
        -- update intervals
        anyLastIf(value, metric == 'AsynchronousMetricsUpdateInterval')::UInt64 AS metrics_update_interval
    FROM system.asynchronous_metrics
) as asynchronous_metrics,
(
    SELECT
        sumIf(value::UInt64, metric == 'StorageBufferBytes') AS storage_buffer_bytes,
        sumIf(value::UInt64, metric == 'DistributedFilesToInsert') AS storage_distributed_insert_files,

        sumIf(value::UInt64, metric == 'BackgroundMergesAndMutationsPoolTask')    AS threads_merges_mutations,
        sumIf(value::UInt64, metric == 'BackgroundFetchesPoolTask')               AS threads_fetches,
        sumIf(value::UInt64, metric == 'BackgroundCommonPoolTask')                AS threads_common,
        sumIf(value::UInt64, metric == 'BackgroundMovePoolTask')                  AS threads_moves,
        sumIf(value::UInt64, metric == 'BackgroundSchedulePoolTask')              AS threads_schedule,
        sumIf(value::UInt64, metric == 'BackgroundBufferFlushSchedulePoolTask')   AS threads_buffer_flush,
        sumIf(value::UInt64, metric == 'BackgroundDistributedSchedulePoolTask')   AS threads_distributed,
        sumIf(value::UInt64, metric == 'BackgroundMessageBrokerSchedulePoolTask') AS threads_message_broker,
        sumIf(value::UInt64, metric IN (
            'BackupThreadsActive',
            'RestoreThreadsActive',
            'BackupsIOThreadsActive'
        )) AS threads_backups,
        sumIf(value::UInt64, metric IN (
            'DiskObjectStorageAsyncThreadsActive',
            'ThreadPoolRemoteFSReaderThreadsActive',
            'StorageS3ThreadsActive'
        )) AS threads_remote_io,
        sumIf(value::UInt64, metric IN (
            'IOThreadsActive',
            'IOWriterThreadsActive',
            'IOPrefetchThreadsActive',
            'MarksLoaderThreadsActive'
        )) AS threads_io,
        sumIf(value::UInt64, metric IN (
            'QueryPipelineExecutorThreadsActive',
            'QueryThread',
            'AggregatorThreadsActive',
            'StorageDistributedThreadsActive',
            'DestroyAggregatesThreadsActive'
        )) AS threads_queries
    FROM system.metrics
) as metrics
SETTINGS enable_global_with_statement=0

21.2.9.41

The tool runs 2 queries at some intervals that cannot execute due to syntax or because some columns are missing in the old version of clickhouse

Screenshot 2024-01-04 at 15 30 46

First query:

SELECT
    if(is_initial_query,
        (sumMap(ProfileEvents) OVER (PARTITION BY initial_query_id)),
        ProfileEvents
    ) AS ProfileEvents
    ,
    Settings,
    thread_ids,
    peak_memory_usage,
    elapsed / 1 AS elapsed,
    user,
    (count() OVER (PARTITION BY initial_query_id)) AS subqueries,
    is_initial_query,
    initial_query_id,
    query_id,
    hostName() AS host_name,
    current_database,
    (now64() - elapsed) AS query_start_time_microseconds,
    toValidUTF8(query) AS original_query,
    normalizeQuery(query) AS normalized_query
FROM system.processes
LIMIT 10000

Exception: Code: 47, e.displayText() = DB::Exception: Missing columns: 'ProfileEvents' 'current_database' 'Settings' while processing query: 'SELECT if(is_initial_query, sumMap(ProfileEvents) OVER (PARTITION BY initial_query_id), ProfileEvents) AS ProfileEvents, Settings, thread_ids, peak_memory_usage, elapsed / 1 AS elapsed, user, count() OVER (PARTITION BY initial_query_id) AS subqueries, is_initial_query, initial_query_id, query_id, hostName() AS host_name, current_database, now64() - elapsed AS query_start_time_microseconds, toValidUTF8(query) AS original_query, normalizeQuery(query) AS normalized_query FROM system.processes LIMIT 10000', required columns: 'peak_memory_usage' 'user' 'Settings' 'thread_ids' 'is_initial_query' 'initial_query_id' 'current_database' 'ProfileEvents' 'query' 'query_id' 'elapsed' (version 21.2.9.41 (official build))

Second query:

WITH
    -- memory detalization
    (SELECT sum(value::UInt64) FROM system.metrics WHERE metric = 'MemoryTracking')               AS memory_tracked_,
    (SELECT sum(total_bytes) FROM system.tables WHERE engine IN ('Join','Memory','Buffer','Set')) AS memory_tables_,
    (SELECT sum(value::UInt64) FROM system.asynchronous_metrics WHERE metric LIKE '%CacheBytes')  AS memory_caches_,
    (SELECT sum(memory_usage::UInt64) FROM system.processes)                                      AS memory_processes_,
    (SELECT count() FROM system.processes)                                                        AS processes_,
    (SELECT sum(memory_usage::UInt64) FROM system.merges)                                         AS memory_merges_,
    (SELECT sum(bytes_allocated) FROM system.dictionaries)                                        AS memory_dictionaries_,
    (SELECT sum(primary_key_bytes_in_memory_allocated) FROM system.parts)                         AS memory_primary_keys_,
    (SELECT count() FROM system.one)                                                              AS servers_,
    (SELECT count() FROM system.merges)                                                           AS merges_,
    (SELECT count() FROM system.mutations WHERE NOT is_done)                                      AS mutations_,
    (SELECT count() FROM system.replication_queue)                                                AS replication_queue_,
    (SELECT sum(num_tries) FROM system.replication_queue)                                         AS replication_queue_tries_,
    (SELECT count() FROM system.replicated_fetches)                                                          AS fetches_
SELECT
    assumeNotNull(servers_)                                  AS servers,
    assumeNotNull(memory_tracked_)                           AS memory_tracked,
    assumeNotNull(memory_tables_)                            AS memory_tables,
    assumeNotNull(memory_caches_)                            AS memory_caches,
    assumeNotNull(memory_processes_)                         AS memory_processes,
    assumeNotNull(processes_)                                AS processes,
    assumeNotNull(memory_merges_)                            AS memory_merges,
    assumeNotNull(merges_)                                   AS merges,
    assumeNotNull(mutations_)                                AS mutations,
    assumeNotNull(replication_queue_)                        AS replication_queue,
    assumeNotNull(replication_queue_tries_)                  AS replication_queue_tries,
    assumeNotNull(fetches_)                                  AS fetches,
    assumeNotNull(memory_dictionaries_)                      AS memory_dictionaries,
    assumeNotNull(memory_primary_keys_)                      AS memory_primary_keys,

    asynchronous_metrics.*,
    metrics.*
FROM
(
    WITH
        -- exclude MD/LVM
        metric LIKE '%_sd%' OR metric LIKE '%_nvme%' AS is_disk,
        metric NOT LIKE '%vlan%' AS is_vlan
    -- NOTE: cast should be after aggregation function since the type is Float64
    SELECT
        minIf(value, metric == 'OSUptime')::UInt64               AS os_uptime,
        min(uptime())::UInt64                                    AS uptime,
        -- memory
        sumIf(value, metric == 'OSMemoryTotal')::UInt64          AS os_memory_total,
        sumIf(value, metric == 'MemoryResident')::UInt64         AS memory_resident,
        -- cpu
        countIf(metric LIKE 'OSUserTimeCPU%')::UInt64            AS cpu_count,
        sumIf(value, metric LIKE 'OSUserTimeCPU%')::UInt64       AS cpu_user,
        sumIf(value, metric LIKE 'OSSystemTimeCPU%')::UInt64     AS cpu_system,
        -- threads detalization
        sumIf(value, metric = 'HTTPThreads')::UInt64             AS threads_http,
        sumIf(value, metric = 'TCPThreads')::UInt64              AS threads_tcp,
        sumIf(value, metric = 'OSThreadsTotal')::UInt64          AS threads_os_total,
        sumIf(value, metric = 'OSThreadsRunnable')::UInt64       AS threads_os_runnable,
        sumIf(value, metric = 'InterserverThreads')::UInt64      AS threads_interserver,
        -- network
        sumIf(value, metric LIKE 'NetworkSendBytes%' AND NOT is_vlan)::UInt64    AS net_send_bytes,
        sumIf(value, metric LIKE 'NetworkReceiveBytes%' AND NOT is_vlan)::UInt64 AS net_receive_bytes,
        -- block devices
        sumIf(value, metric LIKE 'BlockReadBytes%' AND is_disk)::UInt64      AS block_read_bytes,
        sumIf(value, metric LIKE 'BlockWriteBytes%' AND is_disk)::UInt64     AS block_write_bytes,
        -- update intervals
        anyLastIf(value, metric == 'AsynchronousMetricsUpdateInterval')::UInt64 AS metrics_update_interval
    FROM system.asynchronous_metrics
) as asynchronous_metrics,
(
    SELECT
        sumIf(value::UInt64, metric == 'StorageBufferBytes') AS storage_buffer_bytes,
        sumIf(value::UInt64, metric == 'DistributedFilesToInsert') AS storage_distributed_insert_files,

        sumIf(value::UInt64, metric == 'BackgroundMergesAndMutationsPoolTask')    AS threads_merges_mutations,
        sumIf(value::UInt64, metric == 'BackgroundFetchesPoolTask')               AS threads_fetches,
        sumIf(value::UInt64, metric == 'BackgroundCommonPoolTask')                AS threads_common,
        sumIf(value::UInt64, metric == 'BackgroundMovePoolTask')                  AS threads_moves,
        sumIf(value::UInt64, metric == 'BackgroundSchedulePoolTask')              AS threads_schedule,
        sumIf(value::UInt64, metric == 'BackgroundBufferFlushSchedulePoolTask')   AS threads_buffer_flush,
        sumIf(value::UInt64, metric == 'BackgroundDistributedSchedulePoolTask')   AS threads_distributed,
        sumIf(value::UInt64, metric == 'BackgroundMessageBrokerSchedulePoolTask') AS threads_message_broker,
        sumIf(value::UInt64, metric IN (
            'BackupThreadsActive',
            'RestoreThreadsActive',
            'BackupsIOThreadsActive'
        )) AS threads_backups,
        sumIf(value::UInt64, metric IN (
            'DiskObjectStorageAsyncThreadsActive',
            'ThreadPoolRemoteFSReaderThreadsActive',
            'StorageS3ThreadsActive'
        )) AS threads_remote_io,
        sumIf(value::UInt64, metric IN (
            'IOThreadsActive',
            'IOWriterThreadsActive',
            'IOPrefetchThreadsActive',
            'MarksLoaderThreadsActive'
        )) AS threads_io,
        sumIf(value::UInt64, metric IN (
            'QueryPipelineExecutorThreadsActive',
            'QueryThread',
            'AggregatorThreadsActive',
            'StorageDistributedThreadsActive',
            'DestroyAggregatesThreadsActive'
        )) AS threads_queries
    FROM system.metrics
) as metrics
SETTINGS enable_global_with_statement=0

Exception:

Code: 62, e.displayText() = DB::Exception: Syntax error: failed at position 115 (':') (line 4, col 42): ::UInt64) FROM system.metrics WHERE metric = 'MemoryTracking')               AS memory_tracked_,
                        (SELECT sum(total_bytes) FROM system.ta. Expected one of: LIKE, GLOBAL NOT IN, AS, DIV, IS, OR, QuestionMark, BETWEEN, NOT LIKE, MOD, AND, Comma, alias, IN, ILIKE, Dot, NOT ILIKE, NOT, Arrow, token, NOT IN, GLOBAL IN (version 21.2.9.41 (official build))

My device "Mac mini": CPU Apple M1 Memory 16 GB macOS Ventura 13.4

Software: chdig 0.6.1 cargo 1.75.0 PyOxidizer 0.24.0 nfpm 2.35.1

Thank you very much in advance! And I apologise if this question is too obvious or has been asked before

azat commented 8 months ago

Hi Andrei, and thank you for the report!

22.8.9.24

There was panic in chdig - fixed in https://github.com/azat/chdig/commit/92a13281e42014ad0b9c962e69b205d54b7f5bfc, and now it works.

You can use latest release - https://github.com/azat/chdig/releases/tag/latest

I've also tested 21.8 and it has some bug in ClickHouse that does not allow to use WINDOW functions.

``` 2024.01.05 10:58:09.000469 [ 55 ] {78c3fedb-871b-4582-9248-ee3ba6faa214} executeQuery: Code: 47, e.displayText() = DB::Exception: Unknown identifier: sumMap(ProfileEvents) OVER (PARTITION BY initial_query_id); there are columns: is_initial_query, user, query_id, initial_query_id, elapsed, peak_memory_usage, query, thread_ids, ProfileEvents, Settings, current_database: While processing if(is_initial_query, sumMap(ProfileEvents) OVER (PARTITION BY initial_query_id), ProfileEvents) AS profile_events, Settings, thread_ids, peak_memory_usage, elapsed / 1 AS elapsed, user, count() OVER (PARTITION BY initial_query_id) AS subqueries, is_initial_query, initial_query_id, query_id, hostName() AS host_name, current_database, now64() - elapsed AS query_start_time_microseconds, toValidUTF8(query) AS original_query, normalizeQuery(query) AS normalized_query (version 21.8.8.29 (official build)) (from 172.17.0.1:27434) (in query: SELECT if(is_initial_query, (sumMap(ProfileEvents) OVER (PARTITION BY initial_query_id)), ProfileEvents ) AS profile_events , Settings, thread_ids, peak_memory_usage, elapsed / 1 AS elapsed, user, (count() OVER (PARTITION BY initial_query_id)) AS subqueries, is_initial_query, initial_query_id, query_id, hostName() AS host_name, current_database, (now64() - elapsed) AS query_start_time_microseconds, toValidUTF8(query) AS original_query, normalizeQuery(query) AS normalized_query FROM system.processes LIMIT 10000 ), Stack trace (when copying this message, always include the lines below): ```

21.2.9.41

It has too old representation of system.query_log, and I'm pretty sure that it will have the same bug with WINDOW functions.

So to make it work this logic should be moved out from SQL query...

Dolso commented 8 months ago

Thanks a lot for the quick response! I tested a bit on clickhouse version 22.8.9.24 and I think the tool works. I think the issue can be closed

azat commented 8 months ago

Actually it does not solves the issue completely, but #34 will

P.S. I don't use "ancient" (older then ~3month) versions, so I do not test on them, but if it will not require tons of code I'm OK with supporting older versions. And patches are always welcome!