yawnt opened this issue 3 months ago
That sounds good! I support this.
@tomershafir Thank you for that! I was indeed aware of Qryn, but it's a bit different: it is a separate product that needs to be installed and operated in addition to ClickHouse. I was suggesting a more native integration, given OpenTelemetry's focus on profiling.
Use case
Efficiently querying continuous profiling data in ClickHouse to drive flamegraph visualization would be a great addition to the logging, metrics and traces use cases, especially in light of OpenTelemetry's decision to officially support profiling.
Describe the solution you'd like
Extending the `flameGraph` function to support bigger trees, non-system tables and different output formats would also achieve similar results.
Describe alternatives you've considered
I've loaded around 3.29 billion profiler samples (~30 days) into a ClickHouse instance to generate flamegraphs. There are a few visualizers for flamegraphs (speedscope, flamegraph.pl), but I focused on Grafana's flame graph panel because its input is just a depth-first traversal of the flamegraph with 4 fields: `label`, `level`, `total` and `self`, where `self` is the cumulative sum of the box minus the cumulative sums of its children.
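A minimal C++ sketch of how grouped (stack, count) rows could be turned into those four fields, assuming each stack is ordered root-first (a leaf-first stack would need reversing first). Children are kept in a `std::map` keyed by frame label, similar to the map-based tree described for the executable-table-function approach; `Node`, `Row`, `insertStack`, `flatten` and the `"root"` label are hypothetical names, not part of any ClickHouse or Grafana API.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Flamegraph tree node (hypothetical). Children are keyed by frame label in a
// std::map, so finding the child for the next frame is O(log k), not a linear scan.
struct Node {
    uint64_t total = 0;  // samples in this box, including all descendants
    std::map<std::string, std::unique_ptr<Node>> children;
};

// One output row: label, level (depth), total and self.
struct Row {
    std::string label;
    uint64_t level;
    uint64_t total;
    uint64_t self;
};

// Adds one grouped sample to the tree. Assumes `stack` is ordered root-first.
void insertStack(Node& root, const std::vector<std::string>& stack, uint64_t count) {
    root.total += count;
    Node* node = &root;
    for (const std::string& frame : stack) {
        std::unique_ptr<Node>& child = node->children[frame];
        if (!child) child = std::make_unique<Node>();
        node = child.get();
        node->total += count;
    }
}

// Depth-first (pre-order) traversal producing the four-field rows.
// self = this box's total minus the totals of its direct children.
void flatten(const std::string& label, const Node& node, uint64_t level,
             std::vector<Row>& out) {
    uint64_t childrenTotal = 0;
    for (const auto& entry : node.children) childrenTotal += entry.second->total;
    out.push_back({label, level, node.total, node.total - childrenTotal});
    for (const auto& entry : node.children)
        flatten(entry.first, *entry.second, level + 1, out);
}
```

For example, inserting `{main, parse}` with count 3, `{main, parse, lex}` with count 2, `{main, eval}` with count 5 and `{main}` with count 1, then calling `flatten("root", root, 0, rows)`, yields the rows (label/level/total/self) root/0/11/0, main/1/11/1, eval/2/5/5, parse/2/5/3, lex/3/2/2. Siblings come out in alphabetical order because of the `std::map`.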
I tried three different approaches:
Pure SQL
Works quite well, actually! It starts to struggle once we hit a few days of profiling data, because of the row explosion caused by `arrayJoin` and because the `ORDER BY` and `GROUP BY` are high-cardinality due to the unique sub-paths.

```SQL
CREATE TABLE profilerSamples
(
    timestamp DateTime,
    sampleCount UInt64,
    stackFrames Array(String), -- Could be a UUID / UInt128 and joined via a dictionary
    label1 String,
    label2 String
)
ENGINE = MergeTree() -- Could be a Summing/AggregatingMergeTree
ORDER BY (label1, label2, toStartOfMinute(timestamp))
```

```SQL
WITH groupedProfilerSamples AS (
    -- Reduces number of rows by grouping same stack frames together
    SELECT stackFrames, sum(sampleCount) AS sampleCount
    FROM profilerSamples
    WHERE timestamp BETWEEN '…
```

flameGraph Aggregate Function
I've tried running the `flameGraph` function against a modified table similar to the one detailed under "Pure SQL", but because it stores children as a linked list, it slows down as the tree grows. The `flamegraph.pl` format is also quite verbose, because stack frames are repeated across lines.

Executable Table Function
I've written a small C++ program that reads the output of the equivalent of the `groupedProfilerSamples` `SELECT` query from "Pure SQL", ingests it into a tree structure that uses a map to hold children, and prints the tree to `STDOUT` in depth-first order. It works quite well when reading data exported via `INTO OUTFILE` in `RowBinary` format, but I've hit issue https://github.com/ClickHouse/ClickHouse/issues/66646 when running it directly inside ClickHouse (my `GROUP BY`, which usually takes ~25 seconds, never seems to complete).

I'm curious whether I've been doing something inefficiently that could be optimized, and whether there is any interest in such an optimization.
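On the input side of the executable-table-function approach, a minimal sketch of decoding the grouped rows, assuming they arrive as `RowBinary` with the two columns of the `groupedProfilerSamples` `SELECT` in order (`stackFrames Array(String)`, then `sampleCount UInt64`). In `RowBinary`, integers are fixed-width little-endian, while `String` and `Array` values are prefixed with their length as an unsigned LEB128 varint. `GroupedSample`, `readVarUInt`, `readString`, `readUInt64LE` and `readSample` are hypothetical names, and error handling is reduced to exceptions.

```cpp
#include <cstdint>
#include <istream>
#include <stdexcept>
#include <string>
#include <vector>

// One grouped row (hypothetical): (stackFrames Array(String), sampleCount UInt64).
struct GroupedSample {
    std::vector<std::string> stackFrames;
    uint64_t sampleCount = 0;
};

// RowBinary lengths are unsigned LEB128 varints: 7 bits per byte, high bit = "more".
uint64_t readVarUInt(std::istream& in) {
    uint64_t value = 0;
    for (int shift = 0; shift < 64; shift += 7) {
        std::istream::int_type byte = in.get();
        if (byte == std::istream::traits_type::eof()) throw std::runtime_error("truncated varint");
        value |= static_cast<uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return value;
    }
    throw std::runtime_error("varint too long");
}

// String: varint length followed by that many raw bytes.
std::string readString(std::istream& in) {
    uint64_t size = readVarUInt(in);
    std::string s(size, '\0');
    if (size > 0 && !in.read(&s[0], static_cast<std::streamsize>(size)))
        throw std::runtime_error("truncated string");
    return s;
}

// UInt64: 8 bytes, little-endian.
uint64_t readUInt64LE(std::istream& in) {
    unsigned char bytes[8];
    if (!in.read(reinterpret_cast<char*>(bytes), 8)) throw std::runtime_error("truncated UInt64");
    uint64_t value = 0;
    for (int i = 7; i >= 0; --i) value = (value << 8) | bytes[i];
    return value;
}

// Reads one row; returns false on a clean end of input.
bool readSample(std::istream& in, GroupedSample& out) {
    if (in.peek() == std::istream::traits_type::eof()) return false;
    uint64_t frames = readVarUInt(in);  // Array: varint element count, then elements
    out.stackFrames.clear();
    for (uint64_t i = 0; i < frames; ++i) out.stackFrames.push_back(readString(in));
    out.sampleCount = readUInt64LE(in);
    return true;
}
```

A program would call `readSample(std::cin, sample)` in a loop until it returns `false`, adding each row to the tree before printing it.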
Thanks a lot!
PS: I wasn't sure whether I should have tagged this issue as an RFC.