Open ankitnayan opened 2 months ago
I am assuming flattening out here refers to having dedicated columns.
Since there can be an arbitrary number of resource attributes that keep evolving over time, always flattening out all of them might bloat the number of columns, though that may not be prohibitively bad since ClickHouse is column-oriented and designed to handle hundreds of columns easily.
I don't see any issues with having dedicated columns for commonly used resource attributes.
A dedicated column for a resource attribute should help if that attribute is used in the filter clause or for grouping and aggregation, since it would avoid reading the large array column. However, if we go with the fingerprints based approach, in principle we no longer have to read the resource attributes array for the filter clause. And I wonder if resource attributes are used often for grouping and aggregation. So the benefits of having dedicated columns might not be realized if we choose to go with resource fingerprints based tables.
However, if we go with the fingerprints based approach, in principle we no longer have to read the resource attributes array for the filter clause
Can you help me understand why we don't have to read the resource attributes array?
And I wonder if resource attributes are used often for grouping and aggregation
I am your audience and I use it very often. I have seen our customers use it too. All alerts (on any signal) are created on a per resource basis all the time except in very few cases where people write queries for specific resources of interest.
Can you help me understand why we don't have to read the resource attributes array?
On further thought, you are right. Since the fingerprint relies on hash values, it doesn't guarantee that we can avoid doing an actual comparison on the resource array.
Since the fingerprint relies on hash values, it doesn't guarantee that we can avoid doing an actual comparison on the resource array
Not clear. This search would be on the fingerprint table and not on the main logs data table, right?
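To make the two-step lookup being discussed concrete, here is a minimal Python sketch of how a fingerprint table could work. It is an illustration under assumptions, not the actual schema: the names `fingerprint`, `build_tables`, `query`, and `resource_fingerprint` are hypothetical, and the truncated SHA-256 hash is a stand-in for whatever hash the real design would use. The attribute comparison happens only on the small fingerprint table; the main log table is filtered by fingerprint value alone.

```python
import hashlib
import json


def fingerprint(resource_attrs: dict) -> str:
    """Hash a resource attribute set into a short, stable identifier (assumed scheme)."""
    canonical = json.dumps(resource_attrs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def build_tables(records):
    """Split (attrs, body) records into a fingerprint table and a log table."""
    fingerprint_table = {}  # fingerprint -> resource attributes, one row per distinct resource
    log_table = []          # one row per log line, carrying only the fingerprint
    for attrs, body in records:
        fp = fingerprint(attrs)
        # Assumption: collisions between distinct attribute sets are ignored here.
        fingerprint_table.setdefault(fp, attrs)
        log_table.append({"resource_fingerprint": fp, "body": body})
    return fingerprint_table, log_table


def query(fingerprint_table, log_table, key, value):
    """Filter logs by a resource attribute without reading attributes per log row."""
    # Step 1: compare actual attribute values, but only on the small fingerprint table.
    matching = {fp for fp, attrs in fingerprint_table.items() if attrs.get(key) == value}
    # Step 2: filter the large log table by fingerprint membership only.
    return [row["body"] for row in log_table if row["resource_fingerprint"] in matching]


records = [
    ({"service.name": "cart", "k8s.namespace": "prod"}, "cart started"),
    ({"service.name": "cart", "k8s.namespace": "prod"}, "cart ready"),
    ({"service.name": "auth", "k8s.namespace": "prod"}, "auth started"),
]
fingerprint_table, log_table = build_tables(records)
result = query(fingerprint_table, log_table, "service.name", "cart")
```

In this sketch a hash collision would only matter if two distinct attribute sets mapped to the same fingerprint, in which case step 2 could return rows from the wrong resource unless the attributes were compared again.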
We can flatten out all or common resource attributes by default. This would save storage and remove the need to create a materialised column just to improve query performance by reading a column rather than an array.
Would like to hear some thoughts.
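As a rough illustration of the "flatten common attributes by default" option, the Python sketch below splits a resource attribute set into dedicated columns plus a remainder. The promoted attribute list and the `flatten` helper are hypothetical, chosen only for the example; which attributes count as common would need to be decided separately.

```python
# Hypothetical list of commonly used resource attributes to promote to columns.
PROMOTED = ("service.name", "k8s.namespace.name", "host.name")


def flatten(attrs: dict, promoted=PROMOTED):
    """Return (dedicated column values, attributes left in the generic array/map)."""
    # Every promoted attribute gets a column; missing ones default to an empty string.
    columns = {name: attrs.get(name, "") for name in promoted}
    # Everything else stays in the generic resource attributes structure.
    remaining = {k: v for k, v in attrs.items() if k not in promoted}
    return columns, remaining


columns, remaining = flatten(
    {"service.name": "cart", "host.name": "node-1", "deployment.color": "blue"}
)
```

Filters and group-bys on a promoted attribute would then touch only its column, while rarely used attributes still cost an array or map read.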