SigNoz / signoz

SigNoz is an open-source observability platform native to OpenTelemetry with logs, traces and metrics in a single application. An open-source alternative to DataDog, NewRelic, etc. 🔥 🖥. 👉 Open source Application Performance Monitoring (APM) & Observability tool
https://signoz.io

Flatten out the resource attributes by default #5481

Open ankitnayan opened 2 months ago

ankitnayan commented 2 months ago

We can flatten out all, or at least the common, resource attributes by default. This would save storage and remove the need to create materialised columns later, and it would improve query performance because queries would read a dedicated column rather than an array.

I'd like to hear some thoughts.

raj-k-singh commented 2 months ago

I am assuming flattening out here refers to having dedicated columns.

Since there can be an arbitrary number of resource attributes, and the set keeps evolving over time, always flattening out all of them might bloat the number of columns. That may not be prohibitively bad, though, since ClickHouse is column-oriented and designed to deal with hundreds of columns easily.

I don't see any issues with having dedicated columns for commonly used resource attributes.
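To make the "dedicated columns for common attributes" idea concrete, here is a minimal Python sketch of the row shaping it implies. The promoted keys, the `resource_` column prefix, and the `resource_other` catch-all name are all hypothetical; the thread does not specify which attributes would be promoted or how the columns would be named.

```python
# Hypothetical list of "commonly used" resource attribute keys.
PROMOTED_KEYS = ("service.name", "k8s.namespace.name", "host.name")


def column_name(key: str) -> str:
    """Map an attribute key to a flat column name (naming is an assumption)."""
    return "resource_" + key.replace(".", "_")


def flatten_resource(attrs: dict, promoted=PROMOTED_KEYS) -> dict:
    """Split resource attributes into dedicated columns plus a catch-all map."""
    row = {}
    for key in promoted:
        # A promoted key always gets a column; a missing value is stored as "".
        row[column_name(key)] = attrs.get(key, "")
    # Everything that is not promoted stays in one map-like column.
    row["resource_other"] = {k: v for k, v in attrs.items() if k not in promoted}
    return row


row = flatten_resource({"service.name": "checkout", "region": "us-east-1"})
print(row)
```

Only the promoted keys add columns, so the column count stays bounded even as new attribute keys appear over time.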

A dedicated column for a resource attribute should help if that attribute gets used in the filter clause or is used for grouping and aggregation, since it would avoid reading the large array column. However, if we go with the fingerprints based approach, in principle we no longer have to read the resource attributes array for the filter clause. And I wonder if resource attributes are used often for grouping and aggregation. So the benefits of having dedicated columns might not get used if we choose to go with resource fingerprints based tables.

srikanthccv commented 2 months ago

> However, if we go with the fingerprints based approach, in principle we no longer have to read the resource attributes array for the filter clause

Can you help me understand why we don't have to read the resource attributes array?

> And I wonder if resource attributes are used often for grouping and aggregation

I am your audience and I use it very often. I have seen our customers use it too. Alerts (on any signal) are almost always created on a per-resource basis, except in the very few cases where people write queries for specific resources of interest.

raj-k-singh commented 2 months ago

> Can you help me understand why we don't have to read the resource attributes array?

On further thought, you are right. Since the fingerprint relies on hash values, it doesn't guarantee that we can avoid doing an actual comparison on the resource array.
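The hash concern can be illustrated with a small Python sketch. It assumes a fingerprint is a 64-bit value derived from the sorted attribute pairs; the actual hashing scheme is not described in this thread, so `fingerprint` and `same_resource` below are hypothetical helpers, not SigNoz code.

```python
import hashlib
import json


def fingerprint(attrs: dict) -> int:
    """Hash the sorted attribute pairs into a 64-bit integer (assumed scheme)."""
    canonical = json.dumps(sorted(attrs.items()), separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def same_resource(stored_attrs: dict, stored_fp: int, wanted: dict) -> bool:
    """Use the fingerprint as a cheap pre-filter, then compare the attributes."""
    if stored_fp != fingerprint(wanted):
        return False  # different hashes always mean different attributes
    # Equal hashes do not prove equality, so the attributes are still compared.
    return stored_attrs == wanted


attrs = {"service.name": "checkout", "host.name": "node-1"}
print(same_resource(attrs, fingerprint(attrs), attrs))
```

A mismatched fingerprint rules a row out cheaply, but a matching one only makes equality likely, which is why the final comparison remains in this sketch.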

ankitnayan commented 2 months ago

> Since the fingerprint relies on hash values, it doesn't guarantee that we can avoid doing an actual comparison on the resource array

That's not clear to me. This search would run on the fingerprint table and not on the main logs data table, right?
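One way to read the question above is as a two-step lookup, sketched here in Python with in-memory stand-ins for the tables. The table shapes, the `resource_fingerprint` field, and both function names are assumptions made for illustration; the thread does not define the schema.

```python
# Stand-in for a small fingerprint table: fingerprint -> resource attributes.
fingerprint_table = {
    101: {"service.name": "checkout", "host.name": "node-1"},
    102: {"service.name": "checkout", "host.name": "node-2"},
    203: {"service.name": "cart", "host.name": "node-1"},
}

# Stand-in for the main logs table: rows carry only the fingerprint.
logs = [
    {"body": "order placed", "resource_fingerprint": 101},
    {"body": "item added", "resource_fingerprint": 203},
    {"body": "payment failed", "resource_fingerprint": 102},
]


def resolve_fingerprints(table: dict, key: str, value: str) -> set:
    """Step 1: evaluate the attribute filter against the fingerprint table only."""
    return {fp for fp, attrs in table.items() if attrs.get(key) == value}


def filter_logs(rows: list, fingerprints: set) -> list:
    """Step 2: filter log rows by fingerprint, without reading any attributes."""
    return [row for row in rows if row["resource_fingerprint"] in fingerprints]


wanted = resolve_fingerprints(fingerprint_table, "service.name", "checkout")
print([row["body"] for row in filter_logs(logs, wanted)])
```

In this reading the attribute comparison happens once per distinct resource in the fingerprint table, and the main table is matched on the fingerprint value alone.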