SigNoz / signoz

SigNoz is an open-source observability platform native to OpenTelemetry with logs, traces and metrics in a single application. An open-source alternative to DataDog, NewRelic, etc. 🔥 🖥. 👉 Open source Application Performance Monitoring (APM) & Observability tool
https://signoz.io
Other
17.62k stars 1.1k forks source link

Remove the default exists condition for group by keys? #5136

Open srikanthccv opened 1 month ago

srikanthccv commented 1 month ago

Bug description

For every group by clause, we are adding a condition to make sure that the key exists https://github.com/SigNoz/signoz/blob/03838f5fcca33dd35c084f6ad6fad2c722156757/pkg/query-service/app/logs/v3/query_builder.go#L208-L218. This might not be correct in all cases. Say there are two keys service.name and ecs.service.name which are mutually exclusive i.e a log record has either one of them in attributes but not both. If I group by service.name and ecs.service.name it could return no result. Perhaps we should not add key exists condition ourselves and let the user do it if necessary.

nityanandagohain commented 1 month ago

So you are saying that if the user doesn't explicitly use the exists then empty values is fine ?

So are with fine with:- If there I group by service.name. And there are n logs with no service name then count of n will be added to empty service name. So it will be difficult to differentiate between empty service name and logs where service.name doesn't exists.

srikanthccv commented 1 month ago

So it will be difficult to differentiate between empty service name and logs where service.name doesn't exists.

In practice, this doesn't happen where service.name (or any attribute) is deliberately set to empty, and it should be distinguished from the attribute's existence. On occasions when it is important, users will need to add the EXISTS filter.

So you are saying that if the user doesn't explicitly use the exists then empty values is fine ?

The current result doesn't display anything and just discards the data, as we observed when trying to group by service.name (the total of each value in the group by doesn't match the total number of log lines). Therefore, showing an empty result is accurate and preferable.

srikanthccv commented 5 days ago

This should be fixed otherwise the charts are just lying.

nityanandagohain commented 2 days ago

sure prioritising this

srikanthccv commented 1 day ago

So it will be difficult to differentiate between empty service name and logs where service.name doesn't exists.

In practice, this doesn't happen where service.name (or any attribute) is deliberately set to empty, and it should be distinguished from the attribute's existence. On occasions when it is important, users will need to add the EXISTS filter.

This is a lazy assumption to make for any data type specifically expecting users to add the filter.