Closed smith closed 2 days ago
Pinging @elastic/obs-ux-infra_services-team (Team:obs-ux-infra_services)
This would be a good discussion. The goal with the 'log rate' and the log 'error rate' was to give the user an indication of throughput/traffic vs errors so figuring out how to try and present these metrics to the user in a consistent way between APM and logs-only services would be valuable.
Keen to hear people's thoughts.
I'd also like to see what we have running in practice with some customers if possible (or at the least, present to them and ask for their thoughts on the experience).
@smith @iblancof @cauemarcondes
After a few calls this week with customers, the consensus is that:
We don't need to put additional filters for log.level
on there in the Log Rate Chart (they're happy for the logs from APM to be counted too)
We should change the Log Error % to be Log Error Rate (so same as log rate but just for errors)
Log Rate : count() / [PERIOD_IN_MINUTES]
Log Error Rate : count(kql='log.level: "error" OR log.level: "ERROR" OR error.log.level : "error"') / [PERIOD_IN_MINUTES]
Any thoughts/opinions?
I'd love to make these changes in the service inventory and the service views as part of the current milestone for services if possible.
Hi @roshan-elastic,
I already added the spec in [APM][Services]Change log charts from Log Error % to Log Error Rate on overview #190987.
Regarding the service inventory column change I just created https://github.com/elastic/kibana/issues/191253, feel free to move it under a public epic or milestone if needed!
ℹ️ I mistakenly created the issue in the private repo, but I've already closed it.
Wkd @iblancof
Do you think we could create an issue to change the log rate to remove the need for log.level
to be included?
Also, do we think we should change these on the service inventory in the same issue or should we break these out into separate issues?
Hi @roshan-elastic,
In summary, we want the Log Rate for both the inventory and the overview charts to not require having the log.level field populated as mandatory.
As for creating a separate issue for the service inventory and the overview charts, I think we can handle both things within this same issue we're discussing in, right?
My question is about the priority. Since we're focused on the adaptive view, when should we tackle this? Should it be part of the epic? If so, we should include it to clarify priorities and ensure everything is delivered together, even though the scope is different.
Hey @iblancof
In summary, we want the Log Rate for both the inventory and the overview charts to not require having the log.level field populated as mandatory.
Correct!
My question is about the priority. Since we're focused on the adaptive view, when should we tackle this?
Could we do this after we've finished the service view? Goal is that when customers see these new adaptive views - we have consistent log metrics in both the service inventory and the service view.
Going to add this issue as related to the Epic so we make sure we don' t forget about it.
The action here is not removing the log.level
, but rather what other field should we use, am I right? We currently use the log.level
to define the log error rate. And we do it because AFAICT there's no standard log-level field.
Log Error Rate : count(kql='log.level: "error" OR log.level: "ERROR"') / [PERIOD_IN_MINUTES]
Like the suggested changes is still relying on the log.level field.
Hey @cauemarcondes,
Just to play back my understanding of the issue:
Log Rate We would basically count all logs:
count() / [PERIOD_IN_MINUTES]
This one wouldn't need log.level
at all.
Log Error Rate Instead of showing a %, we'd show a 'rate' and also update the error to look at:
count(kql='log.level: "error" OR log.level: "ERROR" OR error.log.level : "error"') / [PERIOD_IN_MINUTES]
This one would need either log.level
or error.log.level
(done via APM I believe) to be set.
Make sense?
Yes, that makes sense. Now I'm reading the comments again and I got what you meant. Sorry for the noise.
All good @cauemarcondes. Sorry, this whole thing has been a bit noisy because we're updating these based on user feedback we've only just received in the last week or so.
@roshan-elastic We should also change the EEM data transforms as they also expect the log.level
to exist to calculate the logRate
. cc: @simianhacker / @klacabane
One issue is that an unfiltered count()
would include the apm metrics documents so we still need to either identify what is a log or a metrics in the source indices documents. I think excluding metrics would be the easiest (eg metricset.name != "service_transaction" or metricset.name != "service_summary"
)
Hey @klacabane - would it include all of them if we specified to only match those from logs-*
?
It's OK if they're APM logs writing to logs-apm-*
etc.
would it include all of them if we specified to only match those from logs-*?
@roshan-elastic Yes, I think we have these options:
I think option 2 would give the most accurate results
Cheers @klacabane.
I think we should make this as simple as possible for users to comprehend so I'm not sure about adding in exclusions vs specifying a simple inclusion.
As a user, I'm thinking that this metric would be calculated in the same way as this lens visualisation:
Filtering for logs
Are we able to replicate this logic?
cc @iblancof as we'll want the same logic for showing the counts that we use in the service views
@klacabane should we create a ticket to update the logRate
in the service entity definition ?
Happy to make the changes if you can provide the best approach to filter out the logs.
@kpatticha please feel free to update the definition. As for the best way to get the logRate I listed different approaches and I think both 1 and 2 should provide the results you're looking for but I didn't test that
Hey @smith, as I mentioned in this comment, I don't believe this issue is resolved by PR #191962.
PR #191962 focuses on modifying the Log Error Rate chart and formula, while this issue is about ensuring that Logs are not filtered by log.level
in the Log Rate.
@iblancof is right.
I'm assigning this to myself to update the entity definition and the query accordingly
I'm also removing the N/A badge when the log.level
is not defined since it won't be valid any more
FYI @roshan-elastic
Hey @smith, as I mentioned in https://github.com/elastic/kibana/pull/191962#issuecomment-2328136920, I don't believe this issue is resolved by PR https://github.com/elastic/kibana/pull/191962.
Sorry I missed that @iblancof. Thanks for looking!
189389 specifies that we should hide logs that don't have a
log.level
attribute:log.level
is not marked as a required field in the Elastic Common Schema Log Fields specification.log.level
is also not present in OpenTelemetry logs unless the field is added in the collector.If we have a reason to filter out logs from APM errors (I'm not sure we do, other than "noise") we should explicitly filter out APM errors instead of requiring an arbitrary that happens to match.
✔️ Acceptance criteria
log.level
.