afharo opened 3 years ago
Pinging @elastic/kibana-core (Team:Core)
+1 on this. We can add an explicit flag to the configs and default it to true if env.dev.
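A minimal sketch of that idea follows, assuming a hypothetical config flag named verboseLogging (not a real Kibana setting) and a stripped-down Logger interface standing in for Kibana's. The flag defaults to the dev-mode value, and when it is off, warn and error calls from the UsageCollection/Telemetry plugins are demoted to debug instead of being deleted.

```typescript
// Minimal stand-in for Kibana's Logger interface (only the methods used here).
interface Logger {
  debug(message: string): void;
  warn(message: string): void;
  error(message: string): void;
}

// Hypothetical config shape: "verboseLogging" is an assumed name for illustration.
interface UsageLoggingConfig {
  verboseLogging: boolean;
}

// The flag defaults to true in dev mode and false otherwise,
// unless the user sets it explicitly.
function resolveConfig(
  isDev: boolean,
  overrides: Partial<UsageLoggingConfig> = {}
): UsageLoggingConfig {
  return { verboseLogging: overrides.verboseLogging ?? isDev };
}

// When verbose, hand back the original logger untouched.
// Otherwise, route warn/error messages to debug so they stay out of default production logs.
function createUsageLogger(base: Logger, config: UsageLoggingConfig): Logger {
  if (config.verboseLogging) {
    return base;
  }
  return {
    debug: (message: string) => base.debug(message),
    warn: (message: string) => base.debug(message),
    error: (message: string) => base.debug(message),
  };
}
```

Demoting rather than dropping the messages keeps them recoverable in production by raising the plugin's log level, while an explicit override still lets an operator (or Cloud) opt in to full verbosity.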
+1
I think we should aim to explicitly ignore expected errors, though. E.g. reading a saved object can time out when there's network instability. If we know this call will be repeated at some point in the future, we should always ignore this error and not print any logs. Not even a developer needs to see that it happened; it's part of the plugin's expected behaviour.
Then, for the scenarios where we're not sure we're handling errors correctly, or where we don't know whether we can recover, we can log in development.
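The split between expected and unexpected errors could look roughly like the sketch below. RetryableTimeoutError, isExpectedError and handleCollectorError are hypothetical names, and treating "a timeout on a call that will be retried" as the only expected case is an assumption for illustration. Expected errors are swallowed without logging; anything else is logged at warn in dev mode and at debug otherwise.

```typescript
// Minimal stand-in for Kibana's Logger interface (only the methods used here).
interface Logger {
  debug(message: string): void;
  warn(message: string): void;
}

// Hypothetical error type for a timeout on a call we know will be repeated later.
class RetryableTimeoutError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "RetryableTimeoutError";
    // Keep instanceof working even when compiled to ES5.
    Object.setPrototypeOf(this, RetryableTimeoutError.prototype);
  }
}

// Assumption: only retryable timeouts count as "expected" here.
function isExpectedError(err: unknown): boolean {
  return err instanceof RetryableTimeoutError;
}

function handleCollectorError(err: unknown, logger: Logger, isDev: boolean): void {
  if (isExpectedError(err)) {
    // Expected behaviour of the plugin: print nothing, not even for developers.
    return;
  }
  const message = err instanceof Error ? err.message : String(err);
  if (isDev) {
    // Unknown failure: make it visible to developers.
    logger.warn(message);
  } else {
    // Production: keep it out of the default log output.
    logger.debug(message);
  }
}
```

Keeping the classification in one predicate makes the list of "known, recoverable" failures explicit and reviewable, instead of scattering log-level decisions across every collector.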
-1 wait, why would we have network instability when running Elasticsearch and Kibana on the same host?
@LeeDr Elasticsearch can be unresponsive for multiple reasons, even on the same host: it may be out of memory or disk space, the node may be restarting, the Kibana index may be locked or misconfigured, etc.
Our telemetry collectors are usually the first to scream when Kibana is unable to reach ES for any reason. This often leads customers to open tickets claiming that telemetry is causing Kibana to fail, even though telemetry was only the first plugin to log an error in the server logs.
If we silence the telemetry/collection logs in production, or put them behind a config, we'd avoid these situations and help users identify the real cause of the issue.
I'm assuming users would try to disable telemetry as a first step when debugging such cases, even though the root cause is unrelated to telemetry, so we'd miss out on receiving usage data from these clusters.
For implementation details: the POC https://github.com/elastic/kibana/pull/95960 had it implemented. I think the effort would be to:
- … telemetry and usageCollection plugins.
- … debug logs and set them to their appropriate level.

@pjhampton added an important point: we should set Cloud to enable these logs by default as well, so we can catch any potential bugs in those controlled environments.
Follow up to #89588.
We are constantly switching warn/error logs in the usage-collection/telemetry plugins to debug to avoid making noise for our end users. It's an easy way to silence those errors. However, it feels to me like debug-level logs can make it harder to identify any issues we may introduce (they are not noisy enough).

How about we configure the loggers for the UsageCollection & Telemetry plugins to be silent in production mode, but to properly warn in dev mode?