elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

Kibana authentication troubleshooting guide #83914

Open azasypkin opened 3 years ago

azasypkin commented 3 years ago

The Kibana authentication sub-system includes quite a bit of functionality these days, and it's not always easy to troubleshoot problems or misconfigurations in this area. This issue is intended to gather the most common issues our users experience with our authentication layer, and ideas on how we can help them troubleshoot these issues.

We can tackle this from two different angles:

Most frequent issues

Inconsistent (autogenerated) xpack.security.encryptionKey in Kibana HA setup

This is by far the most common source of confusion. If one instance of Kibana cannot decrypt a cookie that was created by another instance, the cookie will be cleared.
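As a sketch, the relevant setting looks like this; the key itself is an arbitrary string of at least 32 characters (the value below is a placeholder), and it must be identical on every instance:

```yaml
# kibana.yml — must be the same on every Kibana instance behind the load balancer,
# otherwise instances cannot decrypt each other's session cookies
xpack.security.encryptionKey: "something_at_least_32_characters_long"
```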

What we already do:

What we can do:

Inconsistent session and authentication settings in Kibana HA setup

Every instance of Kibana schedules a regular session cleanup job to remove sessions that weren't explicitly invalidated. There are a number of criteria we use to determine that a session can be safely removed, but the most notable are:

That means that if multiple Kibana instances that rely on the same .kibana-x index have different session or providers settings, then a cleanup job scheduled by one Kibana instance may invalidate sessions created by another instance. By default, a cleanup job runs on startup and every hour after that, so users may experience sporadic logouts that may be hard to debug.
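To illustrate, the settings that need to match are roughly these (values and provider names below are examples, not recommendations):

```yaml
# kibana.yml — keep these identical across all instances that share the same
# .kibana-x index, otherwise the cleanup job of one instance may consider
# sessions created by another instance stale and remove them
xpack.security.session.idleTimeout: "1h"
xpack.security.session.lifespan: "30d"
xpack.security.authc.providers:
  basic.basic1:
    order: 0
```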

What we already do:

What we can do:

Multi-tenancy using the same host name, but different ports

Per RFC 6265, cookies for a given host are shared across all the ports on that host, even though the usual "same-origin policy" used by web browsers isolates content retrieved via different ports. That means that if you have multiple Kibana tenants (Kibana instances that use different .kibana-x indices) that use the same host name but different ports, then the session cookies will be shared between them.

This will lead to sporadic logouts if both tenants are opened in the same browsing context (same browser window): if one tenant receives a session cookie that references a session that lives in another tenant, the cookie will be treated as invalid and Kibana will clear it.

The most correct solution is to never host different applications on the same hostname, precisely because cookies leak across ports. If that's not possible, the workaround is to configure a different session cookie name for every tenant with the xpack.security.cookieName setting.
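The workaround amounts to something like this (cookie names below are arbitrary examples):

```yaml
# Tenant A — kibana.yml
server.port: 5601
xpack.security.cookieName: "sid-tenant-a"
```

```yaml
# Tenant B — kibana.yml
server.port: 5602
xpack.security.cookieName: "sid-tenant-b"
```

With distinct cookie names, the browser stores one session cookie per tenant instead of letting them overwrite each other.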

What we already do:

What we can do:

Multiple authentication providers without Login Selector

It's still possible to use multiple authentication providers even if Login Selector is disabled. The support is very limited though, and we generally discourage our users from that setup. The main reason why we still support this is BWC. There is nothing we can do here, so I'll just outline a few notable things about this setup:
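For reference, such a setup looks roughly like this (realm and provider names are illustrative):

```yaml
# kibana.yml — two providers with the Login Selector explicitly disabled;
# providers are consulted in `order`
xpack.security.authc.selector.enabled: false
xpack.security.authc.providers:
  saml.saml1:
    order: 0
    realm: "saml1"
  basic.basic1:
    order: 1
```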

What we can do:

Kibana session settings vs access/refresh token expiration

Many of the Kibana authentication providers use Elasticsearch access/refresh tokens under the hood: SAML, OpenID Connect, PKI, Kerberos, and Token. And these tokens have their own expiration settings that are separate from Kibana's own session expiration settings:

If Kibana's session idle timeout is higher than the expiration time of the underlying access token, Kibana will automatically refresh the access token once the user becomes active again. But if an admin disables the Kibana session idle timeout, or sets the idle timeout or lifespan higher than 24 hours, and the user isn't active during this period, then the underlying refresh token expires and the access token cannot be refreshed anymore. Such a setup effectively limits Kibana session timeouts to 24 hours.

For example, if Kibana is configured to work with one of the token-based authentication providers, and the admin wants to disable the idle timeout, they would do something like this:

xpack.security.session.idleTimeout: 0

But in reality, because of the hard-coded 24-hour lifetime of the refresh token, the idle timeout will be approximately equal to only 24 hours.

It's even more problematic for PKI authentication, since Elasticsearch doesn't provide a refresh token in this case at all, effectively limiting the idle timeout for the PKI authentication provider to the lifetime of the access token (max 1 hour).
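Putting the two sides together, a sketch of the interplay between the Elasticsearch token settings and the Kibana session settings (values below are examples; the 24-hour refresh token lifetime itself is hard-coded and cannot be changed):

```yaml
# elasticsearch.yml — access token lifetime (defaults to 20m, capped at 1h)
xpack.security.authc.token.timeout: "30m"
```

```yaml
# kibana.yml — with token-based providers, any value above 24h (or 0,
# i.e. "disabled") is effectively capped by the refresh token lifetime
xpack.security.session.idleTimeout: "8h"
```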

What we already do:

What we can do:

Misconfigured role mappings

It's more of an Elasticsearch issue, but it's usually in Kibana where the user finally gets stuck, so we can try to help debug this.
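As an illustration of what to check: API-defined mappings can be listed with `GET _security/role_mapping`, and the roles a user actually ended up with can be inspected with `GET _security/_authenticate`. For realms that use file-based mappings (e.g. PKI, LDAP), the mapping file looks roughly like this (role name and DN below are examples):

```yaml
# Elasticsearch config/role_mapping.yml — maps groups/DNs to a role
kibana_admin:
  - "cn=kibana-users,ou=groups,dc=example,dc=com"
```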

What we already do:

What we can do:

Misconfigured refresh interval for the security-related indices

Security-related indices (and many other system indices) are very sensitive to refresh intervals higher than 1s, as most update operations are issued with a wait_for refresh policy in order to guarantee that concurrent edits are visible.

Changing the default refresh intervals for the security-related indices is highly discouraged. Typical causes of this are match-all index templates that apply common settings or mappings to all indices, or a user mistakenly setting a common refresh interval on ALL indices.

Note: This should happen less frequently once https://github.com/elastic/kibana/pull/134900 merges (target 8.4.0).

This can lead to significant delays and failures during request authentication. To make sure the security-related indices have proper refresh intervals, you can check the settings file in the Elastic support diagnostics bundle:
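If the diagnostics bundle isn't at hand, a quick spot check from the Kibana Dev Tools console looks like this (the concrete index name varies by version, e.g. `.security-7`):

```
GET .security-7/_settings/index.refresh_interval
```

An empty result means no explicit refresh interval is set and the default applies; any value above 1s is a red flag.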

[screenshot: misconfigured refresh interval]

@elastic/kibana-security I'll be gradually filling this issue with info I remember, but please feel free to comment here or edit issue description to include issues you know about that I missed.

azasypkin commented 3 years ago

Okay, I described all the cases I could remember so far. I'll get back to this issue in a few weeks so that everyone has time to share any other ideas/issues.

legrego commented 3 years ago

Thanks for putting this together! I agree with a lot of what you said here, and I don't see any glaring omissions.

Multiple authentication providers without Login Selector

Would it be possible to use the new auth_provider_hint query parameter to attempt authentication?

Discourage, discourage, discourage and eventually deprecate

As much as I'd love to deprecate this, I worry that we will end up having to support this in some capacity.

azasypkin commented 3 years ago

Would it be possible to use the new auth_provider_hint query parameter to attempt authentication?

Yeah, it should allow you to pick any provider.

As much as I'd love to deprecate this, I worry that we will end up having to support this in some capacity.

Right, my suspicion is that many users upgrade Kibana and just keep their legacy authc config, and hence don't leverage Login Selector by default. And right now our Telemetry cannot tell us whether that's the case or whether users explicitly disabled Login Selector. In 8.0.0, when we drop the legacy config completely, we'll be able to see how many users explicitly disable it.

aniketpant1 commented 1 year ago

We are currently encountering this issue. We recently integrated Azure AD OIDC realms for authentication in Elasticsearch and Kibana. Our end users are frequently logged out of Kibana, anywhere from 5 minutes to 1 hour in, and need to log in again with their email, password, and passcode. Even if we disable/comment out the session settings in the kibana.yml file, they still get logged out. We have escalated this issue to Elastic engineers, but we couldn't find what is causing the frequent logouts. Even with xpack.security.session.idleTimeout: "15m" and xpack.security.session.lifespan: "24h" set, it keeps logging out.

azasypkin commented 1 year ago

We have escalated this issue to elastic engineers.

If you've escalated this issue to our support team, we'll look into it soon. If you don't have access to our support, then please post this question on our Discuss forum. There are many more users like you there who can help and have probably already solved the problem you're facing. A GitHub issue isn't the right place to debug problems like that.

Having said that, I'm almost sure that you have multiple Kibana instances connected to the same cluster with different security configurations, or something along these lines: https://www.elastic.co/guide/en/kibana/current/production.html#load-balancing-kibana