SSSD / sssd

A daemon to manage identity, authentication and authorization for centrally-managed systems.
https://sssd.io
GNU General Public License v3.0

KCM: provide mechanism to purge expired credentials #6667

Open qralston opened 1 year ago

qralston commented 1 year ago

Because KCM permits multiple credentials for the same user to be stored in the cache collection, credentials tend to accumulate. Over time, as cached credentials expire, the user’s cache collection becomes littered with duplicate credentials.

We have discovered that having duplicate expired credentials in the cache collection causes breakage. For example, ssh credential delegation can select an expired credential to delegate to the target host, even when a duplicate non-expired credential exists in the collection. In our environment, where home directories are mounted via NFSv4 with sec=krb5p, this locks the user out of their home directory, as they must either acquire or delegate a non-expired credential in order to access their home directory.

Problems like this—where a failure on a remote host is in fact being caused by issues on the local host that initiated the remote connection—are exceedingly difficult for many users to grasp.

(Issue #6357, where KCM would randomly change the primary cache in the cache collection, makes this even worse. That issue is now fixed, but it will take a while for the fix to propagate out to distros.)

User complaints have gotten bad enough that we are trying to figure out a way to throw together some sort of “poor man’s expired credential purger.” But unfortunately, sssd makes this exceedingly difficult, because sssctl provides no ability to query any aspect of KCM.

After trial and error, running this command as root:

$ tdbdump /var/lib/sss/secrets/secrets.ldb |
  grep ^key |
  tr , '\012' |
  grep -E '^CN=[[:digit:]]+$' |
  sort |
  uniq |
  cut -d= -f2 |
  xargs -e -r getent passwd |
  cut -d: -f1

…looks like it will show us the usernames of all users with credentials in KCM. From there, it should be possible to enumerate over those users via runuser and run a script to purge any expired credentials:

$ klist -l |
  awk '$3 == "(Expired)" {print $2}' |
  xargs -e -r -t -l kdestroy -c

But if sssd users have to resort to kluges like this—using third-party tools (tdbdump is a Samba utility) to dump KCM internals—in order to prevent KCM from causing breakage, it means that KCM lacks critical functionality.

Specifically: KCM needs a mechanism to automatically purge expired credentials. E.g., something like this:

krb5_expired_purge_interval (string)

The time in seconds between checks for expired credentials in KCM. When a check for expired credentials occurs, all expired credentials found in KCM, for all users except the root user, will be purged, regardless of the mechanism by which the credential was added to KCM. The value is an integer immediately followed by a time unit:

s for seconds
m for minutes
h for hours
d for days.

If there is no unit given, s is assumed.

NOTE: It is not possible to mix units. To set the purge interval to one and a half hours, use 90m instead of 1h30m.

If this option is not set, or is set to 0, no checks for expired credentials occur. This means that expired credentials will persist in all users’ respective cache collections until manually deleted via kdestroy.

Default: not set
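
The value syntax proposed above (an integer with at most one unit suffix, seconds assumed when the suffix is missing) can be sketched as a small shell function. This is only an illustration of the proposal: `interval_to_seconds` is a hypothetical name, and nothing here reflects how SSSD parses its options.

```shell
# Convert an interval such as "90m" into seconds.
# Returns 1 (and prints nothing) on malformed input.
# Hypothetical helper: illustrates the proposed syntax, not SSSD code.
interval_to_seconds() {
    # Only digits and the four unit letters are allowed at all.
    case "$1" in
        ''|*[!0-9smhd]*) return 1 ;;
    esac
    num=${1%[smhd]}      # strip at most one trailing unit letter
    unit=${1#"$num"}     # whatever was stripped (may be empty)
    # Anything non-numeric left over means mixed units, e.g. 1h30m.
    case "$num" in
        ''|*[!0-9]*) return 1 ;;
    esac
    # Drop leading zeros so the shell does not read the number as octal.
    while [ "${num#0}" != "$num" ] && [ -n "${num#0}" ]; do
        num=${num#0}
    done
    case "${unit:-s}" in
        s) mult=1 ;;
        m) mult=60 ;;
        h) mult=3600 ;;
        d) mult=86400 ;;
        *) return 1 ;;
    esac
    echo $((num * mult))
}
```

With this sketch, `interval_to_seconds 90m` prints 5400, while `interval_to_seconds 1h30m` fails, matching the note about mixed units.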

Note that the “regardless of the mechanism by which the credential was added to KCM” part is critical: our users frequently use kinit to stuff other credentials into their cache collections.

There is a pressing need for this: it will eliminate problems caused by other processes and services unintentionally plucking expired credentials out of the user’s cache collection, and it will prevent the secrets database from growing without bounds because expired credentials are never purged.

Please add this feature.

alexey-tikhonov commented 1 year ago

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1900973

pbrezina commented 1 year ago

Pushed PR: https://github.com/SSSD/sssd/pull/6917

opoplawski commented 8 months ago

So, I'm curious about the actual fix implemented. The requester here seems to have asked for expired credentials to automatically be removed from the cache (which I would like to see as well). But what seems to have been implemented is that if the cache fills up the oldest expired credential will be removed to make room. This would still suggest that we will end up with large amounts of expired credentials in the cache. Is that right or am I missing something?

alexey-tikhonov commented 8 months ago

> what seems to have been implemented is that if the cache fills up the oldest expired credential will be removed to make room. This would still suggest that we will end up with large amounts of expired credentials in the cache. Is that right or am I missing something?

That's right.

aplopez commented 8 months ago

You are right. What was implemented is that the oldest expired credential will be removed if a new credential needs to be added to the cache. This is the best solution considering other users want the opposite behavior.

If you want to limit the number of credentials, you can use max_uid_ccaches, max_ccaches and max_ccache_size. Please check man sssd-kcm(8).
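
As a minimal sketch of those limits, assuming they are set in the [kcm] section of /etc/sssd/sssd.conf: the values below are illustrative only, not recommendations, and sssd-kcm(8) remains the reference for defaults and exact semantics. Note that these quotas cap how many caches exist and how large they are; they do not remove a credential merely because it has expired.

```ini
[kcm]
# Illustrative values only; see sssd-kcm(8) for the defaults.
# Upper bound on credential caches per UID.
max_uid_ccaches = 8
# Upper bound on credential caches across all users (0 means no global limit).
max_ccaches = 0
# Upper bound on the size of a single credential cache.
max_ccache_size = 65536
```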

And, of course, you can always run kdestroy -A to clean the whole cache.

joakim-tjernlund commented 5 months ago

How do I use kdestroy to delete another user's cache?

alexey-tikhonov commented 5 months ago

> How do I use kdestroy to delete another user's cache?

su - "$user" -c 'kdestroy -A'

qralston commented 3 months ago

While I appreciate the effort that went into PR #6917, unfortunately, PR #6917 does not fix this issue.

Our issue isn’t that we’re filling up KCM with credentials. Our issue is that we rely heavily on authenticated filesystem access (CIFS, NFS RPCGSS) where a kernel upcall mechanism needs to obtain user credentials, and these upcall mechanisms seem to assume kernel persistent keyring behavior, where 1) duplicate credentials are not permitted, and 2) the kernel automatically purges expired credentials. As such, these upcall mechanisms can misbehave if there are multiple expired user credentials in the user’s cache collection, plucking an expired credential instead of a non-expired one, causing Permission denied errors and all sorts of other breakage.

To put it as simply as possible: having expired credentials present in a user’s cache collection badly breaks things. KCM needs a mechanism to purge expired credentials, regardless of how they were added to KCM, reasonably quickly after the credential expires.

In pseudocode, we need this:

for each user U cache collection in KCM; do
  for each credential C in user U cache collection; do
    if credential C is expired; then
      kdestroy C
    fi
  done
done

Basically, we want this option:

If the root user could easily enumerate the set of users who have any active credentials in KCM, then we could implement our own purge_expired_credentials_interval feature using a cron job / systemd timer that enumerated over the users with credentials, and used setpriv to invoke a (e.g.) purge-any-expired-credentials script for each user with credentials. (klist -l flags which credentials are expired in its output, so at that point, one can purge expired credentials simply by looking for expired credentials in the klist -l output and then executing kdestroy with KRB5CCNAME set to that specific credential.)
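
The workaround described above could look roughly like the following sketch. It assumes the list of usernames is supplied from somewhere else (which is exactly what KCM does not currently offer), and that `klist -l` prints the principal, the cache name and an "(Expired)" flag as whitespace-separated columns. Every function name, the `DRY_RUN` switch and the script path are hypothetical.

```shell
#!/bin/sh
# Sketch only: none of these names exist in SSSD or MIT Kerberos.

# Read `klist -l` output on stdin and print the cache name of every
# entry whose third column is "(Expired)".
expired_caches() {
    awk '$3 == "(Expired)" { print $2 }'
}

# Destroy each expired cache belonging to the calling user by pointing
# KRB5CCNAME at that specific cache. This would be the body of the
# per-user purge script.
purge_expired_for_self() {
    klist -l 2>/dev/null | expired_caches | while read -r cache; do
        KRB5CCNAME="$cache" kdestroy
    done
}

# Read usernames on stdin and run the per-user script as each of them
# via setpriv. With DRY_RUN=1 the command is printed, not executed.
# The script path is a placeholder for wherever the script is installed.
purge_expired_for_users() {
    while read -r user; do
        uid=$(id -u "$user" 2>/dev/null) || continue
        gid=$(id -g "$user" 2>/dev/null) || continue
        set -- setpriv --reuid="$uid" --regid="$gid" --init-groups \
            /usr/local/sbin/purge-any-expired-credentials
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "$@"
        else
            # Keep the child from consuming the username list on stdin.
            "$@" </dev/null
        fi
    done
}
```

A cron job or systemd timer running as root would then feed usernames into `purge_expired_for_users`; running it once with `DRY_RUN=1` shows which commands it would execute.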

But, alas, the contents of KCM are completely opaque: even if one is running as root, there is no sssd tool (e.g. sssctl) that will enumerate the set of users who have credentials in KCM. (I briefly played around with attempting to parse the output of tdbdump /var/lib/sss/secrets/secrets.ldb, but lordy, that would graduate from a kluge to an ugly hack.)

So, we are stuck: SSSD neither implements a feature to purge expired credentials (which cause massive breakage in our environment), nor gives us the ability to kluge something together ourselves. We don’t want to abandon KCM and go back to using the kernel persistent keyring, but for the amount of breakage we are experiencing with KCM and expired credentials, we are reluctantly considering it.

I know it is difficult to infer tone in online communication, so I will specifically disclaim that this is a completely honest question (neither sarcasm nor snark): have I adequately explained what the issue here is? If not, what is unclear; what do I need to clarify?

Finally: please reopen this issue, because the issue is not fixed.

andreboscatto commented 3 months ago

Hi @qralston,

Thank you for your honesty and taking the time to respond with such a detailed explanation. It really helped us discuss and develop the following User Story, Description, and Acceptance Criteria. Could you please confirm if these address the needs you described?

Before that, we'd like to be transparent as well. We do intend to work on this, but our pipeline is currently full. We're focusing on new features related to Zero Trust Architecture, Passwordless authentication, OAuth2, and others.

@aplopez is about to start significant work on the performance of SSSD's caching mechanism, which has been a frequent source of user complaints over the years. Identifying the bottlenecks and potential solutions, drafting the design page with the proposed changes, development, testing, and the other tasks needed to improve caching performance will take a good amount of time. Once that work is done, we can tackle this KCM RFE. If you’re okay with waiting a few months, that’s great. If anyone reading this comment is willing to contribute, you are more than welcome, and we will assist however we can.

User Story

As an admin, I want a mechanism that periodically purges expired credentials from KCM, so that the system automatically removes expired credentials and prevents permission errors and system breakage.

Description:

The system currently faces issues with expired credentials in the Kerberos Cache Manager (KCM) causing permission errors and operational disruptions. These issues arise because the kernel upcall mechanisms, which handle authenticated filesystem access (such as CIFS, NFS RPCGSS), can select an expired credential instead of a valid one. To resolve this, we need to introduce a parameter purge_expired_credentials_interval in the KCM configuration that allows the system to periodically purge expired credentials for all users. This feature will ensure that expired credentials are promptly removed, thus maintaining the integrity and functionality of the upcall mechanisms.

Acceptance Criteria

  1. Configuration Parameter Addition:

    • A new configuration parameter purge_expired_credentials_interval is added to the [kcm] section of the KCM configuration file.
    • The parameter accepts values of 300 seconds or more; smaller non-zero values are rejected.
    • A value of 0 disables the purging mechanism.
  2. Default Behavior:

    • By default, purge_expired_credentials_interval is set to 0, meaning no automatic purging of expired credentials occurs unless explicitly configured.
  3. Purging Mechanism Implementation:

    • The system periodically checks and purges expired credentials based on the interval specified by purge_expired_credentials_interval.
    • The purging process involves iterating over each user's credential collection and removing credentials that have expired.
    • The purge applies to every user's credentials, regardless of which user owns them.
  4. System Integrity and Logging:

    • Ensure that the purging process does not affect valid credentials or disrupt active sessions.
    • Detailed logging is implemented for purging activities, including timestamps and user identifiers for purged credentials, to aid in monitoring and troubleshooting (SSSD debug level 9)
  5. Man page:

    • This information should be available in the man page, describing its behavior and warning users about the potential harm when enabling both mechanisms (the other being removal of the oldest expired credential when there is no more space).
  6. Test:

    • Create and automate tests of this new feature
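
The value rules in criteria 1 and 2 can be stated compactly as a check. The following is a sketch under the assumption that the interval has already been reduced to whole seconds; `check_purge_interval` is a hypothetical name, not part of SSSD.

```shell
# Classify a purge interval given in seconds, following the draft
# acceptance criteria: 0 disables purging, 300 and above enables it,
# anything else is rejected. Hypothetical helper for illustration.
check_purge_interval() {
    case "$1" in
        ''|*[!0-9]*) echo invalid; return 1 ;;
    esac
    if [ "$1" -eq 0 ]; then
        echo disabled
    elif [ "$1" -ge 300 ]; then
        echo enabled
    else
        echo invalid
        return 1
    fi
}
```

For example, `check_purge_interval 120` prints "invalid" and returns a non-zero status, since 120 is neither 0 nor at least 300.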

Kindly, André Boscatto - SSSD Product Owner

yrro commented 3 months ago

I think that if credentials are only purged on a timer, there can still be a period of time (up to 300 seconds in the above design) where a user's KCM cache collection will contain a valid credential and an expired credential for the same principal.

If purging of expired credentials happens during the process of adding a new credential to the cache then this window is greatly shortened. We're already removing the oldest expired credential if there's no space: how about an option to, when a credential cache is added for a principal, remove all other credential caches for that principal? That way, as long as a new credential cache for a principal is added before the old one expires, there's no period of time where an expired credential cache confuses clients.
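
The problematic state described here (a valid and an expired cache for the same principal) can at least be detected from the client side. The sketch below is not the KCM-side option being proposed; it is a rough approximation that assumes `klist -l` prints the principal, the cache name and an optional "(Expired)" flag per line. `stale_duplicates` is a hypothetical name.

```shell
# Read `klist -l` output on stdin and print every expired cache whose
# principal also owns at least one non-expired cache.
stale_duplicates() {
    awk '
        $2 !~ /:/         { next }                   # skip header and rule lines
        $3 == "(Expired)" { expired[$2] = $1; next } # cache name -> principal
                          { valid[$1] = 1 }          # principal has a live cache
        END {
            for (cache in expired)
                if (expired[cache] in valid)
                    print cache
        }
    ' | sort
}
```

A wrapper run right after a successful kinit could then call `klist -l | stale_duplicates | xargs -r -n 1 kdestroy -c` to drop only the expired duplicates, leaving expired caches for other principals untouched.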

As for the clients themselves: it might be worth filing separate issues with the clients (nfs-utils/gssproxy/cifs-utils) to improve their behaviour in the presence of credential cache collections that may contain multiple credential caches for a given principal. If that were to happen then this improvement in SSSD wouldn't be so important.

[I've edited this comment to improve wording and flesh a few things out]

qralston commented 3 months ago

Hi @andreboscatto, yes; I think the User Story, Description, and Acceptance Criteria you described are accurate. Thank you!

@yrro: I think it would be fine if there were an option to enable KCM to automatically purge any expired credentials in a credential collection when certain types of interactions occur (or perhaps any type of interaction occurs) with that credential collection. However, I think that feature might be more difficult to implement than a simple background timer/cleanup action. Furthermore, that’s something that can easily be implemented outside of sssd. E.g., an /etc/profile.d/purge-expired-credentials.sh file as follows:

#! /bin/sh

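if [ "0$(id -u 2>/dev/null)" -gt 0 ]; then
  # Skip root (UID 0); for everyone else, destroy each cache that
  # klist -l flags as "(Expired)".
  klist -l 2>/dev/null | awk '$3 == "(Expired)" {print $2}' | while read -r C; do
    # KRB5CCNAME selects the cache that kdestroy removes.
    env KRB5CCNAME="${C}" kdestroy 2>/dev/null
  done
  unset C
fi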

This won’t help us when a user logs in with expired credentials that derail upcall mechanisms, though, because the upcall mechanism fires when the shell touches the user’s home directory, which occurs before the /etc/profile.d scripts are sourced.

And yes, I agree that in an ideal world, the upcall mechanisms should not misbehave. But the reality is that changing the upcall mechanisms is likely going to be a tough sell, because the behavior of the kernel persistent keyring (no duplicate credentials; the kernel automatically purges credentials when they expire) is the de-facto standard behavior for credential collections, and with that behavior, no issues occur.