Open qralston opened 1 year ago
Pushed PR: https://github.com/SSSD/sssd/pull/6917
master
sssd-2-9
So, I'm curious about the actual fix implemented. The requester here seems to have asked for expired credentials to automatically be removed from the cache (which I would like to see as well). But what seems to have been implemented is that if the cache fills up the oldest expired credential will be removed to make room. This would still suggest that we will end up with large amounts of expired credentials in the cache. Is that right or am I missing something?
what seems to have been implemented is that if the cache fills up the oldest expired credential will be removed to make room. This would still suggest that we will end up with large amounts of expired credentials in the cache. Is that right or am I missing something?
That's right.
You are right. What was implemented is that the oldest expired credential will be removed if a new credential needs to be added to the cache. This is the best solution considering other users want the opposite behavior.
If you want to limit the number of credentials, you can use max_uid_ccaches, max_ccaches and max_ccache_size. Please check man sssd-kcm(8).
And, of course, you can always run kdestroy -A to clean the whole cache.
How do I use kdestroy to delete another user's cache?
How do I use kdestroy to delete another user's cache?
su - "$user" -c 'kdestroy -A'
While I appreciate the effort that went into PR #6917, unfortunately, PR #6917 does not fix this issue.
Our issue isn’t that we’re filling up KCM with credentials. Our issue is that we rely heavily on authenticated filesystem access (CIFS, NFS RPCGSS) where a kernel upcall mechanism needs to obtain user credentials, and these upcall mechanisms seem to assume kernel persistent keyring behavior, where 1) duplicate credentials are not permitted, and 2) the kernel automatically purges expired credentials. As such, these upcall mechanisms can misbehave if there are multiple expired user credentials in the user’s cache collection, plucking an expired credential instead of a non-expired one, causing Permission denied errors and all sorts of other breakage.
To put it as simply as possible: having expired credentials present in a user’s cache collection badly breaks things. KCM needs a mechanism to purge expired credentials, regardless of how they were added to KCM, reasonably quickly after the credential expires.
In pseudocode, we need this:
for each user U cache collection in KCM; do
    for each credential C in user U cache collection; do
        if credential C is expired; then
            kdestroy C
        fi
    done
done
Basically, we want this option:
purge_expired_credentials_interval
This parameter goes in the [kcm] section.
KCM periodically purges all expired credentials, for all users who have credential collections. This parameter specifies how many seconds KCM waits after completing a purge before performing the next purge.
The minimum value is 300 (5 minutes) and the maximum value is 86400 (24 hours).
As a special case, a value of 0 is also supported. If the value is 0, KCM does not purge expired credentials.
The default is 0; that is, KCM does not purge expired credentials by default; one must specifically set this parameter to a legal nonzero value to enable purging expired credentials.
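The value rules described above (0 to disable, otherwise 300 to 86400 seconds) can be sketched as a small shell check. This is only an illustration of the proposed semantics: purge_expired_credentials_interval is the option requested in this comment, not an existing SSSD setting, and valid_purge_interval is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: validates a value for the *proposed* option
# purge_expired_credentials_interval (hypothetical; not an existing SSSD option).
# Legal values per the proposal: 0 (purging disabled) or 300..86400 seconds.
valid_purge_interval() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    # 0 is the special "disabled" value; otherwise enforce the 300..86400 range.
    [ "$1" -eq 0 ] || { [ "$1" -ge 300 ] && [ "$1" -le 86400 ]; }
}
```

For example, valid_purge_interval 3600 succeeds, while valid_purge_interval 60 fails because 60 is below the 300-second minimum.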
If the root user could easily enumerate the set of users who have any active credentials in KCM, then we could implement our own purge_expired_credentials_interval feature using a cron job / systemd timer that enumerated over the users with credentials, and used setpriv to invoke a (e.g.) purge-any-expired-credentials script for each user with credentials. (klist -l flags which credentials are expired in its output, so at that point, one can purge expired credentials simply by looking for expired credentials in the klist -l output and then executing kdestroy with KRB5CCNAME set to that specific credential.)
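A minimal sketch of that root-driven approach follows. It assumes the list of users is passed in as arguments, which is exactly the part that cannot currently be obtained from KCM, and that KCM is each user's default credential cache type so that klist -l lists the KCM collection. The function names expired_caches and purge_user are illustrative, not part of any existing tool.

```shell
#!/bin/sh
# Sketch only: the caller must supply the user names as arguments, because
# there is no supported way to enumerate the users who have credentials in KCM.

# Print the cache names that `klist -l` output (on stdin) flags as expired.
expired_caches() {
    awk '$NF == "(Expired)" { print $(NF - 1) }'
}

# Destroy every expired cache owned by user $1, running as that user.
purge_user() {
    uid=$(id -u "$1") || return 1
    gid=$(id -g "$1") || return 1
    setpriv --reuid "$uid" --regid "$gid" --init-groups klist -l 2>/dev/null |
        expired_caches |
        while read -r cache; do
            # Point kdestroy at one specific expired cache in the collection.
            setpriv --reuid "$uid" --regid "$gid" --init-groups \
                env KRB5CCNAME="$cache" kdestroy
        done
}

for user in "$@"; do
    purge_user "$user"
done
```

Run from a cron job or systemd timer as root, this would be invoked with the user names to process; with no arguments it does nothing.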
But, alas, the contents of KCM are completely opaque: even if one is running as root, there is no sssd tool (e.g. sssctl) that will enumerate the set of users who have credentials in KCM. (I briefly played around with attempting to parse the output of tdbdump /var/lib/sss/secrets/secrets.ldb, but lordy, that would graduate from a kluge to an ugly hack.)
So, we are stuck: SSSD neither implements a feature to purge expired credentials (which cause massive breakage in our environment), nor gives us the ability to kluge something together ourselves. We don’t want to abandon KCM and go back to using the kernel persistent keyring, but for the amount of breakage we are experiencing with KCM and expired credentials, we are reluctantly considering it.
I know it is difficult to infer tone in online communication, so I will specifically disclaim that this is a completely honest question (neither sarcasm nor snark): have I adequately explained what the issue here is? If not, what is unclear; what do I need to clarify?
Finally: please reopen this issue, because the issue is not fixed.
Hi @qralston,
Thank you for your honesty and taking the time to respond with such a detailed explanation. It really helped us discuss and develop the following User Story, Description, and Acceptance Criteria. Could you please confirm if these address the needs you described?
Before that, we'd like to be transparent as well. We do intend to work on this, but our pipeline is currently full. We're focusing on new features related to Zero Trust Architecture, Passwordless authentication, OAuth2, and others.
@aplopez is about to start significant work on the performance of SSSD's caching mechanism, which has been a frequent source of user complaints over the years. Identifying the bottlenecks and potential solutions, drafting the design page with the proposed changes, development, testing, and the other tasks needed to enhance SSSD's caching performance will take a good amount of time. Once that work is accomplished, we can tackle this KCM RFE. If you’re okay with waiting a few months, that’s great. If anyone reading this comment is willing to contribute, you are more than welcome, and we will assist however we can.
User Story
As an admin, I want a mechanism that periodically purges expired credentials from KCM, so that the system automatically removes expired credentials and prevents permission errors and system breakage.
Description:
The system currently faces issues with expired credentials in the Kerberos Cache Manager (KCM) causing permission errors and operational disruptions. These issues arise because the kernel upcall mechanisms, which handle authenticated filesystem access (such as CIFS, NFS RPCGSS), incorrectly handle expired credentials. To resolve this, we need to introduce a parameter purge_expired_credentials_interval in the KCM configuration that allows the system to periodically purge expired credentials for all users. This feature will ensure that expired credentials are promptly removed, thus maintaining the integrity and functionality of the upcall mechanisms.
Acceptance Criteria
Configuration Parameter Addition:
- A new configuration parameter purge_expired_credentials_interval is added to the [kcm] section of the KCM configuration file.
- The parameter accepts values of 300 seconds or more (no less than that).
- A value of 0 disables the purging mechanism.
Default Behavior:
- By default, purge_expired_credentials_interval is set to 0, meaning no automatic purging of expired credentials occurs unless explicitly configured.
Purging Mechanism Implementation:
- The system periodically checks and purges expired credentials based on the interval specified by purge_expired_credentials_interval.
- The purging process involves iterating over each user's credential collection and removing credentials that have expired.
- The purge applies to every user's credentials, regardless of which user owns them.
System Integrity and Logging:
- Ensure that the purging process does not affect valid credentials or disrupt active sessions.
- Detailed logging is implemented for purging activities, including timestamps and user identifiers for purged credentials, to aid in monitoring and troubleshooting (SSSD debug level 9)
Man page:
- This information should be available in the man page, describing its behavior and warning users about the potential harm when enabling both mechanisms (Remove the oldest expired credential if no more space)
Test:
- Create and automate tests of this new feature
Kindly André Boscatto - SSSD Product Owner
I think that if credentials are only purged on a timer, there can still be a period of time (up to 300 seconds in the above design) where a user's KCM cache collection will contain a valid credential and an expired credential for the same principal.
If purging of expired credentials happens during the process of adding a new credential to the cache then this window is greatly shortened. We're already removing the oldest expired credential if there's no space: how about an option to, when a credential cache is added for a principal, remove all other credential caches for that principal? That way, as long as a new credential cache for a principal is added before the old one expires, there's no period of time where an expired credential cache confuses clients.
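The proposal above concerns behaviour inside KCM itself, but the selection logic (keep the newly added cache, drop every other cache for the same principal) can be illustrated from the outside with klist -l output. This is a rough sketch only; stale_caches_for is a hypothetical helper name, and the principal and cache names in the usage note are placeholders.

```shell
#!/bin/sh
# Sketch only: reads `klist -l` output on stdin and prints every cache name
# that belongs to principal $1, except the cache named in $2 (the one to keep).
stale_caches_for() {
    awk -v princ="$1" -v keep="$2" '$1 == princ && $2 != keep { print $2 }'
}
```

A caller could then run something like klist -l | stale_caches_for user@EXAMPLE.COM KCM:1000:12345 and pass each printed cache name to kdestroy via KRB5CCNAME.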
As for the clients themselves: it might be worth filing separate issues with the clients (nfs-utils/gssproxy/cifs-utils) to improve their behaviour in the presence of credential cache collections that may contain multiple credential caches for a given principal. If that were to happen then this improvement in SSSD wouldn't be so important.
[I've edited this comment to improve wording and flesh a few things out]
Hi @andreboscatto, yes; I think the (User Story, Description, and Acceptance Criteria) you described are accurate. Thank you!
@yrro: I think it would be fine if there were an option to enable KCM to automatically purge any expired credentials in a credential collection when certain types of interactions occur (or perhaps any type of interaction occurs) with that credential collection. However, I think that feature might be more difficult to implement than a simple background timer/cleanup action. Furthermore, that’s something that can easily be implemented outside of sssd. E.g., an /etc/profile.d/purge-expired-credentials.sh file as follows:
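#! /bin/sh
# Only run for non-root users.
if [ 0$(id -u 2>/dev/null) -gt 0 ]; then
    # klist -l marks expired caches with "(Expired)" in the third column.
    klist -l 2>/dev/null | awk '$3 == "(Expired)" {print $2}' | while read -r C; do
        env KRB5CCNAME="${C}" kdestroy 2>/dev/null
    done
    unset C
fi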
This won’t help us when a user logs in with expired credentials that derail upcall mechanisms, though, because the upcall mechanism fires when the shell touches the user’s home directory, which occurs before the /etc/profile.d scripts are sourced.
And yes, I agree that in an ideal world, the upcall mechanisms should not misbehave. But the reality is that changing the upcall mechanisms is likely going to be a tough sell, because the behavior of the kernel persistent keyring (no duplicate credentials; the kernel automatically purges credentials when they expire) is the de-facto standard behavior for credential collections, and with that behavior, no issues occur.
Because KCM permits multiple credentials for the same user to be stored in the cache collection, credentials tend to accumulate. Over time, as cached credentials expire, the user’s cache collection becomes littered with duplicate credentials.
We have discovered that having duplicate expired credentials in the cache collection causes breakage. For example, ssh credential delegation can select an expired credential to delegate to the target host, even when a duplicate non-expired credential exists in the collection. In our environment, where home directories are mounted via NFSv4 with sec=krb5p, this locks the user out of their home directory, as they must either acquire or delegate a non-expired credential in order to access their home directory.
Problems like this, where a failure on a remote host is in fact being caused by issues on the local host that initiated the remote connection, are exceedingly difficult for many users to grasp.
(Issue #6357, where KCM will randomly change the primary cache in the cache collection (now fixed, but it will take a while for that fix to propagate out to distros) makes this even worse.)
User complaints have gotten bad enough that we are trying to figure out a way to throw together some sort of “poor man’s expired credential purger.” But unfortunately, sssd makes this exceedingly difficult, because sssctl provides no ability to query any aspect of KCM.
After trial and error, running this command as root:
…looks like it will show us the usernames of all users with credentials in KCM. From there, it should be possible to enumerate over those users via runuser and run a script to purge any expired credentials:
But if sssd users have to resort to kluges like this (using third-party tools to dump KCM internals; tdbdump is a Samba utility) in order to prevent KCM from causing breakage, it means that KCM lacks critical functionality.
Specifically: KCM needs a mechanism to automatically purge expired credentials. E.g., something like this:
Note that the “regardless of the mechanism by which the credential was added to KCM” part is critical: our users frequently use kinit to stuff other credentials into their cache collections.
There is a pressing need for this: it will eliminate problems caused by other processes and services unintentionally plucking expired credentials out of the user’s cache collection, and it will prevent the secrets database from growing without bounds because expired credentials are never purged.
Please add this feature.