Open joakim-tjernlund opened 1 month ago
Hi.
What is setup configuration?
And what step do you consider a bug?
Does it help if you call the 2nd id as SSS_NSS_USE_MEMCACHE=NO id ...?
I consider it a bug that the id command returns only system GIDs when sssd is running and the network is up, and also that sssd caches this wrong entry for a long time.
./configure --prefix=/usr --build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu --mandir=/usr/share/man --infodir=/usr/share/info --datadir=/usr/share --sysconfdir=/etc --localstatedir=/var/lib --datarootdir=/usr/share --disable-dependency-tracking --disable-silent-rules --disable-static --docdir=/usr/share/doc/sssd-9999 --htmldir=/usr/share/doc/sssd-9999/html --with-sysroot=/ --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --runstatedir=/run --sbindir=/usr/sbin --with-pid-path=/run --with-plugin-path=/usr/lib64/sssd --enable-pammoddir=//lib64/security --with-ldb-lib-dir=/usr/lib64/samba/ldb --with-db-path=/var/lib/sss/db --with-gpo-cache-path=/var/lib/sss/gpo_cache --with-pubconf-path=/var/lib/sss/pubconf --with-pipe-path=/var/lib/sss/pipes --with-mcache-path=/var/lib/sss/mc --with-secrets-db-path=/var/lib/sss/secrets --with-log-path=/var/log/sssd --with-kcm --enable-kcm-renewal --with-os=gentoo --disable-rpath --disable-static --disable-valgrind --with-samba --enable-cifs-idmap-plugin --without-selinux --enable-krb5-locator-plugin --disable-pac-responder --with-nfsv4-idmapd-plugin --enable-nls --with-libnl --with-manpages --without-sudo --with-autofs --with-ssh --without-oidc-child --without-passkey --without-subid --disable-systemtap --without-python2-bindings --with-python3-bindings --with-initscript=systemd --with-systemdunitdir=/usr/lib/systemd/system
Sorry, I meant sssd.conf.
Is this user, labuser, an AD user?
yes, it is an AD user
sssd.conf:
[sssd]
#config_file_version = 2
domains = infinera.com
#domains = infinera.com,transmode.se
services = nss, pam
#debug_level = 0x0fff
debug_level = 0x0000
[nss]
fallback_homedir = /home/%u
default_shell = /bin/bash
#debug_level = 0x0fff
enum_cache_timeout = 3600
entry_negative_timeout = 300
debug_level = 0x0000
[kcm]
tgt_renewal = true
# will inherit all KCM krb5_xxx values
#tgt_renewal_inherit = infinera.com
krb5_renewable_lifetime = 7d
krb5_lifetime = 10h
krb5_renew_interval = 2h
debug_level = 0x0000
[pam]
#Needs patch ?
pam_account_locked_message = "Account Locked"
#debug_level = 0x0fff
pam_response_filter = -ENV:KRB5CCNAME:sudo-i, -ENV:KRB5CCNAME:sudo
[domain/infinera.com]
dns_resolver_use_search_list = false
ad_enabled_domains = infinera.com
#debug_level = 0xffff
debug_level = 0x0000
timeout = 30
ad_maximum_machine_account_password_age = 0
#Do not think we need referals? Is a performance drain
ldap_referrals = false
ignore_group_members = false
ldap_id_mapping = false
cache_credentials = true
enumerate = false
ldap_enumeration_refresh_timeout = 1800
entry_cache_timeout = 3600
refresh_expired_interval = 2700
id_provider = ad
auth_provider = ad
access_provider = permit
chpass_provider = ad
ad_server = x.y.com
dyndns_auth = none
dyndns_auth_ptr = GSS-TSIG
dyndns_update = true
dyndns_refresh_interval = 60
dyndns_update_ptr = true
dyndns_ttl = 3600
case_sensitive = false
ldap_referrals = false
ldap_sasl_mech = GSSAPI
ldap_schema = rfc2307bis
ldap_access_order = expire
ldap_account_expire_policy = ad
ldap_force_upper_case_realm = true
krb5_realm = INFINERA.COM
krb5_canonicalize = true
krb5_store_password_if_offline = true
krb5_use_kdcinfo = False
krb5_renewable_lifetime = 7d
krb5_lifetime = 24h
krb5_renew_interval = 4h
Why do you use ldap_schema = rfc2307bis with id_provider = ad?
Does it help if you call the 2nd id as SSS_NSS_USE_MEMCACHE=NO id ...?
Did you have a chance to check this? I guess "try first user again... still returns just system gid's and will do so for a while(minutes)" is due to mem-cache.
Wrt first lookup returning correct UID but empty GIDs - one needs to check logs. Those should be deduced from tokenGroups.
Why do you use ldap_schema = rfc2307bis with id_provider = ad?
That is ancient, probably a leftover from LDAP days. I will try plain rfc2307.
I ran systemctl edit sssd.service and added:
[Service]
Environment=SSS_NSS_USE_MEMCACHE=NO
This did not change anything. Should I have done it differently?
Oh, I misread. I ran this instead: SSS_NSS_USE_MEMCACHE=NO id labuser
but that did not change anything either; in fact it got worse. Now it won't resolve GIDs for any user, even if I wait a few minutes. Then I dropped the SSS_NSS_USE_MEMCACHE=NO part, and id would fetch GIDs again for new users.
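For context, SSS_NSS_USE_MEMCACHE is read by the SSSD NSS client library inside the process that performs the lookup, which is presumably why setting it in the sssd.service unit had no visible effect. Below is a minimal Python sketch of a per-command invocation. The helper names memcache_disabled_env and id_without_memcache are hypothetical and not part of SSSD.

```python
import os
import subprocess


def memcache_disabled_env(base=None):
    """Return a copy of the environment with the SSSD memory cache disabled."""
    env = dict(os.environ if base is None else base)
    # The variable must be visible to the process doing the lookup (here: id).
    env["SSS_NSS_USE_MEMCACHE"] = "NO"
    return env


def id_without_memcache(user):
    """Run `id <user>` with the variable set for this one command only."""
    return subprocess.run(
        ["id", user],
        env=memcache_disabled_env(),
        capture_output=True,
        text=True,
        check=False,
    )


# Example (not executed here): id_without_memcache("labuser").stdout
```

This is equivalent to prefixing the command in a shell, as in SSS_NSS_USE_MEMCACHE=NO id labuser; it does not change anything inside the sssd daemon itself.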
Why not leave it as the default, ldap_schema = ad?
I can try that too. I don't recall why that was there; it was added many years ago.
Nothing I have done above has helped; sssd simply does NOT speak to AD until a few minutes (2-3) have passed. Doing id
I hope you can reproduce this?
Would it be possible to get SSSD logs (sssd_nss.log and sssd_$domain.log) with debug_level = 9?
It looks more or less fine; the 'tokenGroups' lookup seems to return a list of SIDs.
The problem is that at this moment the domain those SIDs belong to isn't yet(?) known (not discovered by SSSD):
$ grep "Domain not found for SID" sssd_infinera.com.log
... [sdap_ad_tokengroups_get_posix_members] (0x0080): [RID#28] Domain not found for SID S-1-5-21-1757981266-1085031214-682003330-46276
... [sdap_ad_tokengroups_get_posix_members] (0x0080): [RID#28] Domain not found for SID S-1-5-21-1757981266-1085031214-682003330-89875
... [sdap_ad_tokengroups_get_posix_members] (0x0080): [RID#28] Domain not found for SID S-1-5-21-1757981266-1085031214-682003330-92642
... [sdap_ad_tokengroups_get_posix_members] (0x0080): [RID#28] Domain not found for SID S-1-5-21-1757981266-1085031214-682003330-513
... [sdap_ad_tokengroups_get_posix_members] (0x0080): [RID#28] Domain not found for SID S-1-5-21-1757981266-1085031214-682003330-92633
... [sdap_ad_tokengroups_get_posix_members] (0x0080): [RID#28] Domain not found for SID S-1-5-21-1757981266-1085031214-682003330-56419
... [sdap_ad_tokengroups_get_posix_members] (0x0080): [RID#28] Domain not found for SID S-1-5-21-1757981266-1085031214-682003330-92645
... [sdap_ad_tokengroups_get_posix_members] (0x0080): [RID#28] Domain not found for SID S-1-5-21-1757981266-1085031214-682003330-92638
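All SIDs in the excerpt above share the same leading components and differ only in the last one, the RID. A small Python sketch for summarizing such lines per domain SID follows; the function name missing_domain_sids is hypothetical, and the sample lines are copied from the excerpt.

```python
import re
from collections import Counter

SID_RE = re.compile(r"Domain not found for SID (S-\d+(?:-\d+)+)")


def missing_domain_sids(lines):
    """Count unresolved SIDs per domain SID (the SID minus its trailing RID)."""
    counts = Counter()
    for line in lines:
        match = SID_RE.search(line)
        if match:
            # Everything before the last "-" is the domain SID.
            domain_sid, _, _rid = match.group(1).rpartition("-")
            counts[domain_sid] += 1
    return counts


sample = [
    "... [sdap_ad_tokengroups_get_posix_members] (0x0080): [RID#28] "
    "Domain not found for SID S-1-5-21-1757981266-1085031214-682003330-46276",
    "... [sdap_ad_tokengroups_get_posix_members] (0x0080): [RID#28] "
    "Domain not found for SID S-1-5-21-1757981266-1085031214-682003330-513",
]

for domain_sid, count in missing_domain_sids(sample).items():
    print(domain_sid, count)
```

If every reported SID maps to a single domain SID, as here, that points at one domain not being known to SSSD at lookup time rather than at individual broken group entries.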
@sumit-bose, @justin-stephenson, this looks familiar but I can't recall details...
Hi,
I guess you are thinking of #7250.
Indeed. But it was fixed quite some time ago and, IIUC, @joakim-tjernlund is using a build of the latest 'master'?
Yes, I am on master. I vaguely remember the 7250 issue, but no details, I am afraid.
For fun I tried:
diff --git a/src/providers/ad/ad_subdomains.c b/src/providers/ad/ad_subdomains.c
index d8f3738ce..fe8b823d6 100644
--- a/src/providers/ad/ad_subdomains.c
+++ b/src/providers/ad/ad_subdomains.c
@@ -1582,7 +1582,7 @@ static void ad_get_root_domain_done(struct tevent_req *subreq)
         return;
     }
-    ret = ad_get_root_domain_refresh(state, false);
+    ret = ad_get_root_domain_refresh(state, true);
     if (ret != EOK) {
         DEBUG(SSSDBG_OP_FAILURE, "ad_get_root_domain_refresh() failed.\n");
     }
but that didn't help.
Hi,
according to the logs DNS is not available when SSSD is starting, is this expected?
bye, Sumit
Yes, sssd starts before the network is up; the network may never come up if the machine is not connected at all. The network is started by NetworkManager, which uses DHCP.
Restarting sssd after the network is up does not help either.
I am getting more complaints/support requests now as people upgrade their computers. Any progress?
Hi,
it looks like there is a race condition between getting up the network, reading the domain topology (including domain SIDs) and handling requests which depend on the domain topology. I'm looking for a way to avoid it.
bye, Sumit
Any luck? Eager to test something
Ping?
Hi,
thank you for your patience. Please have a look at https://github.com/SSSD/sssd/pull/7673, there are copr builds for recent Fedora and RHEL releases at https://copr.fedorainfracloud.org/coprs/g/sssd/pr7673/.
The pull request is currently in Draft state because I'm not sure it will be the final solution; I still have to figure out whether some race conditions remain. So it would be nice if you could also check whether you still see failures after a reboot. Additionally, the patch refreshes the sub-domain data at every switch from offline to online, rather than only ensuring that this is done when getting online after a restart.
bye, Sumit
A quick test on my test system works. Now the id command just hangs for a few seconds and then I get the full group list back.
I guess the initial id request does quite a bit of extra work?
I have added the patch to our Gentoo so it will get some more testing the coming week.
Another unrelated observation:
sss_cache -E
id user1 - takes about 5 secs
id user2 - well below 1 second
The initial id command after sss_cache -E, or after rm *ldb files; restart sssd, always takes several seconds (5 or so) to complete, but any other id command after that is fast.
Hi,
thanks for testing. The different times are expected since you are using ignore_group_members = false. This means that for the first id call SSSD has to read all groups the user is a member of, and all the members of those groups as well. If the second user is a member of similar groups as the first, all groups for the second user are already in the cache.
bye, Sumit
Not anymore! :) Seriously, what use case needs that? Samba file servers or something more exotic?
This extra work to read all members of a group, could that not be a background task? The id command is not asking for that, so it seems that work could be batched in the background.
id gets a list of GIDs the user is a member of (using getgrouplist()) and then needs to resolve every GID to a group name.
This resolution is done using getgrgid(), which returns a struct group, including all members.
The fact that id doesn't use the group::gr_mem data later doesn't matter.
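The sequence described above can be sketched in Python, whose os.getgrouplist() and grp.getgrgid() wrap the same libc calls. This is an illustration of the lookup pattern only, not the actual source of id; the function name id_like_lookup is hypothetical.

```python
import grp
import os
import pwd


def id_like_lookup(user):
    """Mimic what id does: getgrouplist(), then getgrgid() for every GID."""
    primary_gid = pwd.getpwnam(user).pw_gid
    rows = []
    for gid in os.getgrouplist(user, primary_gid):
        try:
            entry = grp.getgrgid(gid)
        except KeyError:
            # GID without a resolvable group entry.
            rows.append((gid, None, 0))
            continue
        # gr_mem is returned by the lookup even though only gr_name is shown.
        rows.append((gid, entry.gr_name, len(entry.gr_mem)))
    return rows
```

Calling id_like_lookup("labuser") would return one (gid, name, member count) tuple per group. The member list arrives as part of each getgrgid() result whether or not the caller reads it, which is why the cost of fetching group members shows up in the first id call.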
A handful of people or so have tested the patch now and it still looks good, ship it! :)
So ignore_group_members = true failed for www-apache/mod_authz_unixgroup.
If you need it, there is a PR to make it work:
https://github.com/phokz/mod-auth-external/pull/54/commits/687b088c2b703243036cfbf8b3b5692dd7177bc5
cd /var/lib/sss/db/
rm -f *.ldb
reboot
As soon as the machine is up, ssh in as root or log in as local root and run:
> id labuser
uid=10019(labuser) gid=100(users) groups=100(users)
Only system GIDs are returned.
Wait ca. 2 minutes and try another user; id will then return AD GIDs. Try the first user again:
> id labuser
uid=10019(labuser) gid=100(users) groups=100(users)
It still returns only system GIDs and will do so for a while (minutes).
This is on current master, but I think the issue has been present for months.
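When repeating these steps, the symptom can be checked mechanically by parsing the id output. The following Python sketch assumes the usual uid=... gid=... groups=... format shown above; the helper names parse_id_output and only_primary_group are hypothetical.

```python
import re

# A field is "key=value", where the value runs until the next " key=" or the end.
FIELD_RE = re.compile(r"(\w+)=(\S.*?)(?=\s+\w+=|$)")
# An entry is a number with an optional "(name)" suffix.
ENTRY_RE = re.compile(r"(\d+)(?:\(([^)]*)\))?")


def parse_id_output(line):
    """Parse one line of id output into {field: [(number, name), ...]}."""
    parsed = {}
    for key, value in FIELD_RE.findall(line.strip()):
        parsed[key] = [(int(num), name or None) for num, name in ENTRY_RE.findall(value)]
    return parsed


def only_primary_group(line):
    """True when the groups list holds nothing beyond the primary GID."""
    parsed = parse_id_output(line)
    primary = {gid for gid, _ in parsed.get("gid", [])}
    groups = {gid for gid, _ in parsed.get("groups", [])}
    return groups <= primary


line = "uid=10019(labuser) gid=100(users) groups=100(users)"
print(only_primary_group(line))
```

For the output reported above, only_primary_group() is true, which corresponds to the faulty state where no AD groups were resolved.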