Closed: lurk-er closed this issue 1 week ago.
Hi,
It would help if you could attach SSSD backend logs with debug_level = 9 covering both the switch to offline mode and the failed attempts to go back online, so we can understand why it cannot get back online. Your sssd.conf and PAM configuration might also be useful.
bye, Sumit
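For reference, one way to raise the log level is sketched below in Python. It assumes sssd.conf uses its usual INI layout (normally /etc/sssd/sssd.conf, edited as root) and sets debug_level = 9 in every [domain/...] section, which configures the backend (sssd_be) for that domain, plus the [nss] and [pam] responder sections. configparser does not keep comments when it writes the file back, so keep a backup. SSSD has to be restarted before the new level takes effect.

```python
import configparser
import io


def raise_debug_level(conf_text: str, level: int = 9,
                      responders=("nss", "pam")) -> str:
    """Return sssd.conf text with debug_level set for backends and responders.

    interpolation=None stops configparser from treating '%' in values such
    as override_homedir = /home/%u as interpolation syntax.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)

    # Every [domain/<name>] section configures one backend (sssd_be).
    targets = [s for s in parser.sections() if s.startswith("domain/")]
    # Responder sections are created if they are missing.
    for name in responders:
        if not parser.has_section(name):
            parser.add_section(name)
        targets.append(name)

    for section in targets:
        parser.set(section, "debug_level", str(level))

    out = io.StringIO()
    parser.write(out)  # note: comments from the input are not preserved
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        "[sssd]\n"
        "domains = example.com\n"
        "services = nss, pam\n\n"
        "[domain/example.com]\n"
        "id_provider = ad\n"
    )
    print(raise_debug_level(sample))
```

To apply it, read /etc/sssd/sssd.conf, pass the text through raise_debug_level(), write the result back to the same file (which keeps its existing 0600 permissions) and restart SSSD.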
Here are all the log files: sssd_example.com.log, sssd_nss.log, sssd_pac.log, sssd_pam.log, krb5_child.log, ldap_child.log and sssd.conf.txt. I also raised NSS logging to level 10 to capture those errors. I tried my best to follow the debugging steps in the SSSD documentation, and the farthest I got was finding out that sssd_be was offline.
Here are the PAM files I modified by hand. I have also included every file referenced by an @include, just in case: login.txt, failsafe.txt, common-password.txt, common-session.txt, common-account.txt and common-auth.txt.
Hi,
Thank you for the logs. According to them, SSSD can reach a DNS server, resolve the SRV record _ldap._tcp.example.com, and resolve the returned DNS name of the AD DC, dc1.example.com, to the IP address 172.27.0.3. SSSD then tries to connect to port 389 on this IP address and fails with "No route to host".
Based on this message, I would guess that either the AD DC is in a network segment the client cannot reach, or the returned IP address is simply wrong. What typically happens when the AD DC becomes unreachable and SSSD switches from online to offline mode?
bye, Sumit
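Because ping uses ICMP, a successful ping does not show that the LDAP port is reachable. A closer test is sketched below in Python: it resolves a host name through the system resolver (getaddrinfo) and attempts a TCP connection to port 389 on every returned address, reporting errors such as "No route to host". It starts from the DC host name; the SRV lookup of _ldap._tcp.example.com is not covered, because the Python standard library has no SRV resolver (a tool such as dig or the third-party dnspython package would be needed). SSSD performs its own DNS lookups, so the addresses returned here are not guaranteed to be the ones SSSD used.

```python
import socket


def check_tcp(host: str, port: int = 389, timeout: float = 5.0):
    """Resolve host and try a TCP connect to every returned address.

    Returns a list of (address, error) pairs; error is None on success,
    otherwise the OS error text, e.g. "No route to host".
    Raises socket.gaierror if the name cannot be resolved at all.
    """
    results = []
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            try:
                sock.connect(sockaddr)
                results.append((sockaddr[0], None))
            except OSError as exc:
                # Timeouts have no strerror, so fall back to str(exc).
                results.append((sockaddr[0], exc.strerror or str(exc)))
    return results
```

For example, check_tcp("dc1.example.com") would list each address the name resolves to on the client, together with whether port 389 accepted a connection.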
That's interesting. That must be an address from systemd-resolved, then; resolved points at the domain controller because I set the domain controller as the primary DNS server. For argument's sake, the domain controller is 192.168.1.40 and the Linux machine is 192.168.1.46. Neither machine is reachable from a 172.27.x.x network.
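A quick way to confirm the mismatch is Python's ipaddress module. The sketch assumes the LAN is 192.168.1.0/24 (only the two host addresses are given, not the netmask) and classifies an address as on the LAN, private but elsewhere, or public. 172.27.0.3 falls in the RFC 1918 private range 172.16.0.0/12, so it is private but outside the assumed LAN, which is consistent with the "No route to host" error.

```python
import ipaddress

# Assumption: the LAN is 192.168.1.0/24. Only the host addresses
# (DC 192.168.1.40, client 192.168.1.46) are known, not the netmask.
LAN = ipaddress.ip_network("192.168.1.0/24")


def classify(addr: str, lan=LAN) -> str:
    """Say whether addr is on the LAN, private but elsewhere, or public."""
    ip = ipaddress.ip_address(addr)
    if ip in lan:
        return "on-lan"
    if ip.is_private:
        return "private-elsewhere"
    return "public"


if __name__ == "__main__":
    for addr in ("192.168.1.40", "192.168.1.46", "172.27.0.3"):
        print(addr, classify(addr))
```

Run as a script, it reports both 192.168.1.x hosts as on-lan and 172.27.0.3 as private-elsewhere.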
After setup, the system is perfectly capable of connecting to the domain controller to run queries, and this persists across reboots and over time. It only breaks after switching over to the local account user and then back to domain users. Note that I can still ping the domain controller by DNS name while it is in this broken state.
If the domain controller is not recognized, authentication fails.
Why does the backend stay in the offline state if the server can still be pinged?
Hi,
Which version of SSSD are you using, and on which platform?
bye, Sumit
SSSD 2.2.3 on Ubuntu 20.04.6 LTS, with all packages up to date.
I have just noticed that I may have been using the wrong version of Ubuntu for my testing. I will retry with Ubuntu 22.04.2 and see if that changes anything.
This issue is fixed in at least SSSD 2.6.3. I can still force the same symptom by running sss_cache -E and sssctl cache-remove, the difference being that restarting SSSD fixes it and allows users to be queried again. Apologies for wasting your time.
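The reset steps above can be scripted. This is a minimal sketch, assuming root privileges and the systemd unit name sssd (the name used on Ubuntu). It runs sss_cache -E to invalidate all cached entries and then restarts SSSD, which is reported here as restoring user lookups, though not always reliably. sssctl cache-remove is deliberately left out because it may prompt interactively. With the default dry_run=True the function only returns the commands so they can be reviewed first.

```python
import subprocess

# Steps from this thread: expire every cached entry, then restart SSSD.
# "sssctl cache-remove" is left out because it may ask interactive questions.
RESET_STEPS = [
    ["sss_cache", "-E"],               # invalidate all cached entries
    ["systemctl", "restart", "sssd"],  # restart all SSSD processes
]


def reset_sssd(dry_run: bool = True, runner=subprocess.run) -> list:
    """Run (or, with dry_run=True, only list) the reset commands in order."""
    planned = []
    for cmd in RESET_STEPS:
        planned.append(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError and stops the sequence
            # if a command fails.
            runner(cmd, check=True)
    return planned


if __name__ == "__main__":
    for line in reset_sssd():
        print(line)
```

Passing dry_run=False executes the commands; the runner parameter exists only so the sequence can be exercised without touching a real system.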
After further testing, this issue can still occur in SSSD 2.6.3; a restart sometimes fixes it, but it is unreliable at best.
Here is an excerpt of the backtrace in sssd.log (attached as sssd_backtrace.log). Why does it say the PID for sssd is missing? Shouldn't that always be created on startup?
Hi,
Have you tried whether flushing all systemd-resolved caches helps?
bye, Sumit
I tried flushing DNS. I think the issue was a duplicate IP address: once I made the addresses unique, the problem went away on Ubuntu 22.04.
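A duplicate IP address can be detected with ARP duplicate address detection (DAD). The sketch below wraps the arping tool from the iputils package (other arping implementations use different flags), where -D sends DAD probes and, per its manual, exit status 0 means no other host replied. Treating exit status 1 as "another host answered" and anything else as an error is an assumption about iputils' exit codes. arping typically needs root, and the interface name eth0 used in the usage note is hypothetical.

```python
import subprocess


def dad_command(address: str, iface: str, count: int = 3) -> list:
    # iputils arping: -D = duplicate address detection mode,
    # -I = interface to probe on, -c = number of probes to send.
    return ["arping", "-D", "-I", iface, "-c", str(count), address]


def address_in_use_elsewhere(address: str, iface: str,
                             runner=subprocess.run) -> bool:
    """Return True if another host answered the DAD probes for address."""
    result = runner(dad_command(address, iface), capture_output=True, text=True)
    if result.returncode == 0:
        return False  # no replies: no other host claims the address
    if result.returncode == 1:
        return True   # assumed: a reply arrived, so the address is duplicated
    raise RuntimeError(f"arping failed with exit status {result.returncode}")
```

For example, address_in_use_elsewhere("192.168.1.46", "eth0") would probe for another machine using the client's address on the (hypothetical) eth0 interface.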
@lurk-er, is the problem solved, or is there anything else to do here? Just wondering whether we can close the issue.
Yes this issue is resolved. Thank you all for your help!
A little backstory
I used realmd to join a domain and updated my PAM modules to use SSS authentication.
When I first run id on a user after completing this setup, it works exactly as expected: I can look up users and retrieve their groups and permissions. With the cache set not to retain entries, the system picks up changes to user groups in real time. The login stack requires the user to be an AD user and to respond to a RADIUS authentication request, handled by another service, as a second factor (2FA). In addition, by default the local user is always denied login.
Because AD can fail and Kerberos can have problems, I set up some basic if/then/else login logic in PAM that makes the local user accessible, together with an offline 2FA method. When this failsafe is triggered, authentication against the AD server is postponed and bypassed. When the condition changes so that the local user is no longer needed, the system switches back to requiring AD users.
The Issue
The problem occurs during this switch. Initially, SSSD with AD works fine. When the switch happens, authentication changes as expected. When it switches back to using AD, however, SSSD never recovers and the SSSD backend is reported as offline. Rebooting, reloading and clearing the cache do not solve the issue.
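To watch the backend state while switching between the AD and local-user modes, the sssctl domain-status command can be polled. This is a minimal sketch, assuming the command's output contains a line such as "Online status: Online" or "Online status: Offline" (check the exact format on your SSSD version) and that it runs as root; example.com is the placeholder domain name used in this thread.

```python
import subprocess


def parse_online_status(output: str):
    """Return True/False from an 'Online status: ...' line, or None if absent."""
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "online status":
            return value.strip().lower() == "online"
    return None


def domain_online(domain: str = "example.com", runner=subprocess.run):
    """Ask sssctl for the domain status and report whether it is online."""
    result = runner(["sssctl", "domain-status", domain],
                    capture_output=True, text=True)
    return parse_online_status(result.stdout)
```

Calling domain_online() in a loop with a short sleep, before and after triggering the failsafe, shows exactly when the backend flips to offline and whether it ever comes back.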
Is there something else that needs to be done to allow SSSD to keep functioning when changing from AD authentication to local user authentication and then back to AD authentication?
To be clear, PAM behaves reproducibly: forcing the second condition makes the local user active, and that user can authenticate successfully even after SSSD breaks.