SSSD / sssd

A daemon to manage identity, authentication and authorization for centrally-managed systems.
https://sssd.io
GNU General Public License v3.0

SSSD Throws Backend Error on Authentication Change in PAM #7418

Closed lurk-er closed 1 week ago

lurk-er commented 1 month ago

A little backstory: I used realmd to join a domain and updated my PAM modules to use sss authentication.

When I first attempt to id a user after completing this setup, it works as expected: I can id users and retrieve their groups and permissions. Setting the cache to not be retained has the system update user groups in real time. The login stack requires the user to be an AD user and to respond to a RADIUS authentication request, handled by another service, as 2FA. As part of this, the local user is denied login by default.

Because AD can fail and Kerberos can have problems, I set up some basic if-then-else login logic in PAM that makes the local user accessible, along with an offline 2FA method. When this failsafe is triggered, authentication against the AD server is postponed and bypassed. When the condition changes so that the local user is no longer needed, the system switches back to requiring AD users.

The issue: the problem occurs during this switch. Initially SSSD with AD works fine. When the switch happens, authentication changes as expected. When it switches back to using AD, SSSD never recovers and the SSSD backend is reported as offline. Rebooting, reloading, and clearing the cache do not solve the issue.

Is there something else that needs to be done to allow sssd to continue to function when changing from ad authentication to local user authentication then back to ad authentication?

To be clear, PAM works repeatably: forcing the second condition makes the local user active, and they can authenticate successfully even after SSSD breaks.

sumit-bose commented 1 month ago

Hi,

it would be good if you can add SSSD backend logs with debug_level = 9 covering the going offline and failing going online to understand why it cannot get back online. Additionally the sssd.conf and your PAM configuration might be useful.
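As a rough illustration of the debug_level request, here is a minimal sketch that adds debug_level = 9 to every section of an INI-style configuration text, which is the format sssd.conf uses. The helper name set_debug_level and the sample configuration are hypothetical and are not taken from the attached files; the sketch works on a string instead of /etc/sssd/sssd.conf so it can be tried without touching a live system.

```python
import configparser
import io


def set_debug_level(conf_text: str, level: int = 9) -> str:
    """Return conf_text with debug_level set in every section."""
    parser = configparser.RawConfigParser()
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    for section in parser.sections():
        parser.set(section, "debug_level", str(level))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Hypothetical sample, not the reporter's actual sssd.conf.
SAMPLE = """\
[sssd]
domains = example.com
services = nss, pam

[domain/example.com]
id_provider = ad
"""

if __name__ == "__main__":
    print(set_debug_level(SAMPLE))
```

Comments in the original text are not preserved by configparser, so on a real system it is probably simpler to edit the file by hand. SSSD has to be restarted before a change to sssd.conf takes effect.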

bye, Sumit

lurk-er commented 1 month ago

sssd_example.com.log sssd_nss.log sssd_pac.log sssd_pam.log krb5_child.log ldap_child.log sssd.conf.txt

Here are all the log files. I also turned NSS logging up to level 10 to capture those errors. I tried my best to follow the debug steps in the SSSD documentation, and the farthest I got was figuring out that sssd_be was offline.

lurk-er commented 1 month ago

Here are the PAM files that have been modified by hand. I have also included any files referenced by an @include, just in case: login.txt failsafe.txt common-password.txt common-session.txt common-account.txt common-auth.txt

sumit-bose commented 1 month ago

Hi,

thank you for the logs. According to them, SSSD can reach a DNS server to resolve the SRV record _ldap._tcp.example.com, and to resolve the returned DNS name of the AD DC, dc1.example.com, to the IP address 172.27.0.3. SSSD then tries to connect to port 389 of this IP address and fails with "No route to host".

According to this message I would guess that either the AD DC is in a network segment which is not accessible to the client or the returned IP is just wrong. What does typically happen when the AD DC does become unreachable and SSSD switches from online into offline mode?
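The check described above can be approximated from the client with a short sketch like the following. It resolves a host name with the system resolver and attempts a TCP connection to port 389 on each returned address. The function names check_tcp and resolve are made up for this example, and it does not perform the _ldap._tcp SRV lookup that SSSD does, since the Python standard library has no SRV resolver.

```python
import socket
from typing import List


def check_tcp(host: str, port: int = 389, timeout: float = 3.0) -> str:
    """Try a TCP connection and return 'ok' or the OS error text."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError as exc:  # covers refused, timeout, no route to host
        return f"failed: {exc}"


def resolve(host: str) -> List[str]:
    """Return the distinct addresses the system resolver gives for host."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    name = "dc1.example.com"  # DC name as it appears in the logs
    try:
        for addr in resolve(name):
            print(addr, check_tcp(addr))
    except socket.gaierror as exc:
        print(f"could not resolve {name}: {exc}")
```

A successful ping does not imply that this check passes: ping uses ICMP, and ping may be resolving the name to a different address than the one SSSD received.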

bye, Sumit

lurk-er commented 1 month ago

That's interesting... that must be a systemd-resolved address then; resolved is pointed at the domain controller by setting the domain controller as the primary DNS. For argument's sake, the domain controller is 192.168.1.40 and the Linux machine is 192.168.1.46. Neither machine can reach a 172.27.x.x network.

During setup, the system is perfectly able to connect to the domain controller to query it, and this persists across reboots and over time. It only breaks after switching over to the local account user and then back to domain users. Note that I can still ping the domain controller by DNS name when it is in this broken state.

If the domain controller is not recognized, then authentication fails.
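To make the mismatch concrete, here is a small sketch that checks whether an address lies in the same network as the client. The /24 prefix length is an assumption (the netmask is not stated anywhere in this thread), and on_same_network is a hypothetical helper name.

```python
import ipaddress


def on_same_network(addr: str, local_iface: str) -> bool:
    """True if addr falls inside the network of local_iface (CIDR form)."""
    network = ipaddress.ip_interface(local_iface).network
    return ipaddress.ip_address(addr) in network


if __name__ == "__main__":
    client = "192.168.1.46/24"  # /24 is assumed, not confirmed
    for candidate in ("192.168.1.40", "172.27.0.3"):
        print(candidate, on_same_network(candidate, client))
```

Under that assumption the domain controller address is on the client's network, while the 172.27.0.3 address from the logs is not, so it would have to be reached through a route that, according to the error, does not exist.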

Why does the backend stay in the offline state if the server can still be pinged?

sumit-bose commented 1 month ago

Hi,

which version of SSSD are you using on which platform?

bye, Sumit

lurk-er commented 1 month ago

SSSD 2.2.3 on Ubuntu 20.04.6 LTS. All packages are up to date.

lurk-er commented 1 month ago

I have just noticed that I may have been using the wrong version of Ubuntu for my testing. I will retry with Ubuntu 22.04.2 and see if that changes anything.

lurk-er commented 1 month ago

This issue is fixed in at least SSSD 2.6.3. I can force the same issue by using sss_cache -E and sssctl cache-remove, with the difference being that restarting SSSD allows users to be queried once again. Apologies for wasting your time.

lurk-er commented 1 month ago

After further testing, this issue can still occur in SSSD 2.6.3; however, a restart can sometimes fix it. It is unreliable at best.

lurk-er commented 1 month ago

Here is a clip of the backtrace log in sssd.log. Why does it say it's missing the PID for SSSD? Shouldn't that always be created on startup? sssd_backtrace.log

sumit-bose commented 1 month ago

Hi,

have you tried if flushing all systemd-resolved caches helps?

bye, Sumit

lurk-er commented 1 month ago

I tried flushing DNS. I think the issue, which I discovered, was a duplicate IP address. I made the addresses unique and the problem went away on Ubuntu 22.04.

andreboscatto commented 2 weeks ago

@lurk-er is the problem solved or is there anything else to do here? Just wondering if we can close the issue.

lurk-er commented 1 week ago

Yes this issue is resolved. Thank you all for your help!