IBM / CAST

CAST can enhance the system management of cluster-wide resources. It consists of the open source tools: cluster system management (CSM) and burst buffer.
Eclipse Public License 1.0

Incorrect pam.d configuration on non-compute nodes causes "Cannot find a registered handler." messages to be logged in csm_master.log #983

Closed by besawn 3 years ago

besawn commented 3 years ago

Symptom

When a regular user connects to the launch node over ssh, the following error is logged in csm_master.log on the master node. There is no observable impact on cluster operation:

2020-10-15 14:58:08.283291       csmd::error    | GetEventHandler(): Cannot find a registered handler. Return the ErrorEventHandler!

Note: this is a generic error message that is logged whenever an unexpected API request is received. In this case it was caused by the situation described in this issue, but unrelated root causes can produce the same message.

Root cause

libcsmpam.so is incorrectly included in the sshd pam configuration on a node running any CSM daemon type other than csmd-compute (csmd-master, csmd-utility, or csmd-aggregator):

[root@f5n06 ~]# ps -ef | grep [c]smd
root       69256       1  0 Oct13 ?        00:00:58 /opt/ibm/csm/sbin/csmd -f /etc/ibm/csm/csm_utility.cfg

[root@f5n06 ~]# grep libcsmpam.so /etc/pam.d/sshd
account    required     libcsmpam.so
session    required     libcsmpam.so
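
The grep above also matches entries that are already commented out. A minimal sketch of a stricter check follows, assuming the PAM file uses '#' for comments. The function name active_csmpam_lines is hypothetical and not part of CSM:

```shell
# Sketch: print only the active (uncommented) libcsmpam.so entries
# of a PAM file. The path defaults to /etc/pam.d/sshd.
# active_csmpam_lines is an illustrative name, not a CSM tool.
active_csmpam_lines() {
    pam_file="${1:-/etc/pam.d/sshd}"
    # Drop comment lines first, then look for the module name.
    # The exit status is 0 if an active entry exists, 1 otherwise.
    grep -v '^[[:space:]]*#' "$pam_file" | grep 'libcsmpam\.so'
}
```

On a node whose ps output shows a non-compute configuration such as csm_utility.cfg, any line printed by active_csmpam_lines indicates the misconfiguration described here.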

libcsmpam.so is intended to be enabled only on CSM compute nodes. When it is included in the sshd PAM configuration on a non-compute node, logins result in calls to the CSM_CMD_cgroup_login API. The non-compute daemon types correctly ignore these calls locally, which prevents unexpected PAM and cgroup operations on non-compute nodes. However, the CSM utility and aggregator daemons also include a default forward handler that forwards any locally received API call to the CSM master for processing when no local handler is assigned. The forwarded CSM_CMD_cgroup_login request therefore reaches the master, which apparently has no handler registered for it either, and the master logs the spurious error message.

Solution

Comment out the libcsmpam.so entries in the sshd PAM configuration on CSM non-compute nodes. The error messages will then no longer be logged when users log in:

[root@f5n06 ~]# grep libcsmpam.so /etc/pam.d/sshd
#account    required     libcsmpam.so
#session    required     libcsmpam.so
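
The edit can also be scripted. The following is a rough sketch only: the function name comment_out_csmpam and the .bak backup suffix are hypothetical choices, and it assumes that every uncommented line mentioning libcsmpam.so should be disabled:

```shell
# Sketch: comment out active libcsmpam.so entries in a PAM file.
# comment_out_csmpam is an illustrative name, not a CSM tool.
comment_out_csmpam() {
    pam_file="${1:-/etc/pam.d/sshd}"
    tmp_file="$(mktemp)" || return 1
    # Keep a backup copy (the .bak suffix is an arbitrary choice).
    cp -p "$pam_file" "$pam_file.bak" || return 1
    # On lines that are not already comments, prefix any line that
    # mentions libcsmpam.so with '#'. Other lines pass through unchanged.
    sed '/^[[:space:]]*#/!s/.*libcsmpam\.so.*/#&/' "$pam_file" > "$tmp_file" || return 1
    # Write back through the original file so that its ownership,
    # mode and SELinux context are preserved.
    cat "$tmp_file" > "$pam_file"
    status=$?
    rm -f "$tmp_file"
    return "$status"
}
```

Run as root on a non-compute node, comment_out_csmpam /etc/pam.d/sshd should leave the file in the state shown by the grep output above. Running it again changes nothing, because lines that are already commented are skipped.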

besawn commented 3 years ago

Issue was resolved by correcting the sshd pam configuration as described above.