CAST can enhance the system management of cluster-wide resources. It consists of the open source tools: cluster system management (CSM) and burst buffer.
Eclipse Public License 1.0
Incorrect pam.d configuration on non-compute nodes causes "Cannot find a registered handler." messages to be logged in csm_master.log #983
Symptom
When a regular user logs in to the launch node over ssh, errors are logged in csm_master.log on the master node, with no observable impact on cluster operation:
2020-10-15 14:58:08.283291 csmd::error | GetEventHandler(): Cannot find a registered handler. Return the ErrorEventHandler!
Note: this is a generic error message that will be logged any time an unexpected API request is received. In this particular case it was due to the situation described in this issue, but it could be logged due to other, unrelated root causes.
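To gauge how often this message appears, the master log can be scanned for it. The following is a minimal sketch that assumes every log line follows the layout of the sample above (date, time, `component::severity`, a `|` separator, then the message); the function name and the regular expression are illustrative and not part of CSM.

```python
import re

# Message text copied from the log line quoted above.
HANDLER_ERROR = "GetEventHandler(): Cannot find a registered handler."

# Assumed line layout: "<date> <time> <component>::<severity> | <message>"
LINE_RE = re.compile(
    r"^(?P<date>\S+) (?P<time>\S+) (?P<component>\w+)::(?P<severity>\w+) \| (?P<message>.*)$"
)


def find_handler_errors(lines):
    """Return (timestamp, message) pairs for each unregistered-handler error."""
    hits = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and HANDLER_ERROR in match.group("message"):
            timestamp = f"{match.group('date')} {match.group('time')}"
            hits.append((timestamp, match.group("message")))
    return hits


sample = [
    "2020-10-15 14:58:08.283291 csmd::error | GetEventHandler(): "
    "Cannot find a registered handler. Return the ErrorEventHandler!",
]
for timestamp, message in find_handler_errors(sample):
    print(timestamp, message)
```

Because the message is generic, a match only shows that an unexpected API request reached the master. Comparing the timestamps with ssh logins on non-compute nodes helps confirm whether this issue is the cause.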
Root cause
libcsmpam.so is incorrectly included in the sshd pam configuration on a node running a CSM daemon type other than csmd-compute (that is, csmd-master, csmd-utility, or csmd-aggregator).
libcsmpam.so is intended to be enabled only on CSM compute nodes. When it is included in the sshd pam configuration on a non-compute node, the local CSM daemon correctly ignores calls to the CSM_CMD_cgroup_login API, which prevents unexpected pam and cgroup operations from being performed on non-compute nodes. However, the CSM utility and aggregator daemons also include a default forward handler, which forwards any locally received API call that has no local handler to the CSM master for processing. This is the full sequence of events that leads to the spurious error message being logged:
1. The user sshes to the utility node.
2. This ssh access invokes libcsmpam.so as part of the sshd pam stack.
3. libcsmpam.so sends a CSM_CMD_cgroup_login request to the local CSM utility daemon.
4. The local CSM utility daemon does not have a handler to allow cgroup_login to proceed (as designed).
5. The local CSM utility daemon forwards the API call to the CSM master daemon.
6. The CSM master daemon does not have a handler to allow cgroup_login to proceed (as designed).
7. The CSM master daemon creates an error log entry to indicate that an unexpected API request was received.
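The sequence above can be pictured with a toy model. This is only a sketch of the forwarding behavior as described in this issue: the `Daemon` class, its attributes, and the `receive` method are hypothetical and do not mirror the actual CSM daemon code.

```python
# Toy model of the forwarding sequence. All names are hypothetical and
# are not taken from the CSM source.

class Daemon:
    def __init__(self, role, handlers=(), forward_to=None):
        self.role = role
        self.handlers = set(handlers)  # API commands handled locally
        self.forward_to = forward_to   # default forward target, if any
        self.log = []

    def receive(self, command):
        if command in self.handlers:
            return f"{self.role}: handled {command}"
        if self.forward_to is not None:
            # No local handler: the default forward handler passes the
            # call on to the upstream daemon.
            return self.forward_to.receive(command)
        # No local handler and no forward target: log the generic error.
        self.log.append("GetEventHandler(): Cannot find a registered handler.")
        return f"{self.role}: rejected {command}"


master = Daemon("master")
utility = Daemon("utility", forward_to=master)

result = utility.receive("CSM_CMD_cgroup_login")
print(result)
print(master.log)
```

In this model the utility daemon logs nothing itself; the error appears only in the master's log. That matches the symptom, where the message shows up in csm_master.log rather than on the node the user logged in to.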
Solution
Comment out the libcsmpam.so entries in the sshd pam configuration on CSM non-compute nodes. The error messages will then no longer be logged when users log in.
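As a rough illustration of the fix, the sketch below comments out every active line that references libcsmpam.so in a pam configuration passed in as text. The sample configuration is hypothetical. On most systems the sshd pam configuration lives in /etc/pam.d/sshd, but that path and the exact module lines are assumptions here and should be checked on the node itself.

```python
def comment_out_csm_pam(config_text):
    """Comment out active lines that reference libcsmpam.so; keep the rest."""
    updated = []
    for line in config_text.splitlines():
        stripped = line.lstrip()
        if "libcsmpam.so" in stripped and not stripped.startswith("#"):
            updated.append("#" + line)
        else:
            updated.append(line)
    return "\n".join(updated) + "\n"


# Hypothetical sshd pam configuration, for illustration only.
sample = (
    "account    required     pam_nologin.so\n"
    "account    required     libcsmpam.so\n"
    "session    required     pam_loginuid.so\n"
    "session    required     libcsmpam.so\n"
)
print(comment_out_csm_pam(sample))
```

The function works on text rather than editing a file in place, so the result can be reviewed before anything is written back. Lines that are already commented out are left unchanged, so running it twice gives the same output.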