OSC / ood_core

Open OnDemand core library
https://osc.github.io/ood_core/
MIT License

host_permitted? error difficult to debug #190

Status: Open. ericfranz opened this issue 4 years ago.

ericfranz commented 4 years ago

We have a host_permitted? method that raises an exception if the metadata of a Linux Host adapter job contains the hostname of a host that is not in the list of allowed hosts.

This can occur in a common case: the hostname command on a target host returns the short hostname rather than the FQDN, but the cluster config uses FQDNs for the ssh hosts whose job status is checked. For example, the metadata for a job says "owens" is the host to check the status of, but the only ssh host configured for the cluster using the Linux Host adapter is "owens.hpc.osc.edu", so the error raised is "Requested destination host (owens) not permitted". This error is written only to the log files; in the user interface the job just shows as being in a bad state.
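To make the failure mode concrete, here is a minimal sketch of an exact-match allow-list check in the spirit of host_permitted?. It is not the adapter's actual code: the error class HostNotPermittedError and the site_hosts parameter (standing in for the ssh hosts from the cluster config) are hypothetical. It shows why a short hostname like "owens" is rejected when only the FQDN is configured.

```ruby
# Hypothetical error class; the real adapter raises its own error type.
class HostNotPermittedError < StandardError; end

# Sketch of an allow-list check modeled on host_permitted?.
# site_hosts is assumed to be the list of ssh hosts from the cluster config.
def host_permitted?(destination_host, site_hosts)
  # Exact string comparison: "owens" is not equal to "owens.hpc.osc.edu".
  unless site_hosts.include?(destination_host)
    raise HostNotPermittedError,
          "Requested destination host (#{destination_host}) not permitted"
  end
  true
end

site_hosts = ["owens.hpc.osc.edu"]

host_permitted?("owens.hpc.osc.edu", site_hosts) # => true

begin
  # hostname on the target returned the short name, so the lookup fails.
  host_permitted?("owens", site_hosts)
rescue HostNotPermittedError => e
  puts e.message # prints "Requested destination host (owens) not permitted"
end
```

Because the comparison is a plain string match, nothing in the message hints that "owens" and "owens.hpc.osc.edu" refer to the same machine, which is what makes the error hard to debug.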

There are two things that need to change.

  1. We need a better error message than "Requested destination host (owens) not permitted": one that explains the problem and suggests the solution. For example: "The specified host 'owens-login01' is not in the list of ssh hosts configured for this cluster. The ssh hosts configured are 'owens-login01.hpc.osc.edu', 'owens-login02.hpc.osc.edu', 'owens-login03.hpc.osc.edu'. The specified host for this job is determined by running hostname on the target host. The output of hostname must match one of the specified ssh hosts."
  2. This error message should appear in the batch connect panel when the session is in a bad state, instead of being hidden in the log file.
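The message proposed in item 1 could be assembled from the rejected host and the configured ssh hosts. The helper below is a hypothetical sketch (host_not_permitted_message is not part of ood_core); it only formats the text and leaves open where the adapter would raise it.

```ruby
# Hypothetical helper that builds the more descriptive message from item 1.
# destination_host: the host recorded in the job metadata (output of hostname).
# site_hosts: the ssh hosts configured for the cluster.
def host_not_permitted_message(destination_host, site_hosts)
  configured = site_hosts.map { |h| "'#{h}'" }.join(", ")
  "The specified host '#{destination_host}' is not in the list of ssh hosts " \
    "configured for this cluster. The ssh hosts configured are #{configured}. " \
    "The specified host for this job is determined by running hostname on the " \
    "target host. The output of hostname must match one of the specified ssh hosts."
end

hosts = %w[owens-login01.hpc.osc.edu owens-login02.hpc.osc.edu owens-login03.hpc.osc.edu]
puts host_not_permitted_message("owens-login01", hosts)
```

Listing the configured hosts next to the rejected one lets an administrator spot a short-name versus FQDN mismatch at a glance.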


ericfranz commented 4 years ago

Note: https://github.com/OSC/ood_core/issues/191 will fix the hostname issue in a more appropriate way. However, the fixes above are still relevant whenever getting the status of a job fails.

matthu017 commented 4 years ago

Is this solved by PR https://github.com/OSC/ood_core/pull/201, or does it require a separate solution?