SGCI / sgci-resource-inventory

This contains all the computational resource entities
https://sgci-resource-inventory.readthedocs.io/en/latest/introduction.html
Apache License 2.0

Missing bin path for batchSystemDefinition #17

Open ericfranz opened 3 years ago

ericfranz commented 3 years ago

batchSystemDefinition has commandPaths, which is an array of commandPathDefinition, each of which has a name and a path:

  1. The name appears to serve as a description but is not an ENUM so can't be reliably programmed against and the examples omit things like HOLD and RELEASE (observation - not necessarily a feature request :-) )
  2. The path appears to be intended to be the absolute path to the command.

So for Slurm one might do:

[
  { "name": "SUBMISSION", "path": "/usr/bin/sbatch" },
  { "name": "JOB_MONITORING", "path": "/usr/bin/squeue" },
  { "name": "DELETION", "path": "/usr/bin/scancel" }
]
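To illustrate point 1, here is a minimal sketch of how a gateway might look up a command by its name. The helper `find_command` is hypothetical and not part of the schema. Because the name is a free-form string rather than an ENUM, a lookup for an operation the examples omit (such as HOLD), or for a differently cased spelling, simply finds nothing.

```python
# Hypothetical helper: look up a command path by its free-form "name".
command_paths = [
    {"name": "SUBMISSION", "path": "/usr/bin/sbatch"},
    {"name": "JOB_MONITORING", "path": "/usr/bin/squeue"},
    {"name": "DELETION", "path": "/usr/bin/scancel"},
]


def find_command(command_paths, name):
    """Return the path registered under `name`, or None if it is absent."""
    for entry in command_paths:
        if entry["name"] == name:
            return entry["path"]
    return None


print(find_command(command_paths, "SUBMISSION"))  # /usr/bin/sbatch
print(find_command(command_paths, "HOLD"))        # None
```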

It doesn't seem appropriate here but what I would like to be able to do is just specify "/usr/bin" somewhere. For example:

"batchSystem": {
  "jobManager": "SLURM",
  "commandBinPath": "/usr/bin"
}
ericfranz commented 3 years ago

So what about adding something like commandRootPath or commandParentPath:

"batchSystem": {
  "jobManager": "SLURM",
  "commandRootPath": "/usr/bin"
}

I don't recall from the last meeting, but @smarru, were you going to propose another solution for this?

ericfranz commented 3 years ago

@smarru just pinging if you had put any new thoughts into this. If a solution won't make it into 1.0.0 is there a workaround that you recommend?

joestubbs commented 2 years ago

I think we should revisit this issue for 1.1

smarru commented 2 years ago

Sorry I missed this ping before. I vote for something like "batchSystemBinPath", since most of the time all commands are put in the same location. I think it would be redundant and unnecessary to specify the path on every command.

jpnavarro commented 2 years ago

The tradeoff is between several commands having an identical prefix/path versus complexity of having to check that a given command path isn't already absolute and that a batchSystemBinPath value was provided so that a gateway can construct a fully qualified command path using the two. There is zero impact on gateways if several commands have a "redundant" prefix. Is making several command variables shorter, so that they don't repeat a shared substring, really worth the complexity?

My suggestion is to keep it simple. The command path should stand on its own.
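For reference, the check described above could look roughly like the following sketch. The function `qualify` and its parameter names are assumptions for illustration, not anything defined by the schema.

```python
import posixpath


def qualify(command_path, batch_system_bin_path=None):
    """Build a fully qualified command path from a possibly relative one.

    Hypothetical logic: absolute paths stand on their own; relative paths
    require a batchSystemBinPath value to be joined with.
    """
    if posixpath.isabs(command_path):
        return command_path
    if not batch_system_bin_path:
        raise ValueError(
            f"{command_path!r} is relative and no batchSystemBinPath was provided"
        )
    return posixpath.join(batch_system_bin_path, command_path)


print(qualify("/usr/bin/sbatch"))     # /usr/bin/sbatch
print(qualify("sbatch", "/usr/bin"))  # /usr/bin/sbatch
```

Every gateway consuming the inventory would need to carry these two checks, which is the extra complexity being weighed against repeating the prefix.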

ericfranz commented 2 years ago

> The tradeoff is between several commands having an identical prefix/path versus complexity of having to check that a given command path isn't already absolute and that a batchSystemBinPath value was provided so that a gateway can construct a fully qualified command path using the two

I wasn't suggesting that individual commandPaths would have relative paths with the shared prefix defined in commandBinPath or batchSystemBinPath. What I was suggesting was that commandBinPath or batchSystemBinPath could be specified and then the commandPaths for individual commands could be omitted altogether. If all of the commands are in the same bin path, why do I need to enumerate every command and its path explicitly?

The logic would be:

  1. if commandBinPath and commandPaths are not provided, the gateway uses the default location of the commands associated with the given jobManager
  2. if only commandBinPath is provided, the gateway uses this as a prefix for the commands
  3. if commandBinPath is provided and then some commands with absolute paths are also provided in commandPaths, commandPaths is used for those specific commands

So commandPaths has precedence if provided.
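A minimal sketch of that precedence follows. It assumes a hypothetical `DEFAULT_COMMANDS` table that the gateway hardcodes per jobManager, and it interprets "default location" as returning the bare command name so that it is found via the PATH. Both are assumptions made for illustration, and `commandBinPath` is only the name proposed in this thread.

```python
import posixpath

# Assumption: the gateway hardcodes the binary for each generic command name.
DEFAULT_COMMANDS = {
    "SLURM": {
        "SUBMISSION": "sbatch",
        "JOB_MONITORING": "squeue",
        "DELETION": "scancel",
    },
}


def resolve_command(batch_system, name):
    """Resolve a command, giving commandPaths precedence over commandBinPath."""
    # Rule 3: an explicit entry in commandPaths wins for that specific command.
    for entry in batch_system.get("commandPaths", []):
        if entry["name"] == name:
            return entry["path"]

    binary = DEFAULT_COMMANDS[batch_system["jobManager"]][name]

    # Rule 2: commandBinPath is used as a prefix for the default binary name.
    bin_path = batch_system.get("commandBinPath")
    if bin_path:
        return posixpath.join(bin_path, binary)

    # Rule 1: nothing provided, so rely on the default location (here: PATH).
    return binary


batch_system = {
    "jobManager": "SLURM",
    "commandBinPath": "/usr/bin",
    "commandPaths": [
        {"name": "SUBMISSION", "path": "/usr/local/bin/sbatch_wrapper"},
    ],
}
print(resolve_command(batch_system, "SUBMISSION"))      # /usr/local/bin/sbatch_wrapper
print(resolve_command(batch_system, "JOB_MONITORING"))  # /usr/bin/squeue
```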

jpnavarro commented 2 years ago

Are the commands enumerated so that a gateway can run whatever "SUBMISSION" references without having to know that "sbatch" is the desired command on a target system? If so, the variables are needed just for that.

If the variables aren't needed because the desired design is that a gateway hardcodes the commands based on jobManager type, then the suggestion for having BinPaths makes a lot of sense.

ericfranz commented 2 years ago

I think most preexisting gateways hardcode the commands based on jobManager type if they support multiple job managers, so supporting this would help adoption.

ericfranz commented 2 years ago

I think the expectation right now seems to be that the gateway needs to know how to interact with the jobManager type. Abstracting the job manager command paths behind generic "variables" such as SUBMISSION, JOB_MONITORING, and DELETION doesn't mean a gateway could suddenly become ignorant of the job manager. The gateway will still need to know whether it is using qsub or bsub or sbatch and understand how to use each, because each accepts different arguments and produces different output.
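As a rough illustration of why the generic name alone isn't enough, here is a sketch of the per-scheduler knowledge a gateway carries just to read back a job ID after submission. The jobManager keys "PBS" and "LSF" and the function name are assumptions, and the output formats shown are the typical ones for sbatch, qsub, and bsub; a real site may differ.

```python
import re

# Assumption: typical submission binaries per job manager type.
SUBMIT_BINARY = {"SLURM": "sbatch", "PBS": "qsub", "LSF": "bsub"}


def parse_job_id(job_manager, output):
    """Extract the job ID from typical submission output (formats assumed)."""
    if job_manager == "SLURM":
        # e.g. "Submitted batch job 12345"
        match = re.search(r"Submitted batch job (\d+)", output)
    elif job_manager == "PBS":
        # e.g. "12345.headnode" (the whole line is the job ID)
        match = re.search(r"(\S+)", output)
    elif job_manager == "LSF":
        # e.g. "Job <12345> is submitted to queue <normal>."
        match = re.search(r"Job <(\d+)>", output)
    else:
        raise ValueError(f"unsupported job manager: {job_manager}")
    if match is None:
        raise ValueError(f"could not parse {SUBMIT_BINARY[job_manager]} output")
    return match.group(1)


print(parse_job_id("SLURM", "Submitted batch job 12345"))                # 12345
print(parse_job_id("LSF", "Job <678> is submitted to queue <normal>."))  # 678
```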

In OnDemand there are cases where the default bin directory doesn't contain a specific binary, so the location needs to be customizable. With https://osc.github.io/ood-documentation/latest/installation/cluster-config-schema.html#bin-overrides this can be done:

# An example in Slurm
job:
  adapter: "slurm"
  bin: "/opt/slurm/bin"
  conf: "/opt/slurm/etc/slurm.conf"
  bin_overrides:
      # Override just what you want/need to
      sbatch: "/usr/local/slurm/bin/sbatch_wrapper"

So OnDemand looks for Slurm binaries under /opt/slurm/bin except for sbatch where it uses /usr/local/slurm/bin/sbatch_wrapper.
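The lookup behavior described here can be sketched as below. This is not OnDemand's actual implementation; the dict mirrors the YAML above, and the function name `binary_path` is hypothetical.

```python
import posixpath

# Mirrors the YAML "job" section above as a plain dict.
job = {
    "adapter": "slurm",
    "bin": "/opt/slurm/bin",
    "conf": "/opt/slurm/etc/slurm.conf",
    "bin_overrides": {
        "sbatch": "/usr/local/slurm/bin/sbatch_wrapper",
    },
}


def binary_path(job, command):
    """Return the override for `command` if present, else join it with `bin`."""
    override = job.get("bin_overrides", {}).get(command)
    if override:
        return override
    return posixpath.join(job["bin"], command)


print(binary_path(job, "sbatch"))  # /usr/local/slurm/bin/sbatch_wrapper
print(binary_path(job, "squeue"))  # /opt/slurm/bin/squeue
```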

That is my context for this discussion. I imagine that commandBinPath or batchSystemBinPath and then the individual commandPaths might offer similar flexibility.

But the latter (commandPaths) needs more thought, I think, if the names are not fixed enums (e.g. SUBMISSION). Also, even if a command path looked like "name": "SUBMISSION", "path": "/usr/bin/sbatch", a gateway that knows how to use Slurm might still need a hardcoded mapping in its code between "SUBMISSION" and "sbatch" when the jobManager is SLURM, so that it knows the value of the "SUBMISSION" variable is the path to the sbatch binary. The gateway is still responsible right now for knowing how to work with sbatch (what arguments to provide and how to parse the output).

So the value of commandPaths having generic variable names like "SUBMISSION" and "DELETION" isn't yet clear to me.