aiidateam / aiida-core

The official repository for the AiiDA code
https://aiida-core.readthedocs.io
Other
435 stars 188 forks source link

Simplify the interface of `verdi computer setup` #4981

Open sphuber opened 3 years ago

sphuber commented 3 years ago

The setup of a Computer through verdi computer setup can be a bit overwhelming as it asks to fill out all the attributes that it can store. The same goes for verdi computer configure. For most usecases, options like the Kerberos settings are not really needed. Would it be possible to simplify the interface such that the interactive command by default only shows a subset of required fields, such as the username, hostname etc. and omit the more advanced features. The latter should of course still be changeable some other way for users that need it. This might make the setting up of computers and codes less daunting perhaps. This followed from a discussion with @louisponet

mbercx commented 1 year ago

I agree that setting up the computers and codes is often a hurdle for beginning users and it would be great if it could be simplified. Two first points to discuss here are:

  1. If we have sensible defaults for some of the setup/configure inputs, when should these be set automatically? Now this is only done in case the --non-interactive option is specified, but beginning users are often unaware of this possibility. Should this only be documented better, or should we adapt how the interactive command works? I.e. if a user only runs verdi computer setup or verdi computer configure core.ssh, should we not request inputs for some settings? Should the behaviour be different when the user already provides some settings, either through CLI options or a YAML file and the --config option?
  2. For each of the commands, which inputs are "required", "optional, should be requested" (because the defaults are often incorrect) and "optional, should not be requested" (because the defaults are mostly correct). I've quickly sorted the inputs for the verdi computer setup and verdi computer configure core.ssh commands among these three options below based on my experience, but this can of course still be up for discussion.

verdi computer setup

Required

  -L, --label LABEL               Unique, human-readable label for this
                                  computer.  [required]
  -H, --hostname HOSTNAME         The fully qualified hostname of the computer
                                  (e.g. daint.cscs.ch). Use "localhost" when
                                  setting up the computer that AiiDA is
                                  running on.  [required]
  -T, --transport PLUGIN          A transport plugin (as listed in `verdi
                                  plugin list aiida.transports`).  [required]
  -S, --scheduler PLUGIN          A scheduler plugin (as listed in `verdi
                                  plugin list aiida.schedulers`).  [required]

Optional, should be requested

  -w, --work-dir TEXT             The absolute path of the directory on the
                                  computer where AiiDA will run the
                                  calculations (often a "scratch"
                                  directory).The {username} string will be
                                  replaced by your username on the remote
                                  computer.
  -m, --mpirun-command MPIRUNCOMMANDSTRING
                                  The mpirun command needed on the cluster to
                                  run parallel MPI programs. The
                                  {tot_num_mpiprocs} string will be replaced
                                  by the total number of cpus. See the
                                  scheduler docs for further scheduler-
                                  dependent template variables.
  --prepend-text TEXT             Bash commands that should be prepended to
                                  the executable call in all submit scripts
                                  for this computer.
  --append-text TEXT              Bash commands that should be appended to the
                                  executable call in all submit scripts for
                                  this computer.

Optional, should not be requested

  -D, --description DESCRIPTION   A human-readable description of this
                                  computer.
  --shebang SHEBANGLINE           Specify the first line of the submission
                                  script for this computer (only the bash
                                  shell is supported).
  --mpiprocs-per-machine INTEGER  The default number of MPI processes that
                                  should be executed per machine (node), if
                                  not otherwise specified.Use 0 to specify no
                                  default value.
  --default-memory-per-machine INTEGER
                                  The default amount of RAM (kB) that should
                                  be allocated per machine (node), if not
                                  otherwise specified.
  --use-double-quotes / --not-use-double-quotes
                                  Whether the command line arguments before
                                  and after the executable in the submission
                                  script should be escaped with single or
                                  double quotes.

verdi computer configure core.ssh

Required

  --username TEXT                 Login user name on the remote machine.

Optional, should be requested

  --key-filename FILE             Absolute path to your private SSH key. Leave
                                  empty to use the path set in the SSH config.

Optional, should not be requested

  -u, --user USER                 Email address of the user.
  -P, --port INTEGER              Port number.
  --look-for-keys / --no-look-for-keys
                                  Automatically look for private keys in the
                                  ~/.ssh folder.
  --timeout INTEGER               Time in seconds to wait for connection
                                  before giving up. Leave empty to use default
                                  value.
  --allow-agent / --no-allow-agent
                                  Switch to allow or disallow using an SSH
                                  agent.
  --proxy-jump TEXT               SSH proxy jump for tunneling through other
                                  SSH hosts. Use a comma-separated list of
                                  hosts of the form [user@]host[:port]. If
                                  user or port are not specified for a host,
                                  the user & port values from the target host
                                  are used. This option must be provided
                                  explicitly and is not parsed from the SSH
                                  config file when left empty.
  --proxy-command TEXT            SSH proxy command for tunneling through a
                                  proxy server. For tunneling through another
                                  SSH host, consider using the "SSH proxy
                                  jump" option instead! Leave empty to parse
                                  the proxy command from the SSH config file.
  --compress / --no-compress      Turn file transfer compression on or off.
  --load-system-host-keys / --no-load-system-host-keys
                                  Load system host keys from default SSH
                                  location.
  --gss-auth BOOLEAN              Enable when using GSS kerberos token to
                                  connect.
  --gss-kex BOOLEAN               GSS kex for kerberos, if not configured in
                                  SSH config file.
  --gss-deleg-creds BOOLEAN       GSS deleg_creds for kerberos, if not
                                  configured in SSH config file.
  --gss-host TEXT                 GSS host for kerberos, if not configured in
                                  SSH config file.
  --key-policy [RejectPolicy|WarningPolicy|AutoAddPolicy]
                                  SSH key policy if host is not known.
  --use-login-shell / --no-use-login-shell
                                  Not using a login shell can help suppress
                                  potential spurious text output that can
                                  prevent AiiDA from parsing the output of
                                  commands, but may result in some startup
                                  files (.profile) not being sourced.
  --safe-interval FLOAT           Minimum time interval in seconds between
                                  opening new connections.
agoscinski commented 5 months ago

I am not super sure if this the most suitable issue to discuss this, but it is related.

From my viewpoint looking at this from the verdi shell, the current separation between setup and configure seems a bit entangled to me. I can see the intention to separate the information about the machine and how to connect to it, but in the setup you already define the type of transport and scheduler. Also the naming configure clashes with a lot of other commands verdi computer setup --config and verdi config and makes naming in verdi computer export <setup/configure> not so nice. My suggestion would be therefore:

Remove all the connection related information from verdi computer setup (scheduler and transport) and put this information into a the command verdi connection setup that replaces verdi computer configure

sphuber commented 5 months ago

Thanks for your input @agoscinski . The historical and practical reason for the separation of verdi computer setup and verdi computer configure is that the options for verdi computer configure depend on the transport plugin that is selected in verdi computer setup. So at the very least from a practical perspective it is technically difficult (if not impossible) to provide a single command to do both, because the options will be dynamic based on the value of another option.

Note also that the options for verdi computer configure are not so much just "connection" options, but simply configuration options for the transport plugin. As it so happens, for the core.ssh transport plugin, the majority of these are related to authentication. This is probably why the AuthInfo class that is created by verdi computer configure was called this way, but it should probably have been called differently. If you look at the core.local plugin, the options have nothing to do with authentication but just are options that customize the transport.

Another historical reason for the AuthInfo is that AiiDA supports defining multiple users. The AuthInfo allows reusing the same Computer for different users, encoding the user-specific details inside of it. This feature may not actually be used that often as I don't think multiple users using the same AiiDA profile is that common.

Besides that, there have been discussions in the past to (ab)using the AuthInfo to make certain computer properties mutable. The Computer instance and its properties are immutable once stored. This has caused usability issues in the past where if users made a small mistake, or want to simply update a property (for example the work_dir scratch location) they are forced to create a new computer. The AuthInfo however is fully mutable. There have been discussions in the past therefore to move the computer properties that we think should be mutable to the AuthInfo table instead. This last paragraph is not extremely pertinent to the question of the naming of verdi computer setup/configure but I thought I'd mention it anyway for completeness.

With that all out of the way, do you think verdi connection setup still makes sense? And how would it work concretely? If it includes defining the transport plugin, then how would you implement the command such that it allows the user to specify the configuration options that the transport plugin requires?

agoscinski commented 5 months ago

Thanks @sphuber for the extensive answer. I also just talked with @mbercx about this and he also referred me to this discussion https://github.com/aiidateam/team-compass/issues/12 about making the computer configuration more flexible (I think this is the one you are referring to). I thought the idea behind the setup/configure was to split machine and connection information into two steps, but as you mentioned in your answer this is not really the case. In the discussion with @mbercx I understood that it is more a question of splitting it into immutable and mutable information. Therefore I would say my initial suggestion is not working here.

My initial motivation for the suggestion was more to solve the clash of the commands verdi computer setup --config <FILE> and verdi computer configure. So to solve this issue I would rather suggest to rename verdi computer setup --config <FILE> to verdi computer setup --from-file <FILE> (exact naming of the option can be discussed).

Since my misunderstanding of the meaning of configure and setup comes also from the help page that is at the moment

$ verdi computer --help
  configure  Configure the Authinfo details for a computer (and user).
[...]
  setup      Create a new computer.
[...]

I suggest to change this help page to

$ verdi computer --help
  configure  Configure the default settings of the computer including its transport and scheduler plugin. These details can be changed be later on and set during calculations.
[...]
  setup      Create a new computer and fixes its information including the transport and scheduler plugin. These details cannot be changed later on.
[...]
sphuber commented 5 months ago

I thought the idea behind the setup/configure was to split machine and connection information into two steps

Historically this was the main reason. But not because we thought it was a nicer UX to split it, but simply because it was necessary as the configuration options dependend on the selected transport plugin. Theoretically it may be possible to develop a command that will first prompt for the transport plugin and based on that selection, prompts for the

I understood that it is more a question of splitting it into immutable and mutable information.

This use case only has become part of the discussion much later in AiiDA's development. We have found that users often just want to update a computer but this is not allowed with the model we are currently using. There have been discussion to simply move some properties that we think should be mutable from the Computer class to the AuthInfo class (to the Transport really, if you think about it). So this would essentially be leveraging an existing concept that was not intended for that use case, just to allow certain parts of the computer to be mutable.

The question is whether we should actually do this an abuse the AuthInfo or actually redesign it. But the problem is always backwards compatibility. This is a fundamental part of AiiDA and so we cannot just start breaking this.

agoscinski commented 5 months ago

Thanks for clarifying the historic context! I have one remaining question why one has to specify the transport plugin in the configure command verdi computer configure <TRANSPORT_PLUGIN> <COMPUTER>, if it was already specified in the setup, so it could inferred from the computer.

sphuber commented 5 months ago

The reason is that the commands are generated dynamically. Basically you will not find these commands hard coded in the code but they are generated based on what entry points are installed in the aiida.transports group. The options are generated on the fly based on the configuration options that the transport plugin defines.

The same happens for verdi code create where a subcommand is automatically created for each code plugin. (The same goes for verdi profile setup for each aiida.storage plugin).

For the latter two, the commands are generated by the aiida.cmdline.groups.dynamic module. The transport is an exception and uses a different implementation. This is again historical as it was the first dynamic command like this. I have an open PR #6198 that attempts to migrate these to use the newer method, but I have put that on hold because it has some changes in the current behavior

agoscinski commented 5 months ago

Thanks for your input @agoscinski . The historical and practical reason for the separation of verdi computer setup and verdi computer configure is that the options for verdi computer configure depend on the transport plugin that is selected in verdi computer setup. So at the very least from a practical perspective it is technically difficult (if not impossible) to provide a single command to do both, because the options will be dynamic based on the value of another option.

This is actually a crucial point for me if this is possible.

Motivation

Let me motivate a bit: To make new instances of objects, set parameters, update parameters the verdi commands verdi [profile|user|code|computer|group] use different command names which is very confusing for new users (at least for me). I can see some reasons for the different naming, but there so strong inconsistencies that result in a bad user experience.

verdi profile setup # make new instance from positional database plugin and set paramaters

verdi user configure # make new instance, set parameters, and update parameters

verdi code create # make new instance and sets parameters taking code plugin as positional argument
verdi code setup # make new instance and set parameters, or update parameters of existing instance

verdi computer setup # make new instance and sets some parameters
verdi computer configure # set remaining parameters taking transport plugin as positional argument

verdi group create # make new instance

So any new design suggestion to improve this has to consider how much we technically can change the verdi computer [configure|setup] command.

Question

Looking at what is used in the LazyConfigureGroup https://github.com/aiidateam/aiida-core/blob/06f8f4cfb0731ff699d5c01ad85418b6db0f6778/src/aiida/transports/cli.py#L157-L167

I can see that retrieving the computer information might be nontrivial since we only have the computer name. But we could create the computer instance at this level and by that use this to continue with the configuration of the transport plugin. This would open the option to merge the commands to one. Do you think this is possible @sphuber?

sphuber commented 5 months ago

I agree that the naming of the commands is very inconsistent. If we didn't have to deal with backwards compatibility, I would change it all and make it fully consistent. Unfortunately, that is not really possible for now.

If we were to migrate the computer setup code to use the DynamicEntryPointCommandGroup we could turn it into a single command. It would work exactly like the verdi code create where you would have verdi computer create and the transport plugin would be the subcommand. So you would have verdi computer create core.local and verdi computer create core.ssh.