Open sphuber opened 3 years ago
I agree that setting up the computers and codes is often a hurdle for beginning users and it would be great if it could be simplified. Two first points to discuss here are:
--non-interactive
option is specified, but beginning users are often unaware of this possibility. Should this only be documented better, or should we adapt how the interactive command works? I.e. if a user only runs verdi computer setup
or verdi computer configure core.ssh
, should we not request inputs for some settings? Should the behaviour be different when the user already provides some settings, either through CLI options or a YAML file and the --config
option?verdi computer setup
and verdi computer configure core.ssh
commands among these three options below based on my experience, but this can of course still be up for discussion.verdi computer setup
-L, --label LABEL Unique, human-readable label for this
computer. [required]
-H, --hostname HOSTNAME The fully qualified hostname of the computer
(e.g. daint.cscs.ch). Use "localhost" when
setting up the computer that AiiDA is
running on. [required]
-T, --transport PLUGIN A transport plugin (as listed in `verdi
plugin list aiida.transports`). [required]
-S, --scheduler PLUGIN A scheduler plugin (as listed in `verdi
plugin list aiida.schedulers`). [required]
-w, --work-dir TEXT The absolute path of the directory on the
computer where AiiDA will run the
calculations (often a "scratch"
directory).The {username} string will be
replaced by your username on the remote
computer.
-m, --mpirun-command MPIRUNCOMMANDSTRING
The mpirun command needed on the cluster to
run parallel MPI programs. The
{tot_num_mpiprocs} string will be replaced
by the total number of cpus. See the
scheduler docs for further scheduler-
dependent template variables.
--prepend-text TEXT Bash commands that should be prepended to
the executable call in all submit scripts
for this computer.
--append-text TEXT Bash commands that should be appended to the
executable call in all submit scripts for
this computer.
-D, --description DESCRIPTION A human-readable description of this
computer.
--shebang SHEBANGLINE Specify the first line of the submission
script for this computer (only the bash
shell is supported).
--mpiprocs-per-machine INTEGER The default number of MPI processes that
should be executed per machine (node), if
not otherwise specified.Use 0 to specify no
default value.
--default-memory-per-machine INTEGER
The default amount of RAM (kB) that should
be allocated per machine (node), if not
otherwise specified.
--use-double-quotes / --not-use-double-quotes
Whether the command line arguments before
and after the executable in the submission
script should be escaped with single or
double quotes.
verdi computer configure core.ssh
--username TEXT Login user name on the remote machine.
--key-filename FILE Absolute path to your private SSH key. Leave
empty to use the path set in the SSH config.
-u, --user USER Email address of the user.
-P, --port INTEGER Port number.
--look-for-keys / --no-look-for-keys
Automatically look for private keys in the
~/.ssh folder.
--timeout INTEGER Time in seconds to wait for connection
before giving up. Leave empty to use default
value.
--allow-agent / --no-allow-agent
Switch to allow or disallow using an SSH
agent.
--proxy-jump TEXT SSH proxy jump for tunneling through other
SSH hosts. Use a comma-separated list of
hosts of the form [user@]host[:port]. If
user or port are not specified for a host,
the user & port values from the target host
are used. This option must be provided
explicitly and is not parsed from the SSH
config file when left empty.
--proxy-command TEXT SSH proxy command for tunneling through a
proxy server. For tunneling through another
SSH host, consider using the "SSH proxy
jump" option instead! Leave empty to parse
the proxy command from the SSH config file.
--compress / --no-compress Turn file transfer compression on or off.
--load-system-host-keys / --no-load-system-host-keys
Load system host keys from default SSH
location.
--gss-auth BOOLEAN Enable when using GSS kerberos token to
connect.
--gss-kex BOOLEAN GSS kex for kerberos, if not configured in
SSH config file.
--gss-deleg-creds BOOLEAN GSS deleg_creds for kerberos, if not
configured in SSH config file.
--gss-host TEXT GSS host for kerberos, if not configured in
SSH config file.
--key-policy [RejectPolicy|WarningPolicy|AutoAddPolicy]
SSH key policy if host is not known.
--use-login-shell / --no-use-login-shell
Not using a login shell can help suppress
potential spurious text output that can
prevent AiiDA from parsing the output of
commands, but may result in some startup
files (.profile) not being sourced.
--safe-interval FLOAT Minimum time interval in seconds between
opening new connections.
I am not super sure if this the most suitable issue to discuss this, but it is related.
From my viewpoint looking at this from the verdi shell, the current separation between setup
and configure
seems a bit entangled to me. I can see the intention to separate the information about the machine and how to connect to it, but in the setup you already define the type of transport and scheduler. Also the naming configure
clashes with a lot of other commands verdi computer setup --config
and verdi config
and makes naming in verdi computer export <setup/configure>
not so nice. My suggestion would be therefore:
Remove all the connection related information from verdi computer setup
(scheduler
and transport
) and put this information into a the command verdi connection setup
that replaces verdi computer configure
Thanks for your input @agoscinski . The historical and practical reason for the separation of verdi computer setup
and verdi computer configure
is that the options for verdi computer configure
depend on the transport plugin that is selected in verdi computer setup
. So at the very least from a practical perspective it is technically difficult (if not impossible) to provide a single command to do both, because the options will be dynamic based on the value of another option.
Note also that the options for verdi computer configure
are not so much just "connection" options, but simply configuration options for the transport plugin. As it so happens, for the core.ssh
transport plugin, the majority of these are related to authentication. This is probably why the AuthInfo
class that is created by verdi computer configure
was called this way, but it should probably have been called differently. If you look at the core.local
plugin, the options have nothing to do with authentication but just are options that customize the transport.
Another historical reason for the AuthInfo
is that AiiDA supports defining multiple users. The AuthInfo
allows reusing the same Computer
for different users, encoding the user-specific details inside of it. This feature may not actually be used that often as I don't think multiple users using the same AiiDA profile is that common.
Besides that, there have been discussions in the past to (ab)using the AuthInfo
to make certain computer properties mutable. The Computer
instance and its properties are immutable once stored. This has caused usability issues in the past where if users made a small mistake, or want to simply update a property (for example the work_dir
scratch location) they are forced to create a new computer. The AuthInfo
however is fully mutable. There have been discussions in the past therefore to move the computer properties that we think should be mutable to the AuthInfo
table instead. This last paragraph is not extremely pertinent to the question of the naming of verdi computer setup/configure
but I thought I'd mention it anyway for completeness.
With that all out of the way, do you think verdi connection setup
still makes sense? And how would it work concretely? If it includes defining the transport
plugin, then how would you implement the command such that it allows the user to specify the configuration options that the transport plugin requires?
Thanks @sphuber for the extensive answer. I also just talked with @mbercx about this and he also referred me to this discussion https://github.com/aiidateam/team-compass/issues/12 about making the computer configuration more flexible (I think this is the one you are referring to). I thought the idea behind the setup/configure was to split machine and connection information into two steps, but as you mentioned in your answer this is not really the case. In the discussion with @mbercx I understood that it is more a question of splitting it into immutable and mutable information. Therefore I would say my initial suggestion is not working here.
My initial motivation for the suggestion was more to solve the clash of the commands verdi computer setup --config <FILE>
and verdi computer configure
. So to solve this issue I would rather suggest to rename verdi computer setup --config <FILE>
to verdi computer setup --from-file <FILE>
(exact naming of the option can be discussed).
Since my misunderstanding of the meaning of configure and setup comes also from the help page that is at the moment
$ verdi computer --help
configure Configure the Authinfo details for a computer (and user).
[...]
setup Create a new computer.
[...]
I suggest to change this help page to
$ verdi computer --help
configure Configure the default settings of the computer including its transport and scheduler plugin. These details can be changed be later on and set during calculations.
[...]
setup Create a new computer and fixes its information including the transport and scheduler plugin. These details cannot be changed later on.
[...]
I thought the idea behind the setup/configure was to split machine and connection information into two steps
Historically this was the main reason. But not because we thought it was a nicer UX to split it, but simply because it was necessary as the configuration options dependend on the selected transport plugin. Theoretically it may be possible to develop a command that will first prompt for the transport
plugin and based on that selection, prompts for the
I understood that it is more a question of splitting it into immutable and mutable information.
This use case only has become part of the discussion much later in AiiDA's development. We have found that users often just want to update a computer but this is not allowed with the model we are currently using. There have been discussion to simply move some properties that we think should be mutable from the Computer
class to the AuthInfo
class (to the Transport
really, if you think about it). So this would essentially be leveraging an existing concept that was not intended for that use case, just to allow certain parts of the computer to be mutable.
The question is whether we should actually do this an abuse the AuthInfo
or actually redesign it. But the problem is always backwards compatibility. This is a fundamental part of AiiDA and so we cannot just start breaking this.
Thanks for clarifying the historic context! I have one remaining question why one has to specify the transport plugin in the configure command verdi computer configure <TRANSPORT_PLUGIN> <COMPUTER>
, if it was already specified in the setup, so it could inferred from the computer.
The reason is that the commands are generated dynamically. Basically you will not find these commands hard coded in the code but they are generated based on what entry points are installed in the aiida.transports
group. The options are generated on the fly based on the configuration options that the transport plugin defines.
The same happens for verdi code create
where a subcommand is automatically created for each code plugin. (The same goes for verdi profile setup
for each aiida.storage
plugin).
For the latter two, the commands are generated by the aiida.cmdline.groups.dynamic
module. The transport is an exception and uses a different implementation. This is again historical as it was the first dynamic command like this. I have an open PR #6198 that attempts to migrate these to use the newer method, but I have put that on hold because it has some changes in the current behavior
Thanks for your input @agoscinski . The historical and practical reason for the separation of verdi computer setup and verdi computer configure is that the options for verdi computer configure depend on the transport plugin that is selected in verdi computer setup. So at the very least from a practical perspective it is technically difficult (if not impossible) to provide a single command to do both, because the options will be dynamic based on the value of another option.
This is actually a crucial point for me if this is possible.
Let me motivate a bit: To make new instances of objects, set parameters, update parameters the verdi commands verdi [profile|user|code|computer|group]
use different command names which is very confusing for new users (at least for me). I can see some reasons for the different naming, but there so strong inconsistencies that result in a bad user experience.
verdi profile setup # make new instance from positional database plugin and set paramaters
verdi user configure # make new instance, set parameters, and update parameters
verdi code create # make new instance and sets parameters taking code plugin as positional argument
verdi code setup # make new instance and set parameters, or update parameters of existing instance
verdi computer setup # make new instance and sets some parameters
verdi computer configure # set remaining parameters taking transport plugin as positional argument
verdi group create # make new instance
So any new design suggestion to improve this has to consider how much we technically can change the verdi computer [configure|setup]
command.
Looking at what is used in the LazyConfigureGroup
https://github.com/aiidateam/aiida-core/blob/06f8f4cfb0731ff699d5c01ad85418b6db0f6778/src/aiida/transports/cli.py#L157-L167
I can see that retrieving the computer information might be nontrivial since we only have the computer name. But we could create the computer instance at this level and by that use this to continue with the configuration of the transport plugin. This would open the option to merge the commands to one. Do you think this is possible @sphuber?
I agree that the naming of the commands is very inconsistent. If we didn't have to deal with backwards compatibility, I would change it all and make it fully consistent. Unfortunately, that is not really possible for now.
If we were to migrate the computer setup code to use the DynamicEntryPointCommandGroup
we could turn it into a single command. It would work exactly like the verdi code create
where you would have verdi computer create
and the transport plugin would be the subcommand. So you would have verdi computer create core.local
and verdi computer create core.ssh
.
The setup of a
Computer
throughverdi computer setup
can be a bit overwhelming as it asks to fill out all the attributes that it can store. The same goes forverdi computer configure
. For most usecases, options like the Kerberos settings are not really needed. Would it be possible to simplify the interface such that the interactive command by default only shows a subset of required fields, such as the username, hostname etc. and omit the more advanced features. The latter should of course still be changeable some other way for users that need it. This might make the setting up of computers and codes less daunting perhaps. This followed from a discussion with @louisponet