aiidateam / aiida-core

The official repository for the AiiDA code
https://aiida-core.readthedocs.io
Other
431 stars 187 forks source link

Improve parsing of SSH config file for `verdi computer configure ssh` #4100

Open ConradJohnston opened 4 years ago

ConradJohnston commented 4 years ago

Related in spirit to issue #3311, and to the "UI Improvements" project.

Currently, when using verdi computer configure ssh to configure an SSH connection, the SSH config file, if present, is parsed by Paramiko and verdi offers some suggestions for populating the configuration options.

A typical stanza as currently supported might look like this:

Host best-hpc.phy.uni.ch
    User Conrad
    IdentityFile ~/.ssh/best-hpc.key

One limitation of the current implementation is that one must specify the fully qualified hostname as the Host and anything specified using the Hostname keyword is only used to substitute %h in SSH proxy commands, as I understand it. This is frustrating, as one typically uses the SSH Config to do something like:

Host BestHPC
    Hostname best-hpc.phy.uni.ch
    User Conrad
    IdentityFile ~/.ssh/best-hpc.key

where Host is used to define convenience aliases.

If this more canonical approach was supported it would reduce frustration, but also allow semi-convenient work arounds for locked down supercomputers (see #3929) like:

Host BestHPCwith2FA
    Hostname localhost
    User Conrad
    Port 22000

where one opens an SSH connection separately, supplies some 2FA, and redirects this connection to some local port, say 22000, and then points AiiDA to this localhost port, using a nice alias.

chrisjsewell commented 4 years ago

Sounds reasonable I think. On a semi-related note, I was recently messing around with a bit recently with automating this configuration when building VMs in https://github.com/chrisjsewell/ansible-role-aiida-computer

giovannipizzi commented 4 years ago

Indeed, I also had the same experience as you had @ConradJohnston Any suggestion on how to allow the parser to do the right job? Should we check the hostname? (Doesn't seem enough in the last example) - and if we do it, what should we do if there are multiple matches?

Or should we ask the user to pick from one of the "valid" entries, or maybe just ask to provide a new name, then print out the raw content of the .ssh/config file, and if they are happy, continue with those defaults? (Note that the Host could even just be Best* so we should also deal with these wildcard cases)

Or any other suggestions?

ltalirz commented 3 years ago

Personally, I've always found the fact that AiiDA tries to parse the .ssh/config file more confusing than helpful - as a user, it was never quite clear to me which information it takes from there and whether the config still has some effect after the computer was configured (and I just lost more than an hour because I did not know/remember that you cannot specify %h/%p in verdi computer ssh config - although it is documented).

Since the documentation anyhow starts by setting up your ~/.ssh/config, what speaks against storing in the AiiDA computer only the Host to use and let paramiko or some other SSH library take care of parsing the config?

Major benefits I would see:

Potential downsides

P.S. After looking a bit into it, I realized that while paramiko provides parsers for the SSH config, it does not seem to help you with e.g. automatically parsing the information for the proxy server as well, i.e. it looks to me like some legwork would remain on the AiiDA side when using paramiko (?).

However, there are tools like fabric (built on top of paramiko) that do just that.

Here's an excerpt from my ssh config:

Host ela
    User tal
    HostName ela.cscs.ch
    IdentityFile ~/.ssh/cscs
Host daint
    User tal
    Hostname daint.cscs.ch
    IdentityFile ~/.ssh/cscs
    ForwardAgent yes
    Port 22
    ProxyJump ela
    ForwardX11 yes
    ForwardX11Trusted yes

And here's what I can do after pip install --upgrade fabric paramiko (note: I needed paramiko 2.7.2 for this to work):

ipython
In [1]: from fabric import Connection

In [2]: c = Connection('ela')

In [3]: c.run('hostname')
ela5.cscs.ch
Out[3]: <Result cmd='hostname' exited=0>

In [4]: c = Connection('daint')

In [5]: c.run('hostname')
daint102
Out[5]: <Result cmd='hostname' exited=0>

In [6]: c.ssh_config
Out[6]:
{'user': 'tal',
 'hostname': 'daint.cscs.ch',
 'identityfile': ['/Users/leopold/.ssh/cscs'],
 'forwardagent': 'yes',
 'port': '22',
 'proxyjump': 'ela',
 'forwardx11': 'yes',
 'forwardx11trusted': 'yes',
 'sendenv': 'LANG LC_*'}
giovannipizzi commented 3 years ago

Just one comment to clarify some points where I think there might be some misunderstanding:

For these reasons I believe we cannot yet "trust" automatic parsing of the ssh_config. If we get to a point where we do trust it, then I would still vote to keep this automatic parsing and copy the settings as we do now in the DB (AuthInfo table), so we know which settings we are using for each computer. Example case: you might want to connect as two different users to the same computer.

One important thing to remember: the Host line in the ssh_config can also contain wildcards (e.g. Host daint*), so one has to be careful with parsing (and in particular the string to search for might not be the actual hostname, but just an alias - that is also the problem discussed in this issue) - I tested and fabric seems to be doing the right thing. I don't know if it's worth adding yet one more dependency (to fabric), possibly the basic parsing works also from paramiko? Even if I think there is some fabric-specific code, the config is loaded in Config().base_ssh_config._config.

In any case, again - if we have a way to correctly parse it, maybe we should just implement it rather than dropping copying the information in the AiiDA DB.

Maybe the simplest is, in verdi computer configure ssh (when run interactively), to search for the hostname, show all the settings that would be used, and ask the user if those are ok or if he/she wants to provide another host. In your case, the default wouldn't show the correct results, so you would type daint, this will now show that it correctly parsed the various keys (forwardagent, user, hostname, ...), you would confirm, and then it would just continue as now (using those as the defaults that you can still change interactively. Would this work?

ltalirz commented 3 years ago

Example case: you might want to connect as two different users to the same computer.

I haven't come across this use case before, have you? Of course, this only applies to the case where you also want the same prepend text for both users; as soon as there are differences one anyhow needs to set up a separate computer.

My point is just that by storing all this information (which is not considered part of the provenance) in the database, we are forcing ourselves to re-implement all this logic that is already present in the ssh program itself; while resulting in a less feature-complete and less robust interface for the user. For example, I just came across another confusing behavior, where the gotocomputer command does not make use of the Proxycommand set for the computer: it uses only the configured hostname and private key (i.e. to make it work one then has to modify the ~/.ssh/config). This is really non-obvious.

If one could simply tell AiiDA the ssh alias to connect to and let the ssh config do the rest [1], it seems to me that we would make life easier both for users and developers.

[1] Either by using ssh directly like in gotocomputer or by getting the paramiko connection via some dedicated library like fabric that does the parsing for us.

giovannipizzi commented 3 years ago

Assuming we can use all features in the SSH config file (not obvious, probably needs fabric, which means adapting most of the transport) we could go for a transition phase where the first question is "do you want to use your user SSH config" and if yes you just provide a name (note, one downside: you might connect to a different machine than the one declared in the hostname of the computer, in this way).

Or, even better (if as I suspect this requires fabric): one should implement a ssh-fabric plugin and use this approach there. No problem in hosting it directly on aiida-core, I think, and if we realise it's much more robust we can suggest it as a default. Feel free to give a shot to it if you want to test if the idea would work!

giovannipizzi commented 3 years ago

I also mention here the option to look into AsyncSSH as mentioned by @dev-zero in this comment. If one wants to look into this, I suggest implementing a new asyncssh transport plugin first.

mbercx commented 3 years ago

In general, I fully support @ltalirz's suggestion to move towards simply storing the Host in the database, and having ssh libraries deal with the configuration. Would make life a lot easier for both beginning users and ourselves. This would indeed involve some substantial changes, and it may be better to write all of this down in AEP.

To deal with the problem raised in this issue, I was playing around with adapting the transports.plugins.ssh.parse_sshconfig() method:

def parse_sshconfig(hostname):
    """
    Parse the ``.ssh/config`` file in the home directory and return each configuration that has the given hostname.

    :param hostname: the hostname for which we want the configuration.
    :return: dict of configurations mapping each matching ``Host`` with the specified ``hostname`` to its configuration.
    """
    import paramiko
    config = paramiko.SSHConfig()
    try:
        with open(os.path.expanduser('~/.ssh/config'), encoding='utf8') as fhandle:
            config.parse(fhandle)
    except IOError:
        # No file found, so empty configuration
        pass

    matching_hosts = {}

    for host in config.get_hostnames():
        host_config = config.lookup(host)
        if host_config['hostname'] == hostname:
            matching_hosts[host] = host_config

    return matching_hosts

In an effort to deal with the possibility of multiple Hosts with the same HostName, I look up all Hosts with the corresponding HostName and return a dictionary instead of a single configuration. This will still require further changes, though. The _get_{}_suggestion_string()methods will probably need to be adapted to return this dict instead of a str, but I'm not sure where sn the transports.cli module we would add the possibility for the user to select which Host to use. The verdi computer show command also uses the transport_option_default() method, so we'll have to be careful not to mess up this command as well.

This problem would be avoided if we could set up the parsing of the .ssh/config based on the Host instead of the HostName. I think paramiko deals with wildcards as well, at least for this example config:

Host *
  ForwardX11 yes
Host daint
  HostName daint.cscs.ch
  User mbercx
  IdentityFile ~/.ssh/id_rsa-daint
  ProxyCommand ssh -W %h:%p -i ~/.ssh/id_rsa-daint mbercx@ela.cscs.ch
In [1]: import os, paramiko

In [2]: config = paramiko.SSHConfig()
   ...: with open(os.path.expanduser('~/.ssh/config'), encoding='utf8') as fhandle:
   ...:     config.parse(fhandle)
   ...: 

In [3]: config.lookup('daint')
Out[3]: 
{'forwardx11': 'yes',
 'hostname': 'daint.cscs.ch',
 'user': 'mbercx',
 'identityfile': ['/home/mbercx/.ssh/id_rsa-daint'],
 'proxycommand': 'ssh -W daint.cscs.ch:22 -i /home/mbercx/.ssh/id_rsa-daint mbercx@ela.cscs.ch'}

Or is there some other issue I'm missing?

To deal with this issue in a minimal way, perhaps we can add an option to verdi computer configure ssh to load the ssh config from the .ssh/config file based on a specified Host. Then the user could just do e.g.:

verdi computer configure ssh --load-from-system-host daint

However, I still think that @ltalirz's suggestion is the best solution long term, if we can hash out @giovannipizzi's concerns. I.e. instead of using a hostname during computer setup, we just set a host, and if the transport is ssh and this host is present in the .ssh/config file, the user doesn't need to do the configure step. If it isn't present, either the command fails or the user is warned that the Host is missing from the .ssh/config file.

dev-zero commented 3 years ago

ProxyJump support is here: #4951

ConradJohnston commented 3 years ago

ProxyJump support is here: #4951

Heroic stuff, Tiziano! 🥇