cea-hpc / clustershell

Scalable cluster administration Python framework — Manage node sets, node groups and execute commands on cluster nodes in parallel.
https://clustershell.readthedocs.io/

ssh options settable in environment like pdsh's PDSH_SSH_ARGS_APPEND #408

Open brianjmurrell opened 5 years ago

brianjmurrell commented 5 years ago

With pdsh, I can set PDSH_SSH_ARGS_APPEND to some ssh command-line options ahead of doing a bunch of pdsh commands to avoid having to repeat those options on every pdsh invocation.

I'm not very well versed in the source of clush and clustershell yet but I am not seeing any equivalent for clush.

degremont commented 5 years ago

Correct me if I'm wrong, but my understanding is that this pdsh feature exists to pass specific ssh arguments, which is not possible otherwise, because pdsh does not offer a way to do that on the command line or through config files like clustershell does.

Is the inherent nature of an environment variable what you are looking for?

If you don't want to touch your configuration, could an alias be enough for you, like:

alias clush="clush -o'-i myprivatekey'"

brianjmurrell commented 5 years ago

While I understand the concept, I don't really take advantage of aliases that much so am quite ignorant about them.

Can they be set in one shell script and survive into another shell script? My testing here doesn't seem to bear that out:

a.sh:

#!/bin/bash
alias clush="foobar -o'-i myprivatekey'"
./b.sh

b.sh:

#!/bin/bash
clush -S -w vm1 id

$ ./a.sh
vm1: uid=1000(vagrant) gid=1000(vagrant) groups=1000(vagrant)

If the alias had carried over, this should have produced bash: foobar: command not found instead.
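Aliases are local to the shell that defines them, and bash does not expand them in non-interactive scripts unless expand_aliases is set. Environment variables, by contrast, are inherited by child processes. A minimal Python sketch of that inheritance, assuming a hypothetical variable name DEMO_SSH_ARGS (clush does not read it) and a child process standing in for b.sh:

```python
import os
import subprocess
import sys

# DEMO_SSH_ARGS is a made-up name used only for this illustration.
env = dict(os.environ, DEMO_SSH_ARGS="-i myprivatekey")

# The child plays the role of b.sh: it only reports what it inherited.
child_code = "import os; print(os.environ.get('DEMO_SSH_ARGS'))"
result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=env,
    capture_output=True,
    text=True,
    check=True,
)
inherited = result.stdout.strip()
print(inherited)
```

The child sees the value without the parent passing it on the command line, which is the property an alias does not have.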

volans- commented 5 years ago

It's not fully clear to me what your use case is. In particular, whether those settings are per-host or per-run, and whether they persist for some time or change at runtime.

Some general suggestions:

brianjmurrell commented 5 years ago

Sure, those are options, but none is as simple and transparent as $PDSH_SSH_ARGS_APPEND. To convert from pdsh, I would have to go and fix up every invocation of clush to use additional command-line args, instead of just changing the one PDSH_SSH_ARGS_APPEND=... to CLUSH_ARGS_APPEND=....

I guess I was hoping for a more "drop-in" replacement.

Now that I know of this limitation, I guess this can be closed, unless you want to leave it open to implement such an environment variable, for more transparent drop-in replacing of pdsh.

degremont commented 5 years ago

This patch should do what you want, but it's true that we don't have this kind of feature right now.

diff --git a/lib/ClusterShell/CLI/Clush.py b/lib/ClusterShell/CLI/Clush.py
index 2cb94d7..1922dcf 100755
--- a/lib/ClusterShell/CLI/Clush.py
+++ b/lib/ClusterShell/CLI/Clush.py
@@ -800,7 +800,11 @@ def main():
     parser.install_filecopy_options()
     parser.install_connector_options()

-    (options, args) = parser.parse_args()
+    args = sys.argv[1:]
+    if 'CLUSH_ARGS' in os.environ:
+        args.insert(0, os.environ['CLUSH_ARGS'])
+
+    (options, args) = parser.parse_args(args=args)

     set_std_group_resolver_config(options.groupsconf)
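The same idea as a standalone sketch, outside of Clush.py. It is not the patch itself: it assumes a stand-in parser with only an -o/--options flag, and it splits the variable with shlex.split (my choice here, whereas the patch inserts the whole string as a single argument) so that a value holding several options is tokenized with shell-like quoting rules. The helper name parse_with_env is hypothetical.

```python
import os
import shlex
from optparse import OptionParser


def parse_with_env(argv, environ=os.environ):
    """Parse argv after prepending options taken from CLUSH_ARGS."""
    parser = OptionParser()
    # Stand-in for clush's -o option (extra ssh options).
    parser.add_option("-o", "--options", dest="options", default="")
    args = list(argv)
    if "CLUSH_ARGS" in environ:
        # shlex.split applies shell-like quoting, so a quoted value
        # such as '-i my_id' stays a single argument.
        args[0:0] = shlex.split(environ["CLUSH_ARGS"])
    return parser.parse_args(args=args)


options, rest = parse_with_env(["id"], {"CLUSH_ARGS": "-o '-i my_id'"})
print(options.options, rest)
```

Because the environment options are prepended, an explicit -o on the command line is parsed later and replaces the value from the environment.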

Out of curiosity, what's the motivation for you to move from pdsh to ClusterShell?

brianjmurrell commented 5 years ago

> This patch should do what you want

I don't think it does. What I was looking for was an environment variable that substitutes clush's -o option. So for example, instead of doing:

$ clush -o '-i my_id' ...

I could do

$ export CLUSH_SSH_ARGS="-i my_id"
$ clush ...

and achieve the same result.
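To make the request concrete, a minimal sketch of the behaviour I have in mind, assuming a hypothetical CLUSH_SSH_ARGS variable (clush does not read it today) and a hypothetical helper effective_ssh_options that merges it with whatever was given to -o:

```python
import os
import shlex


def effective_ssh_options(cli_options, environ=os.environ):
    """Combine CLUSH_SSH_ARGS (hypothetical) with an explicit -o value."""
    env_part = shlex.split(environ.get("CLUSH_SSH_ARGS", ""))
    cli_part = shlex.split(cli_options or "")
    # Environment first, command line last, so explicit options
    # appear later in the resulting ssh argument list.
    return env_part + cli_part


from_env = effective_ssh_options(None, {"CLUSH_SSH_ARGS": "-i my_id"})
combined = effective_ssh_options("-p 2222", {"CLUSH_SSH_ARGS": "-i my_id"})
print(from_env, combined)
```

The variable holds the ssh options directly, so there is only one level of quoting to get right.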

While I do agree that your patch could achieve the same thing with:

$ export CLUSH_ARGS="-o \"-i my_id\""
$ clush ...

I'm still in quoting/escaping hell. At a single level, it's most certainly manageable (I do that all day every day), but when you are trying to escape quotes through multiple levels and (of particular pain) languages (i.e. groovy calling a shell script, which calls other shell scripts which can call other shell scripts) it gets quite hairy.

> Out of curiosity, what's the motivation for you to move from pdsh to ClusterShell

A few reasons. We are writing lots of python code that needs pdsh/ClusterShell parallel execution capabilities. In avoiding subprocess calls to pdsh we quickly found ourselves re-inventing ClusterShell.

If we are going to use ClusterShell at the python level, we might as well take advantage of it at the shell level and gain the benefits it brings. The claims[1] of speed are one. The -b/-B argument replacements for dshbak -c are another. That it's written in python is another. Support for multiple ranges (foo-[1-3]vm[1-3]) in host globs is yet another.

So don't get me wrong. I like ClusterShell and there are many good reasons to switch to it. I'm just finding a place here or there (like in this ticket) where pdsh was giving us something that ClusterShell isn't and I'm finding the particular missing thing painful.

[1] I don't doubt the claims, I just have not benchmarked it for myself so don't want to make an assertion here that I have not verified myself.

degremont commented 5 years ago

> While I do agree that your patch could achieve the same thing with:
>
> $ export CLUSH_ARGS="-o \"-i my_id\""
> $ clush ...
>
> I'm still in quoting/escaping hell. At a single level, it's most certainly manageable (I do that all day every day), but when you are trying to escape quotes through multiple levels and (of particular pain) languages (i.e. groovy calling a shell script, which calls other shell scripts which can call other shell scripts) it gets quite hairy.

This is a generic issue. export CLUSH_ARGS="-o '-i my_id'" will likely be easier for you in this specific case. (You can even do export CLUSH_ARGS="-o-imy_id", no space needed, but this is even more specific)

But you can run into the same quoting problem even if this is limited to ssh, for example SSH_APPEND="-o ProxyCommand='whatever command'". If we do that, it is definitely a good idea to support any argument in this environment variable and not only ssh ones.
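A small sketch of how these variants would be tokenized, assuming the variable's value were split with shell-like rules via Python's shlex.split (an assumption for illustration; the patch above inserts the string as a single argument instead):

```python
import shlex

# Double-quoted and single-quoted forms give the same tokens.
double_quoted = shlex.split('-o "-i my_id"')
single_quoted = shlex.split("-o '-i my_id'")

# With no space at all there is nothing to quote: one token.
no_space = shlex.split("-o-imy_id")

# An ssh-only variable still needs quotes once a value contains a space.
proxy = shlex.split("-o ProxyCommand='whatever command'")

print(double_quoted, single_quoted, no_space, proxy)
```

Single quotes inside a double-quoted export avoid the backslashes, but the value still carries one level of quoting either way.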

Nice to see you see lots of value in CS! :)