datalad / datalad-ria

Adds functionality for RIA stores to DataLad
http://datalad.org
Other
0 stars 1 forks source link

Problematic interaction of different SSH logins with user configs #96

Open mih opened 1 year ago

mih commented 1 year ago

Here is the pattern:

Pdb) ssh_remoteio.get_7z()
False
(Pdb) ssh_remoteio._run('7z', check=True)
*** datalad.distributed.ora_remote.RemoteCommandFailedError: 7z failed:
(Pdb) ssh_remoteio._run('which 7z', check=True)
*** datalad.distributed.ora_remote.RemoteCommandFailedError: which 7z failed:

but

(Pdb) ssh_remoteio.ssh('7z')
('\n7-Zip [64] 17.04 : Copyright (c) 1999-2021 Igor Pavlov : 2017-08-28\np7zip Version 17.04 (locale=utf8,Utf16=on,HugeFiles=on,64 bits,2 CPUs x64)\n\nUsage: 7z <command> [<switches>...] <archive_name> ...
(Pdb) ssh_remoteio.ssh('which 7z')
('/usr/local/bin/7z\n', '')

This smells like the execution environment is different between the persistent shell and the direct SSH-execution. And indeed:

(Pdb) ssh_remoteio.ssh('echo $PATH')
('/Users/appveyor/.nvm/versions/node/v18.12.1/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/Users/appveyor/.rvm/bin\n', '')

(Pdb) ssh_remoteio._run('echo $PATH', check=True)
'/usr/bin:/bin:/usr/sbin:/sbin\n'

The persistent shell is more restricted.

adswa commented 1 year ago

here is a comparison when I run the container locally on linux:


(Pdb) ssh_remoteio.ssh('echo $PATH')
('/usr/local/bin:/usr/bin:/bin:/usr/games\n', 'bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)\n')
(Pdb) ssh_remoteio._run('echo $PATH', check=True)
'/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games\n'
``
adswa commented 1 year ago

I checked whether it might be a difference in zsh versus bash (because the mac worker's localhost uses zsh as a default), but the path variables on both are mostly identical

adswa commented 1 year ago
(Pdb) ssh_remoteio.ssh('env')
(
'NVM_CD_FLAGS=\n
SHELL=/bin/bash\n
TMPDIR=/var/folders/5s/g225f6nd6jl4g8tshbh1ltk40000gn/T/\n
SSH_CLIENT=127.0.0.1 49669 22\n
LC_ALL=en_US.UTF-8\n
NVM_DIR=/Users/appveyor/.nvm\n
USER=appveyor\n
PATH=/Users/appveyor/.nvm/versions/node/v18.12.1/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/Users/appveyor/.rvm/bin\n
PWD=/Users/appveyor\n
LANG=en_US.UTF-8\n
SHLVL=1\n
HOME=/Users/appveyor\n
LOGNAME=appveyor\n
SSH_CONNECTION=127.0.0.1 49669 127.0.0.1 22\n
LC_CTYPE=en_US.UTF-8\n
NVM_BIN=/Users/appveyor/.nvm/versions/node/v18.12.1/bin\n
_=/usr/bin/env\n', '')

(Pdb) ssh_remoteio._run('env')
*** datalad.distributed.ora_remote.RIARemoteError: env && printf '\n%s\n' 'ora-remote: end - ok' || printf '\n%s\n' 'ora-remote: end - fail'\n: 
SHELL=/bin/bash\n
TMPDIR=/var/folders/5s/g225f6nd6jl4g8tshbh1ltk40000gn/T/\n
SSH_CLIENT=127.0.0.1 49669 22\n
LC_ALL=en_US.UTF-8\n
USER=appveyor\n
PATH=/usr/bin:/bin:/usr/sbin:/sbin\n
PWD=/Users/appveyor\n
LANG=en_US.UTF-8\n
SHLVL=1\n
HOME=/Users/appveyor\n
LOGNAME=appveyor\n
SSH_CONNECTION=127.0.0.1 49669 127.0.0.1 22\n
LC_CTYPE=en_US.UTF-8\n
_=/usr/bin/env\n

It looks like the path is the only environment variable with a different value. And the NVM_CD_FLAGS, NVM_DIR, NVM_BIN variables are set in the first, but not the second one.

adswa commented 1 year ago

ah, the relevant element of the PATH (usr/local/bin) gets added when the .bashrc file get sourced:

(venv3.8.15) worker-629-001:datalad-ria appveyor$ cat ~/.bashrc
export PATH="$PATH:/usr/local/bin"

# Add RVM to PATH for scripting. Make sure this is the last PATH variable change.
export PATH="$PATH:$HOME/.rvm/bin"

export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"  # This loads nvm
[ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion"  # This loads nvm bash_completion

I'm not sure what would be the right fix. I think the persistent shell opens a login shell, which wouldn't source .bashrc according to the ssh man page. The other connection executes a command, which does not yield a login shell and sources .bashrc, thus getting the necessary PATH amendment.

adswa commented 1 year ago

If my assessment is correct, this problem isn't specific to mac or appveyor. ... and probably not even SSHRemoteIO._run() - just a consequence of the different ways that SSH connection is build. It would show up on any system with a configuration in bashrc that affects relevant behavior.

adswa commented 1 year ago

@christian-monch, I recalled that you were working on an SSHShell implementation (https://github.com/datalad/datalad-ria/blob/e3c2acdadbd42824cb97400883100752c4941ea9/datalad_ria/sshshell.py). I was wondering if this could help resolve this issue here. If we can use it for both persistent shell usecases and single-command invocations, we at least would not have potentially different behavior anymore