datalad / datalad-ria

Adds functionality for RIA stores to DataLad
http://datalad.org
Other
0 stars 1 forks source link

First crude smoke test for SSHRemoteIO #69

Closed mih closed 9 months ago

mih commented 10 months ago

See if the whole shebang of identityfile, custom username, with custom port still yields a working IO abstraction.

This changeset includes a patch to datalad-core:

This patch aims to fix a hanging Python sessions after the execution of an SSH remote command call with no particular stdin input.

Interpretation from #68

Without overriding stdin, the subprocess, i.e. ssh, and python share
the same file pointer. It seems that stdin is configured in a way that
unexpected by the interpreter and messes with python's way to read from
sys.stdin.

This patch passes an explicit b'' as stdin to the SSH client execution process to effectively achieve a separate fiel descriptor for that client process.

This patch should not interfere with the implementation of the sshrun command in datalad-core. It uses a dedicated not-None value for any execution. However, the compatibility and interference of this patch should be subject to a thorough investigation and widespread testing before this changeset is proposed for datalad-core.

Closes #68

mih commented 9 months ago

Tests cannot run ATM, because of some certificate expiration issue on the windows working images of appveyor. Time will fix it, I guess.

adswa commented 9 months ago

I have just a preliminary observation: The test in this PR here doesn't run through on my Windows machine.

The patch is applied ([DEBUG] Apply patch to datalad.support.sshconnector._exec_ssh), but it stops doing anything after the DEBUG message "Estimating digests for ...", and I need to kill the Python process in the task manager.

(py39) C:\Users\adina\repos\datalad-ria>pytest -s -v datalad_ria\tests\test_ssh_remote_io.py
[DEBUG] Enable posting DataLad config overrides CLI/ENV as GIT_CONFIG items in process ENV
[DEBUG] Not retro-fitting GitRepo with deprecated symbols, datalad-deprecated package not found
[DEBUG] Apply datalad-next patch to annexrepo.py:AnnexRepo.enable_remote
[DEBUG] Building doc for <class 'datalad.local.configuration.Configuration'>
[DEBUG] Building doc for <class 'datalad_next.patches.configuration.Configuration'>
[DEBUG] Building doc for <class 'datalad.local.configuration.Configuration'>
[DEBUG] Failed to import requests_ftp, thus no ftp support: ModuleNotFoundError(No module named 'requests_ftp')
[DEBUG] Apply datalad-next patch to create_sibling_ghlike.py:_GitHubLike._set_request_headers
[DEBUG] Apply datalad-next patch to interface.(utils|base).py:_execute_command_
[DEBUG] Building doc for <class 'datalad.core.local.status.Status'>
[DEBUG] Building doc for <class 'datalad.core.local.diff.Diff'>
[DEBUG] Building doc for <class 'datalad.core.distributed.push.Push'>
[DEBUG] Apply patch to datalad.core.distributed.push._transfer_data
[DEBUG] Patching datalad.core.distributed.push.Push docstring and parameters
[DEBUG] Building doc for <class 'datalad.core.distributed.push.Push'>
[DEBUG] Patching datalad.support.AnnexRepo.get_export_records (new method)
[DEBUG] Apply patch to datalad.core.distributed.push._push
[DEBUG] Apply patch to datalad.distribution.siblings._enable_remote
[DEBUG] Building doc for <class 'datalad.distribution.update.Update'>
[DEBUG] Building doc for <class 'datalad.distribution.siblings.Siblings'>
[DEBUG] Replace special remote _main() with datalad-next's progress logging enabled variant
[DEBUG] Building doc for <class 'datalad.local.subdatasets.Subdatasets'>
[DEBUG] Building doc for <class 'datalad.distributed.create_sibling_gitlab.CreateSiblingGitlab'>
[DEBUG] Apply patch to datalad.distributed.create_sibling_gitlab._proc_dataset
[DEBUG] Stop advertising discontinued "hierarchy" layout for `create_siblign_gitlab()`
[DEBUG] Building doc for <class 'datalad.distributed.create_sibling_gitlab.CreateSiblingGitlab'>
[DEBUG] Apply patch to datalad.support.sshconnector._exec_ssh
[DEBUG] Run ['git', 'version'] (protocol_class=StdOutErrCapture) (cwd=None)
[DEBUG] Finished ['git', 'version'] with status 0
[DEBUG] Run ['git', 'annex', 'version', '--raw'] (protocol_class=StdOutErrCapture) (cwd=None)
[DEBUG] Finished ['git', 'annex', 'version', '--raw'] with status 0
[DEBUG] Created temporary directory named C:\Users\adina\AppData\Local\Temp\datalad_temp_iq2vzp3q
[DEBUG] Filename ' ' is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_iq2vzp3q
[DEBUG] Filename '/' is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_iq2vzp3q
[DEBUG] Filename '|' is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_iq2vzp3q
[DEBUG] Filename ';&%b5{}\'"' is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_iq2vzp3q
[DEBUG] Filename ";&%b5{}'<" is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_iq2vzp3q
[DEBUG] Filename ";&%b5{}'>" is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_iq2vzp3q
[DEBUG] Filename ";&%b5{}' " is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_iq2vzp3q
[DEBUG] Filename ";&%b5{}'.datc " is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_iq2vzp3q
[DEBUG] Tested 14 obscure filename candidates. The winner: ";&%b5{}'.datc"
[DEBUG] Processing entrypoints
[DEBUG] Done processing entrypoints
[DEBUG] Building doc for <class 'datalad.core.local.create.Create'>
[DEBUG] Building doc for <class 'datalad.core.local.save.Save'>
[DEBUG] Building doc for <class 'datalad.core.distributed.clone.Clone'>
[DEBUG] Building doc for <class 'datalad.distribution.get.Get'>
[DEBUG] Building doc for <class 'datalad.distribution.install.Install'>
[DEBUG] Building doc for <class 'datalad.local.unlock.Unlock'>
[DEBUG] Building doc for <class 'datalad.core.local.run.Run'>
[DEBUG] Building doc for <class 'datalad.distributed.create_sibling_github.CreateSiblingGithub'>
[DEBUG] Building doc for <class 'datalad.distributed.create_sibling_gogs.CreateSiblingGogs'>
[DEBUG] Building doc for <class 'datalad.distributed.create_sibling_gin.CreateSiblingGin'>
[DEBUG] Building doc for <class 'datalad.distributed.create_sibling_gitea.CreateSiblingGitea'>
[DEBUG] Building doc for <class 'datalad.distributed.create_sibling_ria.CreateSiblingRia'>
[DEBUG] Building doc for <class 'datalad.distribution.create_sibling.CreateSibling'>
[DEBUG] Building doc for <class 'datalad.distributed.drop.Drop'>
[DEBUG] Building doc for <class 'datalad.local.remove.Remove'>
[DEBUG] Building doc for <class 'datalad.local.addurls.Addurls'>
[DEBUG] Building doc for <class 'datalad.local.copy_file.CopyFile'>
[DEBUG] Building doc for <class 'datalad.local.download_url.DownloadURL'>
[DEBUG] Building doc for <class 'datalad.local.foreach_dataset.ForEachDataset'>
[DEBUG] Building doc for <class 'datalad.local.rerun.Rerun'>
[DEBUG] Building doc for <class 'datalad.local.run_procedure.RunProcedure'>
[DEBUG] Building doc for <class 'datalad.local.wtf.WTF'>
[DEBUG] Building doc for <class 'datalad.local.clean.Clean'>
[DEBUG] Building doc for <class 'datalad.local.add_archive_content.AddArchiveContent'>
[DEBUG] Building doc for <class 'datalad.local.add_readme.AddReadme'>
[DEBUG] Building doc for <class 'datalad.local.export_archive.ExportArchive'>
[DEBUG] Building doc for <class 'datalad.distributed.export_archive_ora.ExportArchiveORA'>
[DEBUG] Building doc for <class 'datalad.distributed.export_to_figshare.ExportToFigshare'>
[DEBUG] Building doc for <class 'datalad.local.no_annex.NoAnnex'>
[DEBUG] Building doc for <class 'datalad.local.check_dates.CheckDates'>
[DEBUG] Building doc for <class 'datalad.distribution.uninstall.Uninstall'>
[DEBUG] Building doc for <class 'datalad.distribution.create_test_dataset.CreateTestDataset'>
[DEBUG] Building doc for <class 'datalad.support.sshrun.SSHRun'>
[DEBUG] Building doc for <class 'datalad.interface.shell_completion.ShellCompletion'>
[DEBUG] Processing entrypoints
[DEBUG] Loading entrypoint ria from datalad.extensions
[DEBUG] Building doc for <class 'datalad_ria.create_sibling_ria.CreateSiblingRia'>
[DEBUG] Loaded entrypoint ria from datalad.extensions
[DEBUG] Loading entrypoint next from datalad.extensions
[DEBUG] Building doc for <class 'datalad_next.commands.credentials.Credentials'>
[DEBUG] Building doc for <class 'datalad_next.commands.create_sibling_webdav.CreateSiblingWebDAV'>
[DEBUG] Building doc for <class 'datalad_next.commands.tree.TreeCommand'>
[DEBUG] Building doc for <class 'datalad_next.commands.download.Download'>
[DEBUG] Building doc for <class 'datalad_next.commands.ls_file_collection.LsFileCollection'>
[DEBUG] Loaded entrypoint next from datalad.extensions
[DEBUG] Done processing entrypoints
=================================================================== test session starts ====================================================================
platform win32 -- Python 3.9.0, pytest-7.3.2, pluggy-1.0.0 -- C:\Users\adina\env\py39\Scripts\python.exe
cachedir: .pytest_cache
rootdir: C:\Users\adina\repos\datalad-ria
plugins: cov-4.1.0
collected 1 item

datalad_ria/tests/test_ssh_remote_io.py::test_SSHRemoteIO_basics [DEBUG] Created temporary directory named C:\Users\adina\AppData\Local\Temp\datalad_temp_hz9tvufk
[DEBUG] Run ['git', '--git-dir=b:\\nul', 'config', '-z', '-l', '--show-origin'] (protocol_class=StdOutErrCapture) (cwd=None)
[DEBUG] Finished ['git', '--git-dir=b:\\nul', 'config', '-z', '-l', '--show-origin'] with status 0
[DEBUG] UI set to QuietConsoleLog(out=<TextIOWrapper>)
[DEBUG] Run ['git', 'version'] (protocol_class=StdOutErrCapture) (cwd=None)
[DEBUG] Finished ['git', 'version'] with status 0
[DEBUG] Not retro-fitting GitRepo with deprecated symbols, datalad-deprecated package not found
[DEBUG] Run ['git', 'annex', 'version', '--raw'] (protocol_class=StdOutErrCapture) (cwd=None)
[DEBUG] Finished ['git', 'annex', 'version', '--raw'] with status 0
[DEBUG] Created temporary directory named C:\Users\adina\AppData\Local\Temp\datalad_temp_bnx7xftv
[DEBUG] Filename ' ' is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_bnx7xftv
[DEBUG] Filename '/' is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_bnx7xftv
[DEBUG] Filename '|' is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_bnx7xftv
[DEBUG] Filename ';&%b5{}\'"' is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_bnx7xftv
[DEBUG] Filename ";&%b5{}'<" is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_bnx7xftv
[DEBUG] Filename ";&%b5{}'>" is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_bnx7xftv
[DEBUG] Filename ";&%b5{}' " is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_bnx7xftv
[DEBUG] Filename ";&%b5{}'.datc " is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_bnx7xftv
[DEBUG] Tested 14 obscure filename candidates. The winner: ";&%b5{}'.datc"
[DEBUG] chdir 'C:\\Users\\adina\\repos\\datalad-ria' -> 'C:\\Users\\adina\\AppData\\Local\\Temp\\httpserveu00j3d07'
[DEBUG] HTTP: serving C:\Users\adina\AppData\Local\Temp\httpserveu00j3d07 under http://127.0.0.1:64784/
[DEBUG] Estimating digests for C:\Users\adina\AppData\Local\Temp\datalad_temp_hz9tvufk\.gitconfig (cwd=None) h status 0  with status 0 ow-origin']

In a python session, things look as follows:

>>> ssh = SSHRemoteIO('ssh://sshuser@localhost:2222')

doesn't do anything (seemlingly) BUT I can CTRL-C it in the Python process and reuse the python session afterwards. So its not the same hanging as before.

>>> ssh = SSHRemoteIO('ssh://sshuser@localhost:2222')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\adina\env\py39\lib\site-packages\datalad\distributed\ora_remote.py", line 351, in __init__
    line = self.shell.stdout.readline()
adswa commented 9 months ago
            import pdb; pdb.set_trace()
349             self.shell.stdin.write(b"echo RIA-REMOTE-LOGIN-END\n")
350             self.shell.stdin.flush()
351             while True:
352  ->             line = self.shell.stdout.readline()
353                 if line == b"RIA-REMOTE-LOGIN-END\n":
354                     break
355             # TODO: Same for stderr?

If I get into this while loop in a debugger, stdout only contains empty lines, so it seems trapped in there.

christian-monch commented 9 months ago
            import pdb; pdb.set_trace()
349             self.shell.stdin.write(b"echo RIA-REMOTE-LOGIN-END\n")
350             self.shell.stdin.flush()
351             while True:
352  ->             line = self.shell.stdout.readline()
353                 if line == b"RIA-REMOTE-LOGIN-END\n":
354                     break
355             # TODO: Same for stderr?

If I get into this while loop in a debugger, stdout only contains empty lines, so it seems trapped in there.

Thanks for tracking that down. readline() would return '' to indicate EOF. That happens on my Windows machine, if the ssh-subprocess, in self.shell, fails to connect to the server. Could that be the case here?

[Edit]

We should definitely check for that condition and probably raise an error, that shows the content of self.ssh.sshri.as_str().

adswa commented 9 months ago

Thanks for tracking that down. readline() would return '' to indicate EOF. That happens on my Windows machine, if the ssh-subprocess, in self.shell, fails to connect to the server. Could that be the case here?

I'm getting a few conflicting findings, maybe you can make sense of them:


In [1]: from datalad.distributed.ora_remote import SSHRemoteIO
[DEBUG] Not retro-fitting GitRepo with deprecated symbols, datalad-deprecated package not found

In [2]: SSHRemoteIO('ssh://sshuser@localhost:2222')
initing initialized
obtaining a shell
> c:\users\adina\env\py39\lib\site-packages\datalad\distributed\ora_remote.py(351)__init__()
-> self.shell.stdin.write(b"echo RIA-REMOTE-LOGIN-END\n")
(Pdb) p self.ssh.sshri.as_str()
'sshuser@localhost'
(Pdb) self.ssh
NoMultiplexSSHConnection(sshri=<<SSHRI(hostname++40 chars++er')>>)
(Pdb) self.ssh.is_open()
False

I interpreted the "is_open() -> False" as weird and an indication that there could be a connection issue. However, through methods like self.ssh.get_git_version I can send commands to that ssh server (in that specific case, it reports that git is not installed, but as that method doesn't declare the stdin/stdout pipes, the process hangs afterwards).

We should definitely check for that condition and probably raise an error, that shows the content of self.ssh.sshri.as_str().

the cmd with which subprocess.Popen() gets called looks suspicious to me because it has lost the port:

(Pdb) p cmd
['ssh', 'sshuser@localhost']

Edit: If I hardcode the port (sshuser@localhost:2222), I still end up in a seemingly endless loop.

christian-monch commented 9 months ago

the cmd with which subprocess.Popen() gets called looks suspicious to me because it has lost the port:

(Pdb) p cmd
['ssh', 'sshuser@localhost']

That is definitely wrong. There is some old code, i.e. SSHRI.as_str(), which seems to be faulty. It is definitely missing a clause to check for port and some more things (see below). It has to be patched/corrected. If a port is contained in the destination, the destination should read ssh://sshuser@localhost:2222.

Edit: If I hardcode the port (sshuser@localhost:2222), I still end up in a seemingly endless loop.

Can you try with ssh://sshuser@localhost:2222? It works on my machine.

If that works, we just have to fix the (IMHO needlessly complicated) SSHRI code (or, if that would break datalad, bundle an updated version with a patched SSHRemoteIO.__init__()-function). Hopefully that is the last fix in datalad-core ;-)

christian-monch commented 9 months ago

Edit: If I hardcode the port (sshuser@localhost:2222), I still end up in a seemingly endless loop.

We have to check whether the shell process has returned with a non-zero value in the loop. The self.shell.stdin.write() might fail if the ssh-process exits, but that is not guaranteed, due to race-conditions. The loop should look something like this:

        while True:
            status = self.shell.poll()
            if status not in (0, None):
                raise Exception(f'ssh exited with status {status}')     # TODO use proper exception class
            line = self.shell.stdout.readline()
            if line == '':
                raise Exception(f'ssh closed stdout unexpectedly')
            if line == b"RIA-REMOTE-LOGIN-END\n":
                break

That should leave the loop if any errors occur.

christian-monch commented 9 months ago

I am preparing a commit, that includes the necessary patches. Let's see how that works.

mih commented 9 months ago

re SSHRI and friends: My preference would be to drop all that code entirely (and eventually). SSH is perfectly capable of working with URLs, our URLOperations framework works with URLs, a user provides URLs. Any back and forth conversion to these RI seems needlessly complicated, and serves little purpose.

general note re "connection open" etc.: Please keep in mind that on Windows SSH does not do connection multiplexing. This means that there is never an open connection, and all this code was likely written with the assumption that such oddity cannot exist. I suspect that these flags have little meaning on windows and they are coded up to make-pretend and satisfy other code's assumptions on how the world should behave (but isn't).

mih commented 9 months ago

I have extracted everything from this PR that is not narrowly focused on SSHRemoteIO into separate PRs:

They have been merged into main

codecov-commenter commented 9 months ago

Codecov Report

Patch coverage: 94.11% and project coverage change: -0.60% :warning:

Comparison is base (c93ece8) 97.53% compared to head (a102c53) 96.93%.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #69 +/- ## ========================================== - Coverage 97.53% 96.93% -0.60% ========================================== Files 13 15 +2 Lines 162 196 +34 ========================================== + Hits 158 190 +32 - Misses 4 6 +2 ``` | [Files Changed](https://app.codecov.io/gh/datalad/datalad-ria/pull/69?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=datalad) | Coverage Δ | | |---|---|---| | [datalad\_ria/patches/enabled.py](https://app.codecov.io/gh/datalad/datalad-ria/pull/69?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=datalad#diff-ZGF0YWxhZF9yaWEvcGF0Y2hlcy9lbmFibGVkLnB5) | `100.00% <ø> (ø)` | | | [datalad\_ria/tests/utils.py](https://app.codecov.io/gh/datalad/datalad-ria/pull/69?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=datalad#diff-ZGF0YWxhZF9yaWEvdGVzdHMvdXRpbHMucHk=) | `100.00% <ø> (ø)` | | | [datalad\_ria/patches/sshremoteio.py](https://app.codecov.io/gh/datalad/datalad-ria/pull/69?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=datalad#diff-ZGF0YWxhZF9yaWEvcGF0Y2hlcy9zc2hyZW1vdGVpby5weQ==) | `93.10% <93.10%> (ø)` | | | [datalad\_ria/tests/test\_ssh\_remote\_io.py](https://app.codecov.io/gh/datalad/datalad-ria/pull/69?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=datalad#diff-ZGF0YWxhZF9yaWEvdGVzdHMvdGVzdF9zc2hfcmVtb3RlX2lvLnB5) | `100.00% <100.00%> (ø)` | |

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

mih commented 9 months ago

This is looking good now! I left a comment re documenting the path forward. But otherwise we should get this merged!

It would be good to rebase this branch, and squash the history to the relevant bits. Likely just two commits:

  1. The added test that documents that datalad-core functionality is broken
  2. The patch that makes the test pass

WDYT?