Closed mih closed 1 year ago
Tests cannot run ATM, because of some certificate expiration issue on the windows working images of appveyor. Time will fix it, I guess.
I have just a preliminary observation: The test in this PR here doesn't run through on my Windows machine.
The patch is applied ([DEBUG] Apply patch to datalad.support.sshconnector._exec_ssh
), but it stops doing anything after the DEBUG message "Estimating digests for ...", and I need to kill the Python process in the task manager.
(py39) C:\Users\adina\repos\datalad-ria>pytest -s -v datalad_ria\tests\test_ssh_remote_io.py
[DEBUG] Enable posting DataLad config overrides CLI/ENV as GIT_CONFIG items in process ENV
[DEBUG] Not retro-fitting GitRepo with deprecated symbols, datalad-deprecated package not found
[DEBUG] Apply datalad-next patch to annexrepo.py:AnnexRepo.enable_remote
[DEBUG] Building doc for <class 'datalad.local.configuration.Configuration'>
[DEBUG] Building doc for <class 'datalad_next.patches.configuration.Configuration'>
[DEBUG] Building doc for <class 'datalad.local.configuration.Configuration'>
[DEBUG] Failed to import requests_ftp, thus no ftp support: ModuleNotFoundError(No module named 'requests_ftp')
[DEBUG] Apply datalad-next patch to create_sibling_ghlike.py:_GitHubLike._set_request_headers
[DEBUG] Apply datalad-next patch to interface.(utils|base).py:_execute_command_
[DEBUG] Building doc for <class 'datalad.core.local.status.Status'>
[DEBUG] Building doc for <class 'datalad.core.local.diff.Diff'>
[DEBUG] Building doc for <class 'datalad.core.distributed.push.Push'>
[DEBUG] Apply patch to datalad.core.distributed.push._transfer_data
[DEBUG] Patching datalad.core.distributed.push.Push docstring and parameters
[DEBUG] Building doc for <class 'datalad.core.distributed.push.Push'>
[DEBUG] Patching datalad.support.AnnexRepo.get_export_records (new method)
[DEBUG] Apply patch to datalad.core.distributed.push._push
[DEBUG] Apply patch to datalad.distribution.siblings._enable_remote
[DEBUG] Building doc for <class 'datalad.distribution.update.Update'>
[DEBUG] Building doc for <class 'datalad.distribution.siblings.Siblings'>
[DEBUG] Replace special remote _main() with datalad-next's progress logging enabled variant
[DEBUG] Building doc for <class 'datalad.local.subdatasets.Subdatasets'>
[DEBUG] Building doc for <class 'datalad.distributed.create_sibling_gitlab.CreateSiblingGitlab'>
[DEBUG] Apply patch to datalad.distributed.create_sibling_gitlab._proc_dataset
[DEBUG] Stop advertising discontinued "hierarchy" layout for `create_siblign_gitlab()`
[DEBUG] Building doc for <class 'datalad.distributed.create_sibling_gitlab.CreateSiblingGitlab'>
[DEBUG] Apply patch to datalad.support.sshconnector._exec_ssh
[DEBUG] Run ['git', 'version'] (protocol_class=StdOutErrCapture) (cwd=None)
[DEBUG] Finished ['git', 'version'] with status 0
[DEBUG] Run ['git', 'annex', 'version', '--raw'] (protocol_class=StdOutErrCapture) (cwd=None)
[DEBUG] Finished ['git', 'annex', 'version', '--raw'] with status 0
[DEBUG] Created temporary directory named C:\Users\adina\AppData\Local\Temp\datalad_temp_iq2vzp3q
[DEBUG] Filename ' ' is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_iq2vzp3q
[DEBUG] Filename '/' is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_iq2vzp3q
[DEBUG] Filename '|' is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_iq2vzp3q
[DEBUG] Filename ';&%b5{}\'"' is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_iq2vzp3q
[DEBUG] Filename ";&%b5{}'<" is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_iq2vzp3q
[DEBUG] Filename ";&%b5{}'>" is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_iq2vzp3q
[DEBUG] Filename ";&%b5{}' " is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_iq2vzp3q
[DEBUG] Filename ";&%b5{}'.datc " is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_iq2vzp3q
[DEBUG] Tested 14 obscure filename candidates. The winner: ";&%b5{}'.datc"
[DEBUG] Processing entrypoints
[DEBUG] Done processing entrypoints
[DEBUG] Building doc for <class 'datalad.core.local.create.Create'>
[DEBUG] Building doc for <class 'datalad.core.local.save.Save'>
[DEBUG] Building doc for <class 'datalad.core.distributed.clone.Clone'>
[DEBUG] Building doc for <class 'datalad.distribution.get.Get'>
[DEBUG] Building doc for <class 'datalad.distribution.install.Install'>
[DEBUG] Building doc for <class 'datalad.local.unlock.Unlock'>
[DEBUG] Building doc for <class 'datalad.core.local.run.Run'>
[DEBUG] Building doc for <class 'datalad.distributed.create_sibling_github.CreateSiblingGithub'>
[DEBUG] Building doc for <class 'datalad.distributed.create_sibling_gogs.CreateSiblingGogs'>
[DEBUG] Building doc for <class 'datalad.distributed.create_sibling_gin.CreateSiblingGin'>
[DEBUG] Building doc for <class 'datalad.distributed.create_sibling_gitea.CreateSiblingGitea'>
[DEBUG] Building doc for <class 'datalad.distributed.create_sibling_ria.CreateSiblingRia'>
[DEBUG] Building doc for <class 'datalad.distribution.create_sibling.CreateSibling'>
[DEBUG] Building doc for <class 'datalad.distributed.drop.Drop'>
[DEBUG] Building doc for <class 'datalad.local.remove.Remove'>
[DEBUG] Building doc for <class 'datalad.local.addurls.Addurls'>
[DEBUG] Building doc for <class 'datalad.local.copy_file.CopyFile'>
[DEBUG] Building doc for <class 'datalad.local.download_url.DownloadURL'>
[DEBUG] Building doc for <class 'datalad.local.foreach_dataset.ForEachDataset'>
[DEBUG] Building doc for <class 'datalad.local.rerun.Rerun'>
[DEBUG] Building doc for <class 'datalad.local.run_procedure.RunProcedure'>
[DEBUG] Building doc for <class 'datalad.local.wtf.WTF'>
[DEBUG] Building doc for <class 'datalad.local.clean.Clean'>
[DEBUG] Building doc for <class 'datalad.local.add_archive_content.AddArchiveContent'>
[DEBUG] Building doc for <class 'datalad.local.add_readme.AddReadme'>
[DEBUG] Building doc for <class 'datalad.local.export_archive.ExportArchive'>
[DEBUG] Building doc for <class 'datalad.distributed.export_archive_ora.ExportArchiveORA'>
[DEBUG] Building doc for <class 'datalad.distributed.export_to_figshare.ExportToFigshare'>
[DEBUG] Building doc for <class 'datalad.local.no_annex.NoAnnex'>
[DEBUG] Building doc for <class 'datalad.local.check_dates.CheckDates'>
[DEBUG] Building doc for <class 'datalad.distribution.uninstall.Uninstall'>
[DEBUG] Building doc for <class 'datalad.distribution.create_test_dataset.CreateTestDataset'>
[DEBUG] Building doc for <class 'datalad.support.sshrun.SSHRun'>
[DEBUG] Building doc for <class 'datalad.interface.shell_completion.ShellCompletion'>
[DEBUG] Processing entrypoints
[DEBUG] Loading entrypoint ria from datalad.extensions
[DEBUG] Building doc for <class 'datalad_ria.create_sibling_ria.CreateSiblingRia'>
[DEBUG] Loaded entrypoint ria from datalad.extensions
[DEBUG] Loading entrypoint next from datalad.extensions
[DEBUG] Building doc for <class 'datalad_next.commands.credentials.Credentials'>
[DEBUG] Building doc for <class 'datalad_next.commands.create_sibling_webdav.CreateSiblingWebDAV'>
[DEBUG] Building doc for <class 'datalad_next.commands.tree.TreeCommand'>
[DEBUG] Building doc for <class 'datalad_next.commands.download.Download'>
[DEBUG] Building doc for <class 'datalad_next.commands.ls_file_collection.LsFileCollection'>
[DEBUG] Loaded entrypoint next from datalad.extensions
[DEBUG] Done processing entrypoints
=================================================================== test session starts ====================================================================
platform win32 -- Python 3.9.0, pytest-7.3.2, pluggy-1.0.0 -- C:\Users\adina\env\py39\Scripts\python.exe
cachedir: .pytest_cache
rootdir: C:\Users\adina\repos\datalad-ria
plugins: cov-4.1.0
collected 1 item
datalad_ria/tests/test_ssh_remote_io.py::test_SSHRemoteIO_basics [DEBUG] Created temporary directory named C:\Users\adina\AppData\Local\Temp\datalad_temp_hz9tvufk
[DEBUG] Run ['git', '--git-dir=b:\\nul', 'config', '-z', '-l', '--show-origin'] (protocol_class=StdOutErrCapture) (cwd=None)
[DEBUG] Finished ['git', '--git-dir=b:\\nul', 'config', '-z', '-l', '--show-origin'] with status 0
[DEBUG] UI set to QuietConsoleLog(out=<TextIOWrapper>)
[DEBUG] Run ['git', 'version'] (protocol_class=StdOutErrCapture) (cwd=None)
[DEBUG] Finished ['git', 'version'] with status 0
[DEBUG] Not retro-fitting GitRepo with deprecated symbols, datalad-deprecated package not found
[DEBUG] Run ['git', 'annex', 'version', '--raw'] (protocol_class=StdOutErrCapture) (cwd=None)
[DEBUG] Finished ['git', 'annex', 'version', '--raw'] with status 0
[DEBUG] Created temporary directory named C:\Users\adina\AppData\Local\Temp\datalad_temp_bnx7xftv
[DEBUG] Filename ' ' is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_bnx7xftv
[DEBUG] Filename '/' is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_bnx7xftv
[DEBUG] Filename '|' is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_bnx7xftv
[DEBUG] Filename ';&%b5{}\'"' is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_bnx7xftv
[DEBUG] Filename ";&%b5{}'<" is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_bnx7xftv
[DEBUG] Filename ";&%b5{}'>" is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_bnx7xftv
[DEBUG] Filename ";&%b5{}' " is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_bnx7xftv
[DEBUG] Filename ";&%b5{}'.datc " is not supported on Windows under C:\Users\adina\AppData\Local\Temp\datalad_temp_bnx7xftv
[DEBUG] Tested 14 obscure filename candidates. The winner: ";&%b5{}'.datc"
[DEBUG] chdir 'C:\\Users\\adina\\repos\\datalad-ria' -> 'C:\\Users\\adina\\AppData\\Local\\Temp\\httpserveu00j3d07'
[DEBUG] HTTP: serving C:\Users\adina\AppData\Local\Temp\httpserveu00j3d07 under http://127.0.0.1:64784/
[DEBUG] Estimating digests for C:\Users\adina\AppData\Local\Temp\datalad_temp_hz9tvufk\.gitconfig (cwd=None) h status 0 with status 0 ow-origin']
In a python session, things look as follows:
>>> ssh = SSHRemoteIO('ssh://sshuser@localhost:2222')
doesn't do anything (seemlingly) BUT I can CTRL-C it in the Python process and reuse the python session afterwards. So its not the same hanging as before.
>>> ssh = SSHRemoteIO('ssh://sshuser@localhost:2222')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Users\adina\env\py39\lib\site-packages\datalad\distributed\ora_remote.py", line 351, in __init__
line = self.shell.stdout.readline()
import pdb; pdb.set_trace()
349 self.shell.stdin.write(b"echo RIA-REMOTE-LOGIN-END\n")
350 self.shell.stdin.flush()
351 while True:
352 -> line = self.shell.stdout.readline()
353 if line == b"RIA-REMOTE-LOGIN-END\n":
354 break
355 # TODO: Same for stderr?
If I get into this while loop in a debugger, stdout only contains empty lines, so it seems trapped in there.
import pdb; pdb.set_trace() 349 self.shell.stdin.write(b"echo RIA-REMOTE-LOGIN-END\n") 350 self.shell.stdin.flush() 351 while True: 352 -> line = self.shell.stdout.readline() 353 if line == b"RIA-REMOTE-LOGIN-END\n": 354 break 355 # TODO: Same for stderr?
If I get into this while loop in a debugger, stdout only contains empty lines, so it seems trapped in there.
Thanks for tracking that down. readline()
would return ''
to indicate EOF. That happens on my Windows machine, if the ssh-subprocess, in self.shell
, fails to connect to the server. Could that be the case here?
[Edit]
We should definitely check for that condition and probably raise an error, that shows the content of self.ssh.sshri.as_str()
.
Thanks for tracking that down. readline() would return '' to indicate EOF. That happens on my Windows machine, if the ssh-subprocess, in self.shell, fails to connect to the server. Could that be the case here?
I'm getting a few conflicting findings, maybe you can make sense of them:
In [1]: from datalad.distributed.ora_remote import SSHRemoteIO
[DEBUG] Not retro-fitting GitRepo with deprecated symbols, datalad-deprecated package not found
In [2]: SSHRemoteIO('ssh://sshuser@localhost:2222')
initing initialized
obtaining a shell
> c:\users\adina\env\py39\lib\site-packages\datalad\distributed\ora_remote.py(351)__init__()
-> self.shell.stdin.write(b"echo RIA-REMOTE-LOGIN-END\n")
(Pdb) p self.ssh.sshri.as_str()
'sshuser@localhost'
(Pdb) self.ssh
NoMultiplexSSHConnection(sshri=<<SSHRI(hostname++40 chars++er')>>)
(Pdb) self.ssh.is_open()
False
I interpreted the "is_open() -> False" as weird and an indication that there could be a connection issue.
However, through methods like self.ssh.get_git_version
I can send commands to that ssh server (in that specific case, it reports that git is not installed, but as that method doesn't declare the stdin/stdout pipes, the process hangs afterwards).
We should definitely check for that condition and probably raise an error, that shows the content of self.ssh.sshri.as_str().
the cmd
with which subprocess.Popen()
gets called looks suspicious to me because it has lost the port:
(Pdb) p cmd
['ssh', 'sshuser@localhost']
Edit: If I hardcode the port (sshuser@localhost:2222), I still end up in a seemingly endless loop.
the
cmd
with whichsubprocess.Popen()
gets called looks suspicious to me because it has lost the port:(Pdb) p cmd ['ssh', 'sshuser@localhost']
That is definitely wrong. There is some old code, i.e. SSHRI.as_str()
, which seems to be faulty. It is definitely missing a clause to check for port
and some more things (see below). It has to be patched/corrected. If a port is contained in the destination, the destination should read ssh://sshuser@localhost:2222
.
Edit: If I hardcode the port (sshuser@localhost:2222), I still end up in a seemingly endless loop.
Can you try with ssh://sshuser@localhost:2222
? It works on my machine.
If that works, we just have to fix the (IMHO needlessly complicated) SSHRI code (or, if that would break datalad, bundle an updated version with a patched SSHRemoteIO.__init__()
-function). Hopefully that is the last fix in datalad-core ;-)
Edit: If I hardcode the port (sshuser@localhost:2222), I still end up in a seemingly endless loop.
We have to check whether the shell process has returned with a non-zero value in the loop. The self.shell.stdin.write()
might fail if the ssh-process exits, but that is not guaranteed, due to race-conditions. The loop should look something like this:
while True:
status = self.shell.poll()
if status not in (0, None):
raise Exception(f'ssh exited with status {status}') # TODO use proper exception class
line = self.shell.stdout.readline()
if line == '':
raise Exception(f'ssh closed stdout unexpectedly')
if line == b"RIA-REMOTE-LOGIN-END\n":
break
That should leave the loop if any errors occur.
I am preparing a commit, that includes the necessary patches. Let's see how that works.
re SSHRI
and friends: My preference would be to drop all that code entirely (and eventually). SSH is perfectly capable of working with URLs, our URLOperations
framework works with URLs, a user provides URLs. Any back and forth conversion to these RI
seems needlessly complicated, and serves little purpose.
general note re "connection open" etc.: Please keep in mind that on Windows SSH does not do connection multiplexing. This means that there is never an open connection, and all this code was likely written with the assumption that such oddity cannot exist. I suspect that these flags have little meaning on windows and they are coded up to make-pretend and satisfy other code's assumptions on how the world should behave (but isn't).
I have extracted everything from this PR that is not narrowly focused on SSHRemoteIO
into separate PRs:
They have been merged into main
Patch coverage: 94.11%
and project coverage change: -0.60%
:warning:
Comparison is base (
c93ece8
) 97.53% compared to head (a102c53
) 96.93%.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
This is looking good now! I left a comment re documenting the path forward. But otherwise we should get this merged!
It would be good to rebase this branch, and squash the history to the relevant bits. Likely just two commits:
WDYT?
See if the whole shebang of identityfile, custom username, with custom port still yields a working IO abstraction.
This changeset includes a patch to datalad-core:
This patch aims to fix a hanging Python sessions after the execution of an SSH remote command call with no particular stdin input.
Interpretation from #68
This patch passes an explicit
b''
asstdin
to the SSH client execution process to effectively achieve a separate fiel descriptor for that client process.This patch should not interfere with the implementation of the
sshrun
command in datalad-core. It uses a dedicated not-None value for any execution. However, the compatibility and interference of this patch should be subject to a thorough investigation and widespread testing before this changeset is proposed for datalad-core.Closes #68