Open jwodder opened 3 years ago
Everything has been merged and released on the datalad end since then. Could you please re-trigger the CI run here, @jwodder?
@yarikoptic CI run triggered.
Mac still ain't happy:
(default) Waiting for an IP...
Error creating machine: Error in driver during machine creation: Too many retries waiting for SSH to be available. Last error: Maximum number of retries (60) exceeded
Error: Process completed with exit code 1.
ha -- some pass and some fail with e.g.
2021-03-16T21:55:20.5835210Z datalad.support.exceptions.CommandError: CommandError: 'ssh -o ControlPath=/Users/runner/Library/Caches/datalad/sockets/0ead11a7 datalad-test 'export "PATH=/usr/lib/git-annex.linux:$PATH"; mkdir -p /private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/datalad_temp_check_target_ssh_recursivefimncw8k-False'' failed with exitcode 1 [err: 'mkdir: cannot create directory ‘/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T/datalad_temp_check_target_ssh_recursivefimncw8k-False’: Permission denied']
is it the same "too long of a path" issue?
@yarikoptic What "too long of a path" issue? The only such issue I recall on macOS affected Conda's decisions about filling in shebangs.
argh, failed to find the related discussion ATM. But meanwhile you could try setting TMPDIR=~/DLTMP and see if it goes away. That is also what is done in https://github.com/datalad/datalad/blob/master/.appveyor.yml#L228 and, I believe, for that reason.
Underlying issue: https://unix.stackexchange.com/questions/367008/why-is-socket-path-length-limited-to-a-hundred-chars#:~:text=Mac%20OS%20X%2010.9%3A%20104%20characters (the maximal socket path length is 104 characters). In DataLad we use HOME in TMPDIR while testing.
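To make the length limit concrete, here is a minimal Python sketch that checks whether an ssh ControlPath built under a given TMPDIR would fit. The 104-byte figure is taken from the linked answer, the `Library/Caches/datalad/sockets/<id>` layout mirrors the paths in the logs in this thread, and the function names are hypothetical, not part of DataLad. Reserving one byte for a trailing NUL is a conservative assumption.

```python
import os

# Assumption: 104 bytes, the macOS figure quoted in the linked answer.
MACOS_SOCKET_PATH_LIMIT = 104


def socket_path_fits(path, limit=MACOS_SOCKET_PATH_LIMIT):
    """Return True if the encoded path plus a trailing NUL byte fits in `limit` bytes."""
    return len(os.fsencode(path)) + 1 <= limit


def check_control_path(tmpdir, socket_name="0ead11a7"):
    """Build a hypothetical ControlPath under `tmpdir` and return (path, fits)."""
    path = os.path.join(
        tmpdir, "Library", "Caches", "datalad", "sockets", socket_name
    )
    return path, socket_path_fits(path)
```

Calling `check_control_path(os.environ["TMPDIR"])` in a CI step would show how much headroom a candidate TMPDIR leaves before the limit is hit.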
From the error messages it seems like /Users/runner/DLTMP is not bind-mounted inside the docker container, thus leading to various issues? If there is /tmp on those Macs, it might be worth trying to export TMPDIR as e.g. /tmp/DLTMP, since /tmp should exist in the container and is more likely to work?
@yarikoptic #58 uses a third-party action for setting up Docker on macOS as an alternative to the Docker Machine approach on this branch. I'm not entirely certain how reliable the action in question is, and so I want to leave both PRs open for now.
The blocker was resolved and master of datalad should be green again; time to resolve this issue one way or another to gain better testing on OSX.
@yarikoptic This PR seems to work now, aside from some datalad test failures.
well, it doesn't work in the sense that ssh-related tests fail on macOS:
(git)smaug:/mnt/datasets/datalad/ci/git-annex/builds/2022/04[master]pr-55
$> git grep datalad.support.tests.test_annexrepo.test_annex_ssh
build-macos.yaml-645-32886238-failed/1_test-datalad (master).txt:2022-04-05T17:36:03.2718890Z datalad.support.tests.test_annexrepo.test_annex_ssh ... ERROR
build-macos.yaml-645-32886238-failed/1_test-datalad (master).txt:2022-04-05T17:54:00.1381870Z ERROR: datalad.support.tests.test_annexrepo.test_annex_ssh
build-macos.yaml-645-32886238-failed/2_test-datalad (maint).txt:2022-04-05T17:43:28.9826100Z datalad.support.tests.test_annexrepo.test_annex_ssh ... ERROR
build-macos.yaml-645-32886238-failed/3_test-datalad (release).txt:2022-04-05T17:47:29.4295650Z datalad.support.tests.test_annexrepo.test_annex_ssh ... ERROR
build-macos.yaml-645-32886238-failed/test-datalad (maint)/12_Run datalad tests.txt:2022-04-05T17:43:28.9826060Z datalad.support.tests.test_annexrepo.test_annex_ssh ... ERROR
build-macos.yaml-645-32886238-failed/test-datalad (master)/12_Run datalad tests.txt:2022-04-05T17:36:03.2718850Z datalad.support.tests.test_annexrepo.test_annex_ssh ... ERROR
build-macos.yaml-645-32886238-failed/test-datalad (master)/12_Run datalad tests.txt:2022-04-05T17:54:00.1381870Z ERROR: datalad.support.tests.test_annexrepo.test_annex_ssh
build-macos.yaml-645-32886238-failed/test-datalad (release)/12_Run datalad tests.txt:2022-04-05T17:47:29.4295600Z datalad.support.tests.test_annexrepo.test_annex_ssh ... ERROR
having said that:
$> git grep 'ssh.*ok\s*$' | grep macos | nl | tail
91 build-macos.yaml-645-32886238-failed/test-datalad (release)/12_Run datalad tests.txt:2022-04-05T18:02:31.9527840Z datalad.support.tests.test_sshconnector.test_ssh_custom_identity_file ... ok
92 build-macos.yaml-645-32886238-failed/test-datalad (release)/12_Run datalad tests.txt:2022-04-05T18:02:32.0505800Z datalad.support.tests.test_sshconnector.test_ssh_git_props ... ok
93 build-macos.yaml-645-32886238-failed/test-datalad (release)/12_Run datalad tests.txt:2022-04-05T18:02:32.1017030Z datalad.support.tests.test_sshconnector.test_ssh_get_connection ... ok
94 build-macos.yaml-645-32886238-failed/test-datalad (release)/12_Run datalad tests.txt:2022-04-05T18:02:32.2621720Z datalad.support.tests.test_sshconnector.test_ssh_manager_close_no_throw ... ok
95 build-macos.yaml-645-32886238-failed/test-datalad (release)/12_Run datalad tests.txt:2022-04-05T18:02:33.3163930Z datalad.support.tests.test_sshrun.test_no_stdin_swallow ... ok
96 build-macos.yaml-645-32886238-failed/test-datalad (release)/12_Run datalad tests.txt:2022-04-05T18:02:33.6052730Z datalad.support.tests.test_sshrun.test_ssh_ipv4_6 ... ok
97 build-macos.yaml-645-32886238-failed/test-datalad (release)/12_Run datalad tests.txt:2022-04-05T18:02:33.6402710Z datalad.support.tests.test_sshrun.test_ssh_ipv4_6_incompatible ... ok
98 build-macos.yaml-645-32886238-failed/test-datalad (release)/12_Run datalad tests.txt:2022-04-05T18:02:33.8369630Z datalad.tests.test_api.test_consistent_order_of_args(<class 'datalad.distribution.create_sibling.CreateSibling'>, {'sshurl'}) ... ok
99 build-macos.yaml-645-32886238-failed/test-datalad (release)/12_Run datalad tests.txt:2022-04-05T18:02:33.8470960Z datalad.tests.test_api.test_consistent_order_of_args(<class 'datalad.support.sshrun.SSHRun'>, {'login', 'cmd'}) ... ok
100 build-macos.yaml-645-32886238-failed/test-datalad (release)/12_Run datalad tests.txt:2022-04-05T18:05:02.3054330Z datalad.tests.test_tests_utils.test_skip_ssh ... ok
and it seems we are running into some issue related to TMPDIR binds, which we had encountered before? e.g.
2022-04-05T18:05:52.7696010Z datalad.runner.exception.CommandError: CommandError: 'ssh -o ControlPath=/private/tmp/DLTMP/datalad_temp_1ei94vxf/Library/Caches/datalad/sockets/a658d1c0 datalad-test 'export "PATH=/usr/lib/git-annex.linux:$PATH"; mkdir -p /private/tmp/DLTMP/datalad_temp_check_exists_interactivenpqq6gao/sibling'' failed with exitcode 1 [err: 'mkdir: cannot create directory ‘/private/tmp’: Permission denied']
@yarikoptic Regarding the TMPDIR issue, the problem seems to be that Datalad is trying to run an SSH command that runs mkdir -p /private/tmp/DLTMP/datalad_temp_check_exists_interactivenpqq6gao/sibling on the remote host, where /private/tmp is a macOS-specific path, but the SSH container is running Ubuntu.
hm, I wondered how it works e.g. in mac tests in appveyor of stock datalad -- oh well, https://github.com/datalad/datalad/blob/master/.appveyor.yml#L274 , that is how
# we place the "unix" one into the user's HOME to avoid git-annex issues on MacOSX
# gh-5291
- sh: mkdir ~/DLTMP
# and use that scratch space to get short paths in test repos
# (avoiding length-limits as much as possible)
- cmd: "set TMP=C:\\DLTMP"
- cmd: "set TEMP=C:\\DLTMP"
- sh: export TMPDIR=~/DLTMP
so maybe do the same here for OSX?
@yarikoptic This PR already sets TMPDIR=/private/tmp/DLTMP. The problem is that DataLad is expecting the TMPDIR in its environment to be a valid TMPDIR in the environment that it's SSHing into.
rright, that is why, as a workaround, the appveyor setup sets it to a path which should be present in both environments, i.e. ~/DLTMP. In the long(er) run I guess it should first sense the path to be used on the remote via remote mktemp execution. Filed a dedicated https://github.com/datalad/datalad/issues/6622 for that. But since it is unlikely to get into the imminent 0.16.0, let's do a workaround for now?
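For illustration, the "remote mktemp" idea could be sketched roughly as below. This is only a sketch under assumptions, not what DataLad does or what the linked issue settled on: the helper names are hypothetical, the host alias `datalad-test` comes from the logs above, and it assumes the remote side provides a `mktemp` that accepts `-d`.

```python
import subprocess


def remote_mktemp_argv(host):
    """Return the argv asking `host` to create a temp directory and print its path."""
    return ["ssh", host, "mktemp -d"]


def remote_mktemp(host, run=subprocess.run):
    """Create a temp directory on `host` and return the path the remote side chose.

    `run` is injectable so the helper can be exercised without a real ssh server.
    """
    proc = run(
        remote_mktemp_argv(host), check=True, capture_output=True, text=True
    )
    return proc.stdout.strip()
```

With something like `remote_mktemp("datalad-test")`, the path would be chosen by the remote system, so the local TMPDIR would not need to exist there.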
@yarikoptic If the workaround you mean is to set TMPDIR to ~/DLTMP, that was tried previously; I suspect I had to change it because the path to the local $HOME does not exist inside the SSH container.
try exactly ~/DLTMP instead of using the env var $HOME, which possibly gets expanded into the original path on the host machine!? maybe magic exists and it would work somehow? ;)
@yarikoptic It appears that magic does not exist.
But it is interesting how it fails right in fixture here
File "/Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/datalad/__init__.py", line 265, in setup_package
_, cfg_file = prep_tmphome()
File "/Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/datalad/__init__.py", line 242, in prep_tmphome
with make_tempfile(mkdir=True) as new_home:
File "/Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/contextlib.py", line 112, in __enter__
return next(self.gen)
File "/Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/site-packages/datalad/utils.py", line 1874, in make_tempfile
True: tempfile.mkdtemp}[mkdir](**tkwargs_)
File "/Users/runner/hostedtoolcache/Python/3.7.12/x64/lib/python3.7/tempfile.py", line 366, in mkdtemp
_os.mkdir(file, 0o700)
FileNotFoundError: [Errno 2] No such file or directory: '~/DLTMP/datalad_temp_8nl769bw'
and doesn't fail similarly in stock datalad somehow...
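One possible explanation, offered as a guess rather than a diagnosis: the traceback shows a literal `~` reaching `tempfile.mkdtemp`, and Python's `tempfile` and `os` functions do not expand `~`; only a shell (as in the appveyor `sh: export TMPDIR=~/DLTMP` line) or `os.path.expanduser` does. A minimal sketch of expanding the value before use, with a hypothetical helper name and assuming a POSIX environment where `HOME` is set:

```python
import os
import tempfile


def make_temp_under(tmpdir_value, prefix="datalad_temp_"):
    """Expand a leading `~` in `tmpdir_value`, create that directory if needed,
    and make a fresh temp directory inside it."""
    base = os.path.expanduser(tmpdir_value)
    os.makedirs(base, exist_ok=True)
    return tempfile.mkdtemp(prefix=prefix, dir=base)
```

If TMPDIR is set somewhere that does not perform tilde expansion, the string `~/DLTMP` is passed through unchanged and is then treated as a relative path.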
Blocker: https://github.com/datalad/datalad/pull/5417