datalad / git-annex

A non-official clone of git-annex established for DataLad purposes. No PRs will be merged, but it can be used to test prospective git-annex patches. Official git-annex repository: https://git.kitenet.net/index.cgi/git-annex.git/

OSX: ssh docker setup is not working for maint and master #42

Open yarikoptic opened 3 years ago

yarikoptic commented 3 years ago

see e.g. https://github.com/datalad/git-annex/actions/runs/458535490 runs

2021-01-03T02:45:26.5758460Z 57a3a5a52691: Pull complete
2021-01-03T02:45:26.5818610Z Digest: sha256:f5e151dc378ce081e3009e0780d96ba96bd003be07f7da8be626ecce5511e0f1
2021-01-03T02:45:26.5837070Z Status: Downloaded newer image for dataladtester/docker-ssh-target:latest
2021-01-03T02:45:26.5843510Z  ---> eff5a230c1b6
2021-01-03T02:45:26.5848370Z Step 2/4 : RUN groupadd -og 20 dl &&     useradd -ms /bin/bash -ou 501 -g dl dl &&     mkdir -p /home/dl/.ssh &&     chown -R dl:dl /home/dl/ &&     echo 'dl:dl' | chpasswd
2021-01-03T02:45:26.7088880Z  ---> Running in c227b4f3181b
2021-01-03T02:45:27.1036730Z Removing intermediate container c227b4f3181b
2021-01-03T02:45:27.1042410Z  ---> 8c8f201d0430
2021-01-03T02:45:27.1046170Z Step 3/4 : CMD ["/usr/sbin/sshd", "-D"]
2021-01-03T02:45:27.1292980Z  ---> Running in dd6112c08e62
2021-01-03T02:45:27.1851620Z Removing intermediate container dd6112c08e62
2021-01-03T02:45:27.1855330Z  ---> 47602e520b19
2021-01-03T02:45:27.1856340Z Step 4/4 : RUN mkdir -p "/private/var/folders/24/8k48jl6d249_n_qfxwsl6xvm0000gn/T"
2021-01-03T02:45:27.2110890Z  ---> Running in 5c1475e5bade
2021-01-03T02:45:27.4972770Z Removing intermediate container 5c1475e5bade
2021-01-03T02:45:27.4976210Z  ---> ea7b055ec3e6
2021-01-03T02:45:27.4983950Z Successfully built ea7b055ec3e6
2021-01-03T02:45:27.5029840Z Successfully tagged datalad-tests-ssh:latest
2021-01-03T02:45:27.5756460Z ac072058773fb66ab0dec91d80af3fba86b5293374d92a49169f3de39fdec683
2021-01-03T02:45:27.9241170Z cfe67f0b48989a2f2d007dbf66614dd6ca7c1590be5c48bfb172bb767cd90f3c
2021-01-03T02:45:28.2400480Z nc: connectx to localhost port 42241 (tcp) failed: Connection refused
2021-01-03T02:45:28.2402450Z nc: connectx to localhost port 42241 (tcp) failed: Connection refused
2021-01-03T02:45:29.4298790Z nc: connectx to localhost port 42241 (tcp) failed: Connection refused
2021-01-03T02:45:29.4300300Z nc: connectx to localhost port 42241 (tcp) failed: Connection refused
.... the same is filling up the logs .... 

I did not look into how to resolve this, but it must be possible one way or another (maybe it is just a port conflict among multiple docker instances on the same box?)

yarikoptic commented 3 years ago

Any immediate ideas on what is going wrong here? Given that master will soon be released as 0.14.0, and maint will thus jump over to the current master, maybe this issue will disappear on its own.

jwodder commented 3 years ago

@yarikoptic I do not know what's going wrong. Further ad hoc customization of the SSH setup scripts would be needed in order to get any debugging information.
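As a rough sketch of what such customization might look like (not the actual setup script), the unbounded `nc` polling seen in the logs could be replaced with a bounded wait that dumps container state on timeout. The function names `wait_for_port` and `dump_container_state` and the container name `datalad-tests-ssh` are assumptions made for illustration; `nc -z`, `docker ps -a`, `docker port`, and `docker logs` are standard commands, though `nc` flags vary somewhat between netcat variants.

```shell
# Hypothetical helper: poll HOST:PORT up to ATTEMPTS times, DELAY seconds apart.
# Returns 0 as soon as the port accepts a connection, 1 if it never does.
wait_for_port() {
    local host="$1" port="$2" attempts="${3:-30}" delay="${4:-1}"
    local i=0
    while [ "$i" -lt "$attempts" ]; do
        # -z: only check whether the port is open, send no data
        if nc -z "$host" "$port" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Hypothetical helper: print what Docker knows about a container.
# Each command is allowed to fail so that the others still run.
dump_container_state() {
    local container="$1"   # container name is an assumption
    docker ps -a || true
    docker port "$container" || true
    docker logs "$container" || true
}
```

It might be used as `wait_for_port localhost 42241 30 1 || { dump_container_state datalad-tests-ssh; exit 1; }`, so that a failed run ends with diagnostics rather than a log full of identical "Connection refused" lines.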

yarikoptic commented 3 years ago

Ah, let's then forget about it and wait for master release

yarikoptic commented 3 years ago

Actually, I take it back since I mixed it all up -- it works only on release and not on maint or master, so we are doomed to pin it down :-/ FWIW, it seems to be failing differently ATM:

maint:

```shell
==> docker-machine
Bash completion has been installed to:
/usr/local/etc/bash_completion.d
To have launchd start docker-machine now and restart at login:
brew services start docker-machine
Or, if you don't want/need a background service you can just run:
docker-machine start
Creating CA: /Users/runner/.docker/machine/certs/ca.pem
Creating client certificate: /Users/runner/.docker/machine/certs/cert.pem
Running pre-create checks...
(default) Image cache directory does not exist, creating it at /Users/runner/.docker/machine/cache...
(default) No default Boot2Docker ISO found locally, downloading the latest release...
Error with pre-create check: "failure getting a version tag from the Github API response (are you getting rate limited by Github?)"
Error: Process completed with exit code 3.
```

master: connection refused

```shell
END datalad-tests-ssh2 LOGS --------
nc: connectx to localhost port 42241 (tcp) failed: Connection refused
nc: connectx to localhost port 42241 (tcp) failed: Connection refused
nc: connectx to localhost port 42241 (tcp) failed: Connection refused
```

so I guess it boils down to how datalad is installed (from pypi vs straight from git)?

jwodder commented 3 years ago

@yarikoptic Those errors are occurring before datalad is even installed. We've also known about the maint issue for a while, and it continues to fail despite doing what the docker-machine docs say.

yarikoptic commented 3 years ago

d'oh -- looked into our template:

    {% if ostype == "ubuntu" or ostype == "macos" %}
      - name: Set up SSH target
        shell: bash
        # TODO: Drop the release condition once 0.13.2 is released.
        run: |
          if [ "${{ matrix.version }}" != "release" ]; then
            {% if ostype == "macos" %}

That explains the difference between release and the others. On release (which we should have started to test against SSH) we do not even bother to set up the SSH target for running SSH tests. So, at least that mystery is not a mystery ;) I will submit a PR now to just disable the SSH setup on OSX, so we get green again, but we still need to figure out why we fail to establish that docker container on OSX.

jwodder commented 3 years ago

@yarikoptic I believe I've finally fixed this. The problem was that SSH was configured to connect to localhost, but when using docker-machine, containers' ports aren't exposed on localhost, they're exposed on the IP address for the docker-machine VM.

PRs: https://github.com/datalad/datalad/pull/5417, https://github.com/datalad/git-annex/pull/55
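The idea behind the fix can be sketched as follows. This is an illustration under stated assumptions, not the code from the linked PRs: the function name `ssh_target_host` is hypothetical, and the machine name `default` is taken from the `(default)` prefix in the docker-machine output earlier in this thread. `docker-machine ip <name>` is the docker-machine subcommand that prints a VM's IP address.

```shell
# Hypothetical helper: print the host on which published container ports
# are reachable.
ssh_target_host() {
    local machine="${1:-default}"   # machine name is an assumption
    local ip
    # With docker-machine, containers run inside a VM, so their published
    # ports are exposed on the VM's IP address rather than on localhost.
    if command -v docker-machine >/dev/null 2>&1 \
        && ip="$(docker-machine ip "$machine" 2>/dev/null)" \
        && [ -n "$ip" ]; then
        printf '%s\n' "$ip"
    else
        # No docker-machine (or no such machine): fall back to localhost.
        printf '%s\n' localhost
    fi
}
```

A connection check could then target `"$(ssh_target_host)"` instead of a hard-coded `localhost`, for example `nc -z "$(ssh_target_host)" 42241`, where 42241 is the port seen in the logs above.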

yarikoptic commented 3 years ago

AWESOME, Thank you @jwodder !

yarikoptic commented 2 years ago

Eventually we should get back to this and finish either #55 or #58, but currently testing against datalad is still red overall since recent annex changes caused breakages, see https://github.com/datalad/datalad/issues/6492 -- so we are pretty much blocked by that. We should get back to adding SSH testing as soon as datalad turns green again here.