allegroai / clearml-agent

ClearML Agent - ML-Ops made easy. ML-Ops scheduler & orchestration solution
https://clear.ml/docs/
Apache License 2.0

Using SSH cloning repository failed #148

Open DavidSonoda opened 1 year ago

DavidSonoda commented 1 year ago

Hi,

I am setting up clearml-agent in docker mode on WSL (Ubuntu 22.04). When I use SSH as the git authentication method, cloning fails with the following error:

Using SSH credentials - replacing https url 'https://github.com/BrainCoTech/clearml_machine_learning_pipline_example.git' with ssh url 'ssh://git@github.com/BrainCoTech/clearml_machine_learning_pipline_example.git'
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
Repository cloning failed: Command '['clone', 'ssh://git@github.com/BrainCoTech/clearml_machine_learning_pipline_example.git', '/root/.clearml/vcs-cache/clearml_machine_learning_pipline_example.git.dd435f5d7936ead3db2acd4cb49a4635/clearml_machine_learning_pipline_example.git', '--quiet', '--recursive']' returned non-zero exit status 128.
clearml_agent: ERROR: Failed cloning repository. 
1) Make sure you pushed the requested commit:
(repository='https://github.com/BrainCoTech/clearml_machine_learning_pipline_example.git', branch='master', commit_id='b96780d6c47ee6faf151b1eadc8f047432c1c106', tag='', docker_cmd='nvidia/cuda:10.2-cudnn7-runtime-ubuntu18.04', entry_point='tf_model_clearml_pipline_example.py', working_dir='.')
2) Check if remote-worker has valid credentials [see worker configuration file]

However, the commit has been pushed, and on the Linux system itself I set up the SSH key and could pull the repo without any issues.
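To reproduce the agent's clone by hand, the URL rewrite reported in the log above can be sketched as follows. This is only an illustration of the https-to-ssh conversion shown in the log, not the agent's actual implementation; the helper name https_to_ssh_url is hypothetical, and the 'git' username is the default mentioned in the clearml.conf comments.

```shell
# Hypothetical helper: rewrite an https git URL into the ssh:// form
# that appears in the agent log ("replacing https url ... with ssh url ...").
https_to_ssh_url() {
    # Keep the host and path, swap the scheme, and prepend the default 'git' user.
    printf '%s\n' "$1" | sed -E 's#^https?://([^/]+)/#ssh://git@\1/#'
}

# Usage (not run here, it needs network access and a loaded key):
#   git ls-remote "$(https_to_ssh_url https://github.com/BrainCoTech/clearml_machine_learning_pipline_example.git)"
```

If git ls-remote succeeds on the host with this URL but the agent still fails, the difference is probably in what the container can see (key files or agent socket) rather than in the URL.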

Here's the command I use to run clearml-agent in docker mode:

clearml-agent daemon --queue default --docker nvcr.io/nvidia/tensorflow:23.02-tf2-py3

And here's the part of clearml.conf that relates to git and SSH:

# Set GIT user/pass credentials
# leave blank for GIT SSH credentials
agent.git_user=""
agent.git_pass=""

# extra_index_url: ["https://allegroai.jfrog.io/clearml/api/pypi/public/simple"]
agent.package_manager.extra_index_url= [

]

agent {
    # unique name of this worker, if None, created based on hostname:process_id
    # Override with os environment: CLEARML_WORKER_ID
    # worker_id: "clearml-agent-machine1:gpu0"
    worker_id: ""

    # worker name, replaces the hostname when creating a unique name for this worker
    # Override with os environment: CLEARML_WORKER_NAME
    # worker_name: "clearml-agent-machine1"
    worker_name: ""

    # Set GIT user/pass credentials (if user/pass are set, GIT protocol will be set to https)
    # leave blank for GIT SSH credentials (set force_git_ssh_protocol=true to force SSH protocol)
    # **Notice**: GitHub personal token is equivalent to password, you can put it directly into `git_pass`
    # To learn how to generate git token GitHub/Bitbucket/GitLab:
    # https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token
    # https://support.atlassian.com/bitbucket-cloud/docs/app-passwords/
    # https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html
    # git_user: ""
    # git_pass: ""
    # Limit credentials to a single domain, for example: github.com,
    # all other domains will use public access (no user/pass). Default: always send user/pass for any VCS domain
    # git_host: ""

    # Force GIT protocol to use SSH regardless of the git url (Assumes GIT user/pass are blank)
    force_git_ssh_protocol: true
    # Force a specific SSH port when converting http to ssh links (the domain is kept the same)
    # force_git_ssh_port: 0
    # Force a specific SSH username when converting http to ssh links (the default username is 'git')
    # force_git_ssh_user: git
    # ...
}

The output of ls -la ~/.ssh:

total 24
drwxr-xr-x 2 brainco brainco 4096 Mar 17 21:30 .
drwxr-x--- 8 brainco brainco 4096 Mar 17 18:24 ..
-rw------- 1 brainco brainco  419 Mar 17 21:30 id_ed25519
-rw-r--r-- 1 brainco brainco  106 Mar 17 21:30 id_ed25519.pub
-rw------- 1 brainco brainco  806 Mar 17 14:58 known_hosts
-rw-r--r-- 1 brainco brainco  142 Mar 17 14:58 known_hosts.old

The command I used to add the key to the ssh-agent:

ssh-add ~/.ssh/id_ed25519

Any suggestions? Thanks in advance.

ainoam commented 1 year ago

@DavidSonoda Have you tried using SSH_AUTH_SOCK, as described in https://clear.ml/docs/latest/docs/clearml_agent#ssh-access?
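A minimal sketch of checking the agent socket on the host before starting the daemon, assuming the daemon is launched from the same shell so it inherits SSH_AUTH_SOCK. The helper name check_ssh_agent is hypothetical; it only checks that the variable points at a socket file, not that the key is loaded.

```shell
# Hypothetical helper: succeed only if SSH_AUTH_SOCK is set and points at a socket file.
check_ssh_agent() {
    if [ -z "${SSH_AUTH_SOCK:-}" ]; then
        echo "SSH_AUTH_SOCK is not set"
        return 1
    fi
    if [ ! -S "$SSH_AUTH_SOCK" ]; then
        echo "SSH_AUTH_SOCK is not a socket: $SSH_AUTH_SOCK"
        return 1
    fi
    echo "ssh-agent socket: $SSH_AUTH_SOCK"
}

# Usage (same shell that starts the daemon):
#   eval "$(ssh-agent -s)" && ssh-add ~/.ssh/id_ed25519
#   check_ssh_agent && clearml-agent daemon --queue default --docker nvcr.io/nvidia/tensorflow:23.02-tf2-py3
```

Running ssh-add -l afterwards lists the keys the agent actually holds, which is worth checking as well.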

BELONOVSKII commented 1 year ago

Make sure the SSH key was generated with an empty passphrase. A passphrase-protected key was the cause in my case.
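One way to check this without regenerating the key, as a rough sketch: ask ssh-keygen to read the private key with an empty passphrase. The helper name key_has_no_passphrase is hypothetical, and the key path in the usage lines is the one from the listing above.

```shell
# Hypothetical helper: succeeds if the private key can be read with an empty passphrase.
key_has_no_passphrase() {
    # ssh-keygen -y derives the public key from the private key file;
    # -P "" supplies an empty passphrase, so an encrypted key fails instead of prompting.
    ssh-keygen -y -P "" -f "$1" >/dev/null 2>&1
}

# Usage:
#   if key_has_no_passphrase "$HOME/.ssh/id_ed25519"; then
#       echo "key has no passphrase"
#   else
#       echo "key is passphrase-protected (or unreadable)"
#   fi
```

If the key does have a passphrase, ssh-keygen -p -f ~/.ssh/id_ed25519 can change it to an empty one in place, so the public key registered on GitHub stays valid.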