NVIDIA / ngc-container-replicator

NGC Container Replicator
BSD 3-Clause "New" or "Revised" License

Ability to run without Docker? #20

Open lgorenstein opened 3 years ago

lgorenstein commented 3 years ago

I tried to convert the replicator into a Singularity image to be able to use it on a Docker-less cluster:

singularity pull docker://deepops/replicator:201015

This worked just fine and generated replicator_201015.sif. Then off to replicating (note: I needed PYTHONNOUSERSITE=1, otherwise packages from ~/.local/lib/python3.6 were getting in the way; I would suggest defining this variable in the container proactively):

singularity run --env=PYTHONNOUSERSITE=1 -B /tmp:/output \
    replicator_201015.sif --project=nvidia --min-version=17.12 \
                       --image=tensorflow --image=pytorch --image=tensorrt \
                       --singularity \
                       --dry-run \
                       --api-key=`cat ~/.ngc_api_key.txt`
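For readers unfamiliar with PYTHONNOUSERSITE: a minimal sketch, not part of the replicator, of how the variable changes interpreter startup. It launches a fresh interpreter and reads `site.ENABLE_USER_SITE`, which Python sets to a false value when the per-user site directory (e.g. ~/.local/lib/python3.6/site-packages) is left off `sys.path`. The helper name `user_site_enabled` is hypothetical.

```python
import os
import subprocess
import sys


def user_site_enabled(extra_env=None):
    """Start a fresh interpreter and report whether it enables the user site dir."""
    env = dict(os.environ)
    # Start from a clean slate so the caller's own setting does not leak in.
    env.pop("PYTHONNOUSERSITE", None)
    env.update(extra_env or {})
    result = subprocess.run(
        [sys.executable, "-c", "import site; print(site.ENABLE_USER_SITE)"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
        universal_newlines=True,
    )
    # The child prints True, False, or None; only "True" means enabled.
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    print("without the variable:", user_site_enabled())
    print("PYTHONNOUSERSITE=1:  ", user_site_enabled({"PYTHONNOUSERSITE": "1"}))
```

With the variable set, packages installed under ~/.local should no longer shadow the ones shipped inside the container image. The value without the variable depends on the environment (it is also false inside most virtual environments).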

Unfortunately, the run crashes, citing the lack of a Docker daemon:

2021-03-24 11:35:39,056 - ngc_replicator.ngc_replicator - 30 - INFO - Initializing Replicator
2021-03-24 11:35:40,501 - nvidia_deepops.docker.registry.ngcregistry - 126 - INFO - GET https://api.ngc.nvidia.com/v2/orgs - took 0.5812202040106058 sec
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
Warning: failed to get default registry endpoint from daemon (Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?). Using system default: https://index.docker.io/v1/
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Traceback (most recent call last):
  File "/usr/local/bin/ngc_replicator", line 33, in <module>
    sys.exit(load_entry_point('ngc-replicator==0.4.0', 'console_scripts', 'ngc_replicator')())
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/ngc_replicator-0.4.0-py3.6.egg/ngc_replicator/ngc_replicator.py", line 344, in main
    replicator = Replicator(**config)
  File "/usr/local/lib/python3.6/site-packages/ngc_replicator-0.4.0-py3.6.egg/ngc_replicator/ngc_replicator.py", line 39, in __init__
    self.nvcr_client.login(username="$oauthtoken", password=api_key, registry="nvcr.io/v2")
  File "/usr/local/lib/python3.6/site-packages/nvidia_deepops-0.4.2-py3.6.egg/nvidia_deepops/docker/client/dockercli.py", line 62, in login
    "docker login -u {} -p {} {}".format(username, password, registry))
  File "/usr/local/lib/python3.6/site-packages/nvidia_deepops-0.4.2-py3.6.egg/nvidia_deepops/docker/client/dockercli.py", line 58, in call
    stderr=stderr)
  File "/usr/local/lib/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['docker', 'login', '-u', '$oauthtoken', '-p', '<_my_API_key_here_', 'nvcr.io/v2']' returned non-zero exit status 1.
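A side note on the "--password via the CLI is insecure" warning in the log above: the traceback shows the key being interpolated into the `docker login` command line, which is also why it ends up in the exception message. Below is a minimal sketch of feeding it through `--password-stdin` instead. The function `docker_login` and its `runner` parameter are hypothetical and not the nvidia_deepops API. Judging by the log, `docker login` itself still needs a reachable daemon, so this only addresses the warning, not the crash.

```python
import subprocess


def docker_login(username, password, registry, runner=subprocess.run):
    """Log in to a registry without placing the password on the command line.

    `runner` is injectable so the call can be exercised without Docker installed.
    """
    cmd = ["docker", "login", "-u", username, "--password-stdin", registry]
    # The secret travels over stdin, so it never appears in argv, in `ps`
    # output, or in a CalledProcessError message.
    return runner(cmd, input=password.encode("utf-8"), check=True)
```

A call mirroring the one in the traceback would look like `docker_login("$oauthtoken", api_key, "nvcr.io/v2")`.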

From a naive user's perspective, if I run from Singularity (i.e. outside of the Docker ecosystem) and all I want is to dump a bunch of image files, I should not need a functional Docker daemon on the host, right? Would it be possible for the replicator to detect such a condition?
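To illustrate what "detecting such a condition" could look like, here is a minimal sketch that simply tries to connect to the daemon's Unix socket, using the path from the error message above. The function name `docker_daemon_reachable` is hypothetical and nothing like it is known to exist in the replicator. It only checks that something is listening on the socket, not that the listener is a healthy Docker daemon.

```python
import socket


def docker_daemon_reachable(socket_path="/var/run/docker.sock", timeout=1.0):
    """Return True if something accepts connections on the Docker Unix socket."""
    if not hasattr(socket, "AF_UNIX"):
        # Platform without Unix domain sockets: treat the daemon as unreachable.
        return False
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(socket_path)
        return True
    except OSError:
        # Covers a missing socket file, refused connection, permission
        # problems, and timeouts.
        return False
    finally:
        sock.close()


if __name__ == "__main__":
    if docker_daemon_reachable():
        print("Docker daemon socket is reachable")
    else:
        print("No Docker daemon socket; a daemon-free code path would be needed")
```

A tool could run a check like this at startup and fail early with a clear message, or switch to a daemon-free path if one existed, rather than surfacing a `CalledProcessError` from deep inside the login call.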

ajdecon commented 3 years ago

Our actual implementation for downloading containers from NGC uses the Docker SDK, which in turn relies on a connection to the host's Docker daemon. So this workflow does require a functional Docker daemon.

We'd be open to removing this dependency, but we don't have any plans to work on this right now.

lgorenstein commented 3 years ago

Thank you Adam, looking forward to this! I mean, even a '--no-docker-daemon' flag would work for my use case :)

blairjj commented 3 years ago

This is a showstopper for me at my company. Looks like we are going to have to create our own tool. Sad that no one is addressing this.