U022ANVL9BJ: Hi all! I'm deploying dagit/dagster on docker and I've started getting permission errors when the scheduler starts runs, apparently because my user code image is in a private registry. On the docker host, I'm able to docker pull with no problem, so is there some extra config I need to pass to dagit or the daemon so they can access the container registry? (I'm saying "I've started..." because it only appeared since I upgraded to 0.14.3 from 0.13.19 but my config was pretty messy, so it might have been hidden behind other problems)
docker.errors.APIError: 500 Server Error for http+docker://localhost/v1.41/images/create?tag=49a20c1d&fromImage=registry.gitlab.com%2F[myrepo]%2Fdagster_user_code: Internal Server Error ("Head "https://registry.gitlab.com/v2/[myrepo]/dagster_user_code/manifests/49a20c1d": denied: access forbidden")
  File "/usr/local/lib/python3.9/site-packages/dagster/core/instance/__init__.py", line 1575, in launch_run
    self._run_launcher.launch_run(LaunchRunContext(pipeline_run=run, workspace=workspace))
  File "/usr/local/lib/python3.9/site-packages/dagster_docker/docker_run_launcher.py", line 149, in launch_run
    self._launch_container_with_command(run, docker_image, command)
  File "/usr/local/lib/python3.9/site-packages/dagster_docker/docker_run_launcher.py", line 107, in _launch_container_with_command
    client.images.pull(docker_image)
  File "/usr/local/lib/python3.9/site-packages/docker/models/images.py", line 444, in pull
    pull_log = self.client.api.pull(
  File "/usr/local/lib/python3.9/site-packages/docker/api/image.py", line 428, in pull
    self._raise_for_status(response)
  File "/usr/local/lib/python3.9/site-packages/docker/api/client.py", line 270, in _raise_for_status
    raise create_api_error_from_http_exception(e)
  File "/usr/local/lib/python3.9/site-packages/docker/errors.py", line 31, in create_api_error_from_http_exception
    raise cls(e, response=response, explanation=explanation)
U016C4E5CP8: Hi - I'm not aware of any changes to the pull behavior between those versions. Would you mind posting the full 'docker pull' command that's working?
If you want to simulate what the docker launch is doing you could run the following in a python script - I'd expect that to also fail if dagster is failing to pull the image:
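(The script itself did not survive in this log. A minimal sketch of such a check, assuming the `docker` Python SDK that the traceback above shows the launcher using. The helper name `try_pull` and the command-line argument are illustrative, not taken from the original message.)

```python
import sys


def try_pull(client, image):
    """Pull `image` through a Docker SDK client, the call that fails in the traceback."""
    # The launcher's failing line is client.images.pull(docker_image).
    return client.images.pull(image)


if __name__ == "__main__" and len(sys.argv) > 1:
    import docker  # Docker SDK for Python, assumed to be installed in the daemon image

    # from_env() reaches the Docker daemon through the socket mounted into the container.
    print(try_pull(docker.from_env(), sys.argv[1]))
```

Run inside the daemon container with the full image reference as the only argument. If the registry rejects the pull, this should raise the same `docker.errors.APIError`.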
U022ANVL9BJ: Hi Daniel, thanks for replying so quickly! Which container should I run this from? the dagster_daemon one?
U016C4E5CP8: Yeah, this would be in the daemon
U016C4E5CP8: one thing is to make sure you have permissions for docker in that container (our examples do this by mounting the docker socket as a volume: https://docs.dagster.io/deployment/guides/docker#launching-runs-in-containers)
U022ANVL9BJ: yes, I saw this just before asking the question here. It was fine already.
U022ANVL9BJ: Sorry for the lag. Indeed, it does fail with the same error message
U016C4E5CP8: Got it - you may need to check what exactly the gitlab requirements are for authentication. The launcher does have a registry config param that you can use if you also need to supply a username and password somewhere
U022ANVL9BJ: in addition, I ran your script in the dagster_daemon container, then docker pull <the same image> on the host (-> "downloaded newer image...") and then your script again, but same error
U022ANVL9BJ: ah ok, sweet, let me give that a shot
U022ANVL9BJ: I've done the following in my dagster.yaml under the run_launcher config:
Anything obviously wrong? the env var at the bottom is correctly loaded in the container, I've checked through os.environ in python, but still no luck
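(The config snippet was not captured in this log. As a rough sketch of the shape being discussed, assuming the launcher's `registry` block takes `url`, `username` and `password`, and that a value written as `env: NAME` is read from the container's environment. The dict below stands in for the YAML, the username is a placeholder, and `resolve` / `describe` are illustrative helpers rather than Dagster functions.)

```python
import os

# Python stand-in for the `registry` block under run_launcher.config in dagster.yaml.
# The key names are an assumption; the username is a placeholder.
REGISTRY = {
    "url": "registry.gitlab.com",
    "username": "my-deploy-token-user",
    "password": {"env": "DAGSTER_CONT_REGISTRY_DEPLOY_TOKEN"},
}


def resolve(value, environ=os.environ):
    """Return a literal value as-is, or look up {"env": NAME} in the environment."""
    if isinstance(value, dict) and "env" in value:
        return environ.get(value["env"])
    return value


def describe(registry, environ=os.environ):
    """Resolve every field and mask the password so the result is safe to log."""
    resolved = {key: resolve(val, environ) for key, val in registry.items()}
    if resolved.get("password"):
        resolved["password"] = "***"
    return resolved


if __name__ == "__main__":
    # A password of None means the variable is not visible to this process.
    print(describe(REGISTRY))
```

This only checks that the environment variable is visible to the process. It does not show whether Dagster actually loaded the `registry` block from the file.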
U016C4E5CP8: Nothing there looks obviously wrong - it's looking like this may be more of a gitlab / docker question given that the script above didn't work either (not to pass the buck - I'm just not sure what exactly gitlab requires in order for you to be able to pull their images)
U022ANVL9BJ: Coming back to this again, I managed to get the python script to log in to gitlab from the docker image (by passing the credentials manually), so I know the values are correct, but I guess dagster is not seeing what I think it is. Is there an easy way to get a debug view of the config as it was loaded? Worst case I can patch the code in the containers to log some debug statements, but it feels a bit overkill...
U016C4E5CP8: Would you mind posting the updated python script that works (without the actual password of course)?
U022ANVL9BJ:
I ran this inside the dagster-daemon container
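(The working script is missing from this log as well. A sketch of what logging in before the pull could look like with the `docker` SDK's `login` call. The environment variable names `REGISTRY_USER` and `IMAGE` are made up for the example; the token variable is the one mentioned elsewhere in this thread.)

```python
import os


def login_and_pull(client, registry_url, username, password, image):
    """Authenticate against the registry, then pull the image with the same client."""
    client.login(username=username, password=password, registry=registry_url)
    return client.images.pull(image)


if __name__ == "__main__" and os.environ.get("IMAGE"):
    import docker  # Docker SDK for Python

    print(
        login_and_pull(
            docker.from_env(),
            "registry.gitlab.com",
            os.environ["REGISTRY_USER"],  # hypothetical variable name
            os.environ["DAGSTER_CONT_REGISTRY_DEPLOY_TOKEN"],
            os.environ["IMAGE"],  # hypothetical variable name
        )
    )
```

If this succeeds while the launcher still fails, the credentials are fine and the launcher is probably not receiving them.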
U022ANVL9BJ: in the meantime, I actually patched the code in docker_run_launcher.py to log self.registry inside DockerRunLauncher._get_client and it is None, so basically dagster does not try to log in to docker, which would explain why the pull fails
U022ANVL9BJ: in dagster.yaml I have the following:
did I miss something obvious?
U016C4E5CP8: Are you setting DAGSTER_CONT_REGISTRY_DEPLOY_TOKEN in your docker compose file?
U022ANVL9BJ: yes, I checked inside the container in the same python process as above, it appears in os.environ, I guess that's enough?
U022ANVL9BJ: side note, I just noticed that in /dagit/instance/config the config under run_launcher does not include some of the keys above, like registry, is this by design, or is something wrong there?
U016C4E5CP8: That's not by design and is likely related to the problem - are you sure that the changes you are making to dagster.yaml are making it into the container?
U022ANVL9BJ: ok, so the running containers (dagit and dagster-daemon) have the correct version of the file, but dagit is showing an old version which matches a git commit 2 days old. This is very weird, considering that the containers are rebuilt/replaced on each deployment (with docker stack deploy)
U016C4E5CP8: I don't have a great explanation for that - dagit doesn't cache or persist its dagster.yaml file or anything like that, it reads it directly from your DAGSTER_HOME folder. My suspicion is that something must be getting incorrectly cached in your docker setup or not being rebuilt on each deploy
U022ANVL9BJ: I'm afraid you are right. I found a second version of dagster.yaml in my images, which somehow is stuck at an old git version. Sorry for wasting your time with all this, I'll keep digging by myself. I think I know a lot more about how the config needs to be done now, so hopefully once I've debugged my docker problem, I'll just sail through the rest :)
U016C4E5CP8: no prob!
U022ANVL9BJ: Eventually got it to work. It was a combination of problems, but a key one is that when specifying a volume with code like this and using docker swarm
the volume name git-repo is NOT scoped to the stack name, in other words, if using docker stack deploy -c stack.yaml mystack to deploy, a volume is going to be called mystack_git-repo and will not match the name above as dagster does not seem to be stack aware. The solution is to declare the volume with a name: git-repo attribute in the stack file so that it matches the name above. No error is raised anywhere because docker will create a volume if it does not exist, so you end up with 2 volumes: mystack_git-repo and git-repo and wonder why your files are not there :slightly_smiling_face: The same problem happens with networks in dagster.yaml
Maybe worth pointing out in the docs under https://docs.dagster.io/deployment/guides/docker ? (adding a section about deploying to docker swarm might help?)
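(To spot the symptom described above, a small sketch: given the volume names present on the host, report every name that exists both bare and with the stack prefix. The `<stack>_<name>` pattern comes from the message above; the function name and the `pgdata` volume are illustrative. The names could come from `docker volume ls` or from the SDK's `client.volumes.list()`.)

```python
def find_shadowed(names, stack):
    """Return the bare names that also exist as '<stack>_<name>'."""
    prefix = stack + "_"
    present = set(names)
    return sorted(
        name[len(prefix):]
        for name in present
        if name.startswith(prefix) and name[len(prefix):] in present
    )


if __name__ == "__main__":
    # One volume created by the stack, one created implicitly when a run
    # container asked for "git-repo"; "mystack_pgdata" is a made-up extra.
    print(find_shadowed(["mystack_git-repo", "git-repo", "mystack_pgdata"], "mystack"))
    # prints ['git-repo']
```

Any name in the result is a candidate for the fix from the message: give the stack's volume an explicit `name:` so that only one volume exists.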
U016C4E5CP8: <@U018K0G2Y85> docs Document deploying Dagster on Docker swarm
U018K0G2Y85: Created issue at: https://github.com/dagster-io/dagster/issues/7047
Message from the maintainers:
Are you looking for the same documentation content? Give it a :thumbsup:. We factor engagement into prioritization.
Summary
Add a section to the Docker guide that details how to deploy Dagster on Docker swarm. Specifically, this part, taken from the Slack conversation:
Eventually got it to work. It was a combination of problems, but a key one is that when specifying a volume with code like this and using docker swarm the volume name git-repo is NOT scoped to the stack name, in other words, if using docker stack deploy -c stack.yaml mystack to deploy, a volume is going to be called mystack_git-repo and will not match the name above as dagster does not seem to be stack aware. The solution is to declare the volume with a name: git-repo attribute in the stack file so that it matches the name above. No error is raised anywhere because docker will create a volume if it does not exist, so you end up with 2 volumes: mystack_git-repo and git-repo and wonder why your files are not there :slightly_smiling_face: The same problem happens with networks in dagster.yaml
Dagster Documentation Gap
This issue was generated from the slack conversation at: https://dagster.slack.com/archives/C01U954MEER/p1646944483749229?thread_ts=1646944483.749229&cid=C01U954MEER