dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

"code-server" feature: Reload not working without PVC? #18925

Open (jomeier opened this issue 11 months ago)

jomeier commented 11 months ago

Dagster version

1.5.11

What's the issue?

Hi, I use the experimental "code-server" feature on OpenShift/Kubernetes. My use case: I want to change code in the user code container on Kubernetes and "hot reload" it in Dagster's web UI without rebuilding the user code image. So far, this works. Why do I want to do that? Later, I want to connect to the running container with Visual Studio Code and comfortably change the code during the experimentation phase.

I use the default user-code-example Docker image from Docker Hub and point the code server to the (empty) file /example_project/example_repo/__init__.py.

At first, Dagster complains that __init__.py is empty and that it can't find any job, asset, ...

If I then add the following code to __init__.py inside the running user code container:

    import json
    import requests

    from dagster import Definitions, define_asset_job, asset
    from dagster_k8s import k8s_job_executor

    # Asset: fetch the current top story IDs from the HackerNews API
    # and write the first 10 of them to a local JSON file.
    @asset
    def hackernews_top_story_ids():
        """Get top stories from the HackerNews top stories endpoint.

        API Docs: https://github.com/HackerNews/API#new-top-and-best-stories.
        """
        top_story_ids = requests.get(
            "https://hacker-news.firebaseio.com/v0/topstories.json"
        ).json()

        with open("hackernews_top_story_ids.json", "w") as f:
            json.dump(top_story_ids[:10], f)

    # Job that materializes the asset; k8s_job_executor runs each step in its own Kubernetes job.
    my_job = define_asset_job(name="my_job", selection=[hackernews_top_story_ids], executor_def=k8s_job_executor)

    # Module-level Definitions object; the error below shows Dagster looks for it under the name "defs".
    defs = Definitions(
        assets=[hackernews_top_story_ids],
        jobs=[my_job],
    )

...then reload the code location in Dagster's UI and materialize the job, Dagster fails with this error message:

    Could not load job definition.
    dagster._core.errors.DagsterInvariantViolationError: defs not found at module scope in file /example_project/example_repo/__init__.py.
      File "/usr/local/lib/python3.10/site-packages/dagster/_grpc/impl.py", line 121, in core_execute_run
        recon_job.get_definition()
      File "/usr/local/lib/python3.10/site-packages/dagster/_core/definitions/reconstruct.py", line 261, in get_definition
        return self.repository.get_definition().get_maybe_subset_job_def(
      File "/usr/local/lib/python3.10/site-packages/dagster/_core/definitions/reconstruct.py", line 119, in get_definition
        return repository_def_from_pointer(self.pointer, self.repository_load_data)
      File "/usr/local/lib/python3.10/site-packages/dagster/_core/definitions/reconstruct.py", line 741, in repository_def_from_pointer
        target = def_from_pointer(pointer)
      File "/usr/local/lib/python3.10/site-packages/dagster/_core/definitions/reconstruct.py", line 635, in def_from_pointer
        target = pointer.load_target()
      File "/usr/local/lib/python3.10/site-packages/dagster/_core/code_pointer.py", line 176, in load_target
        return _load_target_from_module(
      File "/usr/local/lib/python3.10/site-packages/dagster/_core/code_pointer.py", line 200, in _load_target_from_module
        raise DagsterInvariantViolationError(f"{fn_name} not found {error_suffix}")

It seems as if Dagster does not hot-reload the Definitions.
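The error message says that Dagster looked for a module-level attribute named defs in the file and did not find it. The following is a simplified, hypothetical sketch of that kind of lookup using only the standard library; load_defs_from_file is a made-up name, and this is not Dagster's actual code pointer implementation. It executes a Python file as a fresh module and returns its defs attribute, which illustrates that the outcome depends only on what the file contains, in the filesystem of the process doing the loading, at the moment it loads.

```python
import importlib.util
import os
import tempfile


def load_defs_from_file(python_file: str, attribute: str = "defs"):
    """Hypothetical stand-in for a file-based code pointer: execute the file
    as a fresh module and return the requested module-level attribute."""
    # Build a module spec from the file path and execute it, so the file
    # is read from disk every time this function is called.
    spec = importlib.util.spec_from_file_location("user_code", python_file)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    if not hasattr(module, attribute):
        raise LookupError(f"{attribute} not found at module scope in file {python_file}.")
    return getattr(module, attribute)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "__init__.py")

        # An empty file, like the one in the image: the lookup fails.
        open(path, "w").close()
        try:
            load_defs_from_file(path)
        except LookupError as err:
            print(err)

        # After the file is edited, a fresh load finds the attribute.
        # (A plain dict stands in for a real Definitions object here.)
        with open(path, "w") as f:
            f.write("defs = {'assets': ['hackernews_top_story_ids']}\n")
        print(load_defs_from_file(path))
```

Because every call re-reads the file, a process that sees the edited file succeeds, while a process that still sees the empty file raises the "not found at module scope" error.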

What did you expect to happen?

No errors.

How to reproduce?

Described above.

Deployment type

Dagster Helm chart

Deployment details

values.yaml of my Helm Chart:

    global:
      serviceAccountName: default

    postgresql:
      volumePermissions:
        enabled: false
        securityContext:
          runAsUser: "auto"
      securityContext:
        enabled: false
      shmVolume:
        chmod:
          enabled: false

    dagsterWebserver:
      resources:
        requests:
          memory: 512Mi
          cpu: 250m      
        limits:
          memory: 512Mi
          cpu: 250m

    dagsterDaemon:
      resources:
        requests:
          memory: 256Mi
          cpu: 250m      
        limits:
          memory: 256Mi
          cpu: 250m
      securityContext:
        runAsUser: 0          

    postgresql:
      resources:
        requests:
          memory: 256Mi
          cpu: 250m      
        limits:
          memory: 256Mi
          cpu: 250m

    dagster-user-deployments:
      enabled: true
      deployments:
        - name: "user-code-1"
          image:
            repository: "docker.io/dagster/user-code-example"
            tag: 1.5.11  #1.3.7
            pullPolicy: Always
          codeServerArgs:
            - "--python-file"
            - "/example_project/example_repo/__init__.py"
          port: 3030
          envSecrets:
            - name: dagster-aws-access-key-id
            - name: dagster-aws-secret-access-key
          securityContext:
            runAsUser: 0
          podSecurityContext:
            runAsUser: 0
          resources:
            requests:
              memory: 1Gi
              cpu: 1      
            limits:
              memory: 1Gi
              cpu: 1               

    runLauncher:
      type: K8sRunLauncher
      config:
        k8sRunLauncher:
          envSecrets:
            - name: dagster-aws-access-key-id
            - name: dagster-aws-secret-access-key
          jobNamespace: "dagster"
          runK8sConfig:
            containerConfig: # raw config for the pod's main container
              resources:
                limits:
                  cpu: 500m
                  memory: 512Mi
                requests:
                  cpu: 500m
                  memory: 512Mi                            
  releaseName: dagster
  version: 1.5.11 
  repo: https://dagster-io.github.io/helm

Additional information

No response

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

dpeng817 commented 11 months ago

Just to rule it out, if you do actually re-initialize the code server, does it then work correctly?

jomeier commented 11 months ago

No, it doesn't.

Re-initializing the code server reloads everything (assets, jobs, ...), but Dagster still complains that the instantiated variable is "not found".

The user code container must be started with an initialized "defs = Definitions(...)" in the target python_file; after that, further "reloaded" updates seem to work.
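To reproduce the starting condition described in this comment, one option is to make sure the target file already defines defs before the code server starts. The helper below is a hypothetical sketch: the name ensure_placeholder_defs, and the idea of running it in the container before the code server starts, are assumptions rather than a documented Dagster feature. If the file is missing or blank, it writes a placeholder that defines an empty Definitions object, which later edits can replace. It only addresses the startup condition; it does not by itself explain or fix the failing run.

```python
import os

# Placeholder module source; Definitions() accepts no arguments and
# yields an empty set of definitions.
PLACEHOLDER = (
    "from dagster import Definitions\n"
    "\n"
    "# Placeholder so the code server finds `defs` at module scope on startup.\n"
    "defs = Definitions()\n"
)


def ensure_placeholder_defs(python_file: str) -> bool:
    """Hypothetical helper: write PLACEHOLDER if python_file is missing or blank.

    Returns True if the placeholder was written, False if the file already had content.
    """
    if os.path.exists(python_file):
        with open(python_file) as f:
            if f.read().strip():
                return False  # real user code is already there; leave it alone
    # Create parent directories if needed, then write the placeholder.
    os.makedirs(os.path.dirname(python_file) or ".", exist_ok=True)
    with open(python_file, "w") as f:
        f.write(PLACEHOLDER)
    return True
```

With the values.yaml from this issue, it would be called as ensure_placeholder_defs("/example_project/example_repo/__init__.py") before the code server process starts.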

jomeier commented 9 months ago

@dpeng817 I found out that, for some reason, if I mount a PVC (Kubernetes PersistentVolumeClaim) into the Dagster user code location pod via the Helm chart, everything seems to work as expected: I can change code, and even if there is no code initially, I can add code, reload in the UI, and everything works.

Without the PVC, reload does not work.

That does not make any sense to me :)
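One plausible explanation, offered as an assumption rather than a confirmed diagnosis: the traceback goes through core_execute_run, so the failure happens when the run is executed, not when the UI reloads the code location. With the K8sRunLauncher from the values.yaml, each run executes in a new pod started from the user code image, so that pod would read the image's unmodified (empty) __init__.py instead of the file edited inside the code server container. A PVC that is also mounted into the run pods (for example, if the user deployment's volumes are passed on to launched runs) would give both pods the same file. The stdlib-only simulation below models each pod's filesystem as a directory; the names start_pod and run_pod_sees_edit are hypothetical, and no Kubernetes or Dagster API is involved.

```python
import os
import shutil
import tempfile
from typing import Optional

TARGET = os.path.join("example_repo", "__init__.py")


def start_pod(image_dir: str, pods_root: str, name: str, volume: Optional[str]) -> str:
    """Model a pod's filesystem: the shared volume if one is mounted, else a fresh copy of the image."""
    if volume is not None:
        return volume  # mounted PVC: every pod reads and writes the same files
    pod_dir = os.path.join(pods_root, name)
    shutil.copytree(image_dir, pod_dir)  # without a volume, each pod starts from the image contents
    return pod_dir


def run_pod_sees_edit(use_pvc: bool) -> bool:
    """Edit the file in the code server 'pod', then check what a newly launched run 'pod' reads."""
    with tempfile.TemporaryDirectory() as root:
        # The image ships an empty __init__.py.
        image_dir = os.path.join(root, "image")
        os.makedirs(os.path.join(image_dir, "example_repo"))
        open(os.path.join(image_dir, TARGET), "w").close()

        volume = None
        if use_pvc:
            # Assumption: the PVC is mounted at the project path in both the
            # code server pod and the run pods, and starts without any code.
            volume = os.path.join(root, "pvc")
            os.makedirs(os.path.join(volume, "example_repo"))

        pods_root = os.path.join(root, "pods")
        os.makedirs(pods_root)

        # Live edit made inside the running code server container.
        code_server = start_pod(image_dir, pods_root, "code-server", volume)
        with open(os.path.join(code_server, TARGET), "w") as f:
            f.write("defs = 'edited definitions'\n")

        # A run launched afterwards gets its own pod.
        run_pod = start_pod(image_dir, pods_root, "run-worker", volume)
        with open(os.path.join(run_pod, TARGET)) as f:
            return "defs" in f.read()


print("without PVC:", run_pod_sees_edit(use_pvc=False))
print("with PVC:", run_pod_sees_edit(use_pvc=True))
```

Under these assumptions the run pod sees the edit only when the volume is shared (the script prints False, then True), which would match the behavior reported above.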