dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

Better config for jobs with sidecar containers and DynamicOut #25308

Open michaelromagne opened 3 weeks ago

michaelromagne commented 3 weeks ago

What's the use case?

I have a job with a dynamic graph, using DynamicOut.

The ops are configured with Pydantic-based configs, which lets us parametrize them in the Launchpad.

For each op, a sidecar container runs alongside the main step container.

The sidecar runs an ML model server that computes insights on the data. We want to be able to change the sidecar's image from the Launchpad before launching a job, through the config, as we already do with the Pydantic op configs.

Here is an example config with only two parallel ops from the dynamic graph. It is inconvenient because the per_step_k8s_config entry has to be repeated for every dynamic copy of the op, i.e. once per step key such as compute_ml_enhancements[chunk_0]:

execution:
  config:
    per_step_k8s_config:
      compute_ml_enhancements[chunk_0]:
        pod_spec_config:
          init_containers:
          - image: <model_server_image>
            name: model-server
            ports:
            - container_port: 3000
              host_port: 3000
              protocol: TCP
            resources:
              requests:
                cpu: '3'
                memory: 3Gi
            restartPolicy: Always
      compute_ml_enhancements[chunk_1]:
        pod_spec_config:
          init_containers:
          - image: <model_server_image>
            name: model-server
            ports:
            - container_port: 3000
              host_port: 3000
              protocol: TCP
            resources:
              requests:
                cpu: '3'
                memory: 3Gi
            restartPolicy: Always

ops:
  fetch_and_chunk_data:
    config:
      account_ids:
      - <account_id>
      number_of_model_servers: 2
      number_of_rows: 100
resources:
  io_manager:
    config:
      s3_bucket: <s3>

I can use YAML anchors to shorten this, but they are expanded in the config shown in the UI, and again when I relaunch a job from a past run.
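
To make the duplication concrete: each dynamic copy of the op gets its own step key of the form op_name[mapping_key], and per_step_k8s_config is keyed by those step keys, so the same block must appear once per chunk. When run config is built in Python (for example in a schedule, a sensor, or a launch script), the repeated section can be generated from a single template. This is a minimal sketch: `model_server_sidecar` and `expand_per_step_k8s_config` are hypothetical helpers, and the code assumes `fetch_and_chunk_data` yields mapping keys `chunk_0`, `chunk_1`, and so on, as in the config above. Like YAML anchors, it only automates the copy; the Launchpad still shows the expanded config.

```python
import copy
import json


def model_server_sidecar(image: str) -> dict:
    """Hypothetical helper: the model-server entry for pod_spec_config.init_containers.

    Values are copied from the YAML in this issue. restartPolicy: Always on an
    init container marks it as a Kubernetes native sidecar.
    """
    return {
        "image": image,
        "name": "model-server",
        "ports": [{"container_port": 3000, "host_port": 3000, "protocol": "TCP"}],
        "resources": {"requests": {"cpu": "3", "memory": "3Gi"}},
        "restartPolicy": "Always",
    }


def expand_per_step_k8s_config(op_name: str, step_config: dict, num_chunks: int) -> dict:
    """Hypothetical helper: repeat one step config for every dynamic copy of an op.

    Assumes mapping keys of the form chunk_<i>, so the generated step keys are
    op_name[chunk_0], op_name[chunk_1], ... Each entry is a deep copy so that
    editing one chunk's config does not silently change the others.
    """
    return {
        f"{op_name}[chunk_{i}]": copy.deepcopy(step_config)
        for i in range(num_chunks)
    }


if __name__ == "__main__":
    step_config = {
        "pod_spec_config": {"init_containers": [model_server_sidecar("<model_server_image>")]}
    }
    run_config = {
        "execution": {
            "config": {
                "per_step_k8s_config": expand_per_step_k8s_config(
                    "compute_ml_enhancements", step_config, num_chunks=2
                )
            }
        }
    }
    print(json.dumps(run_config, indent=2))
```

The chunk count has to match however many mapping keys the upstream op yields at runtime (here presumably `number_of_model_servers`), which is exactly the coupling the proposal below would remove.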

Ideas of implementation

Ideally I would specify the config for the dynamic op only once, keyed by the op name as below, and have the same k8s config applied to every dynamic step:

execution:
  config:
    per_step_k8s_config:
      compute_ml_enhancements:
        pod_spec_config:
          init_containers:
          - image: <model_server_image>
            name: model-server
            ports:
            - container_port: 3000
              host_port: 3000
              protocol: TCP
            resources:
              requests:
                cpu: '3'
                memory: 3Gi
            restartPolicy: Always

ops:
  fetch_and_chunk_data:
    config:
      account_ids:
      - <account_id>
      number_of_model_servers: 2
      number_of_rows: 100
resources:
  io_manager:
    config:
      s3_bucket: <s3>
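
Supporting this would need a fallback when the executor looks up a step's config: if a dynamic step such as `compute_ml_enhancements[chunk_0]` has no entry of its own, use the entry keyed by the bare op name. Below is a minimal sketch of that lookup, assuming exact step-key entries keep precedence; `resolve_step_k8s_config` is a hypothetical function, not Dagster's actual executor code.

```python
import re

# Dynamic step keys look like "compute_ml_enhancements[chunk_0]": the op name
# followed by the mapping key in square brackets.
_DYNAMIC_STEP_KEY = re.compile(r"^(?P<op_name>.+)\[(?P<mapping_key>[^\[\]]+)\]$")


def resolve_step_k8s_config(per_step_k8s_config: dict, step_key: str) -> dict:
    """Hypothetical lookup implementing the proposed behaviour.

    1. An entry for the exact step key wins, so per-chunk overrides still work.
    2. Otherwise, for a dynamic step, fall back to the entry keyed by the op name.
    3. Otherwise, the step gets no extra k8s config.
    """
    if step_key in per_step_k8s_config:
        return per_step_k8s_config[step_key]
    match = _DYNAMIC_STEP_KEY.match(step_key)
    if match and match.group("op_name") in per_step_k8s_config:
        return per_step_k8s_config[match.group("op_name")]
    return {}


if __name__ == "__main__":
    config = {"compute_ml_enhancements": {"pod_spec_config": {"init_containers": []}}}
    print(resolve_step_k8s_config(config, "compute_ml_enhancements[chunk_1]"))
```

Giving exact step keys priority keeps the current per-step format valid and still lets a single chunk be overridden, while the op-name entry covers every chunk without knowing the mapping keys in advance.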

michaelromagne commented 2 weeks ago

We switched to the K8sRunLauncher, which lets us use the same sidecar container for all pods. This is still not ideal, because we only want the sidecar on a few ops, but it is better than the problem described above.