PrefectHQ / prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
https://prefect.io
Apache License 2.0

Better documentation for work pool 'base-job-template' #12201

Open NodeJSmith opened 8 months ago

NodeJSmith commented 8 months ago

Describe the issue

Attempting to create a base job template to use with the CLI command prefect work-pool create is difficult, as there is no documentation for this, only for using .deploy and prefect.yml.

This is for a Docker work pool, for context.

Based on the schema of a newly created work pool in the UI, I tried creating a file for my base job template like this:

{
  "variables": { "image_pull_policy": "Never", "auto_remove": true },
  "job_configuration": {
    "image_pull_policy": "{{ image_pull_policy }}",
    "auto_remove": "{{ auto_remove }}"
  }
}

But this fails with:

Response: {'exception_message': 'Invalid request received.', 'exception_detail': [{'loc': ['body', 'base_job_template'], 'msg': 'The variables specified in the job configuration template must be present as properties in the variables schema. Your job configuration uses the following undeclared variable(s): auto_remove ,image_pull_policy.', 'type': 'value_error'}], 'request_body': {'name': 'test', 'type': 'docker', 'base_job_template': {'variables': {'image_pull_policy': 'Never', 'auto_remove': True}, 'job_configuration': {'image_pull_policy': '{{ image_pull_policy }}', 'auto_remove': '{{ auto_remove }}'}}, 'is_paused': False}}

I eventually realized it needs to be declared with {{ variables.auto_remove }}, for example.

{
  "variables": { "image_pull_policy": "Never", "auto_remove": true },
  "job_configuration": {
    "image_pull_policy": "{{ variables.image_pull_policy }}",
    "auto_remove": "{{ variables.auto_remove }}"
  }
}

And this works but shows this in the UI when you try to edit it:

[Image: screenshot of the work pool edit form in the UI]

So the actual way to do this is to copy the entire default template from the UI's Advanced tab and then add defaults to the schema definitions, which is neither clear nor, in my opinion, a very clean way to do this. Perhaps there's a simpler way that I'm missing.
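To illustrate why the first attempt was rejected, here is a minimal sketch of the shape the error message seems to expect: "variables" is a JSON-Schema-style object whose "properties" declare each variable, and defaults live inside those property definitions. The template below is a hypothetical, trimmed-down example (a real Docker work pool template declares many more properties), and undeclared_variables is an illustrative helper written for this note, not Prefect's actual validation code.

```python
import re

# Hypothetical, trimmed-down base job template. Defaults are set on the
# property definitions inside the variables schema, not as flat key/value pairs.
base_job_template = {
    "variables": {
        "type": "object",
        "properties": {
            "image_pull_policy": {"type": "string", "default": "Never"},
            "auto_remove": {"type": "boolean", "default": True},
        },
    },
    "job_configuration": {
        "image_pull_policy": "{{ image_pull_policy }}",
        "auto_remove": "{{ auto_remove }}",
    },
}

# Matches placeholders such as "{{ auto_remove }}".
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def undeclared_variables(template: dict) -> set[str]:
    """Return placeholders used in job_configuration that are not declared
    under variables.properties (an approximation of the server-side check)."""
    declared = set(template.get("variables", {}).get("properties", {}))
    used: set[str] = set()

    def walk(node) -> None:
        # Recursively collect placeholders from strings, dicts, and lists.
        if isinstance(node, str):
            used.update(PLACEHOLDER.findall(node))
        elif isinstance(node, dict):
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for value in node:
                walk(value)

    walk(template.get("job_configuration", {}))
    return used - declared
```

Under this reading, the flat "variables" object from the first attempt declares no properties at all, so both placeholders count as undeclared, which matches the error above.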

Describe the proposed change

I would recommend that the documentation explain how to set defaults on a work pool using the --base-job-template argument of the CLI: copy the values from the Advanced tab of the UI into a file, then add defaults where desired in the schema definition.

Although honestly I think the better option would be to find a better way to set defaults, especially because I have a feeling this method is going to override any default schema changes made in later versions of Prefect, and that won't be obvious until there is a desired feature/value missing from that schema.
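One way to reduce the risk of a hand-edited copy drifting from newer default schemas might be to keep only the overrides and re-apply them to a freshly copied default template whenever needed. The sketch below assumes the template has already been loaded into a dict (for example with json.load from a file saved from the Advanced tab); apply_defaults and the stand-in default_template are hypothetical names for illustration, not part of Prefect.

```python
import copy


def apply_defaults(template: dict, overrides: dict) -> dict:
    """Return a copy of a base job template with defaults set on the
    matching properties of its variables schema."""
    patched = copy.deepcopy(template)  # leave the input template untouched
    properties = patched["variables"]["properties"]
    for name, value in overrides.items():
        if name not in properties:
            # Fail loudly if the schema no longer declares this variable.
            raise KeyError(f"{name!r} is not declared in the variables schema")
        properties[name]["default"] = value
    return patched


# Stand-in for a default template copied from the UI (heavily trimmed).
default_template = {
    "variables": {
        "type": "object",
        "properties": {
            "image_pull_policy": {"type": "string"},
            "auto_remove": {"type": "boolean", "default": False},
        },
    },
    "job_configuration": {
        "image_pull_policy": "{{ image_pull_policy }}",
        "auto_remove": "{{ auto_remove }}",
    },
}

patched = apply_defaults(
    default_template,
    {"image_pull_policy": "Never", "auto_remove": True},
)
```

The patched dict could then be written out with json.dump and passed to --base-job-template. Raising on an unknown name is a deliberate choice here: it surfaces schema changes at the moment the overrides are applied rather than later.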

Additional context

No response

zzstoatzz commented 8 months ago

hi @NodeJSmith - thank you for the issue!

I agree that the ideas behind the base job template could use some more explanation in the docs - we'll add this to the docs backlog

NodeJSmith commented 8 months ago

I am trying to switch to using the Python client instead of the CLI. I've finally gotten defaults working in the work pool create client call, but it takes a lot of code, much more than I'd expected. Is the below the right way to do this, or am I missing something simpler?


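import json

from prefect.client.schemas.actions import WorkPoolCreate, WorkPoolUpdate
from prefect_docker.worker import DockerWorkerJobConfiguration  # needs prefect-docker installed

# Defaults to set on the work pool's variables schema.
override_kwargs = {"image": "test-image:latest", "image_pull_policy": "Never", "auto_remove": True}

job_config = DockerWorkerJobConfiguration()

# Use the job configuration's JSON schema as the variables schema, minus its title.
job_config_schema = json.loads(job_config.schema_json())
job_config_schema.pop("title")

# Set each default on the matching property definition.
for k, v in override_kwargs.items():
    job_config_schema["properties"][k]["default"] = v

# Wrap the schema under "variables" and add the job configuration template.
job_config_schema = {"variables": job_config_schema}
job_config_schema["job_configuration"] = job_config.json_template()

# Request body for creating the work pool.
wp_create = WorkPoolCreate(
    name="test_pool",
    description="Work pool for test",
    type="docker",
    base_job_template=job_config_schema,
)

# Request body for updating an existing work pool with the same template.
wp_update = WorkPoolUpdate(base_job_template=job_config_schema)

Samreay commented 2 months ago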

+1 to improving the documentation on this section.

I've so far spent several days just trying to set my ECS worker to have default values for the vpc_id and other infrastructure parameters, and even with the help of Nate (https://prefect-community.slack.com/archives/CL09KU1K7/p1724127210012189) and Bianca Hoch (https://prefect-community.slack.com/archives/CL09KU1K7/p1723717205484319) we still haven't got anything working properly. The only solution we have which partially works right now is a literal string of sed commands on the base template:

[Image: screenshot of the sed commands]