databricks / cli

Databricks CLI

[FEATURE] Support arrays for job parameters #1508

Closed stevenayers-bge closed 9 hours ago

stevenayers-bge commented 1 week ago

Describe the issue

Currently, task parameters and job parameters differ in their support for JSON arrays: task parameters accept a list of strings, while a job parameter can only hold a single string. In some situations this becomes problematic.

Say you have the following Python task:

# fruits.py
from argparse import ArgumentParser

if __name__ == '__main__':
    parser = ArgumentParser()
    parser.add_argument('--fruits', dest="fruits", nargs='+', type=str)
    args = parser.parse_args()

    print(f"Congrats! You've got {len(args.fruits)} fruits!")
    for fruit in args.fruits:
        print(fruit)

You'd run it like this:

$ ./fruits.py --fruits peach banana cherry
Congrats! You've got 3 fruits!
peach
banana
cherry

You move it to a workflow:

resources:
  jobs:
    analyze_fruits:
      name: Nutritious AND Delicious!
      tasks:
        - task_key: juicy_fruity
          spark_python_task:
            python_file: fruits.py
            source: GIT
            parameters:
              - --fruits
              - peach
              - banana
              - cherry

This works correctly and interprets args.fruits as a list:

# Output
Congrats! You've got 3 fruits!
peach
banana
cherry

But if you want to pass the task some job parameters, this happens:

resources:
  jobs:
    analyze_fruits:
      name: Nutritious AND Delicious!
      parameters:
        - name: fruits
          default: peach banana cherry # no option for a list
      tasks:
        - task_key: juicy_fruity
          spark_python_task:
            python_file: fruits.py
            source: GIT
            parameters:
              - --fruits
              - "{{job.parameters.fruits}}"

The job parameter is interpreted as one string:

# Output
Congrats! You've got 1 fruits!
peach banana cherry
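
The difference can be reproduced locally without Databricks. This is only a sketch, and it assumes the substituted job parameter reaches the script as a single argv element, which is what the output above suggests. The `build_parser` helper is just a wrapper around the parser from fruits.py.

```python
from argparse import ArgumentParser


def build_parser():
    # Same argument definition as in fruits.py.
    parser = ArgumentParser()
    parser.add_argument('--fruits', dest="fruits", nargs='+', type=str)
    return parser


parser = build_parser()

# Task parameters: every YAML list item becomes its own argv element.
task_args = parser.parse_args(['--fruits', 'peach', 'banana', 'cherry'])

# Job parameter (assumed): the whole default arrives as one argv element.
job_args = parser.parse_args(['--fruits', 'peach banana cherry'])

print(len(task_args.fruits), task_args.fruits)
print(len(job_args.fruits), job_args.fruits)
```

In the second call argparse never sees the spaces as separators, because splitting into arguments happens before the script starts.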

Other Options I've Considered

You could always split the string in Python, e.g. args.fruits[0].split(' ') (with nargs='+', args.fruits arrives as a one-element list), but it feels pretty hacky.

mroy-seedbox commented 1 week ago

Too bad if your fruit names have a space in them, like "honeydew melon".

stevenayers-bge commented 1 week ago

> Too bad if your fruit names have a space in them, like "honeydew melon".

chaos tbh

lennartkats-db commented 9 hours ago

Thanks for the report @stevenayers-bge. We'll take this request into consideration for the Workflows product!

One idea could be to support something like {{* job.parameters.fruits}} for array arguments. Note that this would be an extension of the Workflows parameters feature, not something at the CLI / DABs level. We want to look at that space a bit holistically before doing more extensions right now, especially since there is also the ${ } notation from DABs.

In the interim, you could use a space-separated, comma-separated, or JSON-based value for job.parameters.fruits. I realize that is not ideal, and we hope to have a better solution for you at a later point.
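
As a rough sketch of the JSON-based variant: assuming the job parameter default is set to a JSON array string such as '["peach", "banana", "honeydew melon"]' and is passed to the script as a single argument, the script could decode it with an argparse type function. The `parse_list_arg` helper is a hypothetical name, not part of any Databricks API.

```python
import json
from argparse import ArgumentParser


def parse_list_arg(raw):
    """Decode a JSON array of strings (hypothetical helper for this sketch)."""
    value = json.loads(raw)  # JSONDecodeError is a ValueError, which argparse reports
    if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
        raise ValueError("expected a JSON array of strings")
    return value


parser = ArgumentParser()
# No nargs here: the whole list travels in one argument and is decoded by the type function.
parser.add_argument('--fruits', dest="fruits", type=parse_list_arg)

# Simulates the single argv element the task would receive from the job parameter.
args = parser.parse_args(['--fruits', '["peach", "banana", "honeydew melon"]'])

print(f"Congrats! You've got {len(args.fruits)} fruits!")
for fruit in args.fruits:
    print(fruit)
```

Unlike splitting on spaces or commas, this keeps values such as "honeydew melon" intact, at the cost of a less readable default value in the job definition.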

I'll close this since this doesn't directly relate to the CLI, but let me know if you have further thoughts/questions. You can also leave feedback at https://docs.databricks.com/en/resources/ideas.html.