databricks / databricks-sdk-py

Databricks SDK for Python (Beta)
https://databricks-sdk-py.readthedocs.io/
Apache License 2.0

[FEATURE] Pipeline method (pipelines.start_update) should include kwargs for body for adding runtime arguments #359

Open · narquette opened this issue 1 year ago

narquette commented 1 year ago

Problem Statement As a data scientist/engineer, I would like to be able to pass runtime arguments through to the pipeline code so that I don't have to create separate pipelines for different purposes.

Proposed Solution Update the start_update method to accept kwargs that are added to the body of the POST request.
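For illustration, the call might look something like this (note that the `configuration` kwarg is hypothetical and does not exist in the current SDK):

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Hypothetical: `configuration` is the proposed kwarg; today start_update
# only accepts pipeline_id plus refresh-related options.
w.pipelines.start_update(
    pipeline_id="<pipeline-id>",
    configuration={"source_path": "s3://dev-bucket/landing"},
)
```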

Additional Context If I want to accomplish this today, I have to update the pipeline with all of its required information, including the configuration ({key: value}), and then run the start_update method.
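A sketch of that workaround with today's SDK, assuming the pipeline spec fields carried over below match how the pipeline was created:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
pipeline_id = "<pipeline-id>"

# The edit endpoint replaces the whole pipeline spec, so fetch the
# current settings first and carry them over.
spec = w.pipelines.get(pipeline_id).spec

# Merge the runtime value into the existing configuration.
configuration = dict(spec.configuration or {})
configuration["source_path"] = "s3://qa-bucket/landing"

w.pipelines.update(
    pipeline_id=pipeline_id,
    name=spec.name,
    clusters=spec.clusters,
    libraries=spec.libraries,
    target=spec.target,
    configuration=configuration,
)

# Only now can the update run with the new configuration in place.
w.pipelines.start_update(pipeline_id=pipeline_id)
```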

mgyucht commented 1 year ago

Thanks for this feature request @narquette. I believe this may be a gap in the start_update API. I'll raise this feature request with the Delta Live Tables team for review.

zoe-h-durand commented 1 year ago

Hi @narquette, thanks for your feedback. I am part of the Delta Live Tables team. May I ask for a little more context on your request: what are you trying to accomplish? What kind of runtime arguments would you like to pass in, and what would happen with these arguments at runtime? Thanks!

narquette commented 1 year ago

@zoe-h-durand I would like to be able to override Delta Live Tables arguments just like you can with jobs. For example, I would like to be able to run one pipeline against any number of S3 buckets (a prod bucket, a dev bucket, a QA bucket) by calling the pipeline with a supplied argument for the source S3 path.
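For context on the consuming side: pipeline configuration values are exposed to the pipeline code through the Spark conf, so reading the supplied path might look like this (the `source_path` key and the default are illustrative):

```python
import dlt

# In a DLT notebook, `spark` is predefined. Pipeline configuration
# ({key: value} pairs in the pipeline settings) is surfaced through the
# Spark conf; "source_path" is an illustrative key with a fallback default.
source_path = spark.conf.get("source_path", "s3://dev-bucket/landing")

@dlt.table(comment="Raw events loaded from the configured source path")
def raw_events():
    return spark.read.format("json").load(source_path)
```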

zoe-h-durand commented 1 year ago

I see. The storage location is a pipeline configuration setting: when you say "Delta Live Tables argument", is it fair to say you mean "pipeline configuration"? Just want to make sure, thanks.

narquette commented 1 year ago

(screenshot attached)

I'm hoping to be able to override the pipeline configuration much like you can for a pipeline task parameter.
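For comparison, the job-level override I'm referring to looks roughly like this with the existing Jobs API (the job ID and parameter name are illustrative):

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Jobs already support per-run parameter overrides; the equivalent
# for pipeline configuration is what this issue asks for.
w.jobs.run_now(
    job_id=123,
    notebook_params={"source_path": "s3://qa-bucket/landing"},
)
```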

zoe-h-durand commented 1 year ago

OK, that's clear, thanks. Just for my information, what kind of logic do you have in place that makes this necessary? Do you write to different paths (but with the same code) based on some upstream process?