dbt-labs / dbt-bigquery

dbt-bigquery contains all of the code required to make dbt operate on a BigQuery database.
https://github.com/dbt-labs/dbt-bigquery
Apache License 2.0

[Regression] support non-literal `batch_id` config for python models on dataproc #1321

Open maxmckittrick opened 3 months ago

maxmckittrick commented 3 months ago

Is this your first time submitting a feature request?

Describe the feature

currently, the default batch ID that's included for python models submitted to dataproc is simply `str(uuid.uuid4())`; this was last changed in #1020.

this works, and is sufficient to avoid `409 Already exists: Failed to create batch` errors from dataproc when submitting batches with duplicate names. however, after the test changes included in #1014, passing any non-literal `batch_id` in the model config causes a parsing error, e.g.:

18:19:35  Running with dbt=1.8.5
18:19:36  Registered adapter: bigquery=1.8.2
18:19:36  Unable to do partial parsing because of a version mismatch
18:19:39  Encountered an error:
Parsing Error
  Error when trying to literal_eval an arg to dbt.ref(), dbt.source(), dbt.config() or dbt.config.get()
  malformed node or string on line 49: <ast.Name object at 0x169b599f0>
  https://docs.python.org/3/library/ast.html#ast.literal_eval
  In dbt python model, `dbt.ref`, `dbt.source`, `dbt.config`, `dbt.config.get` function args only support Python literal structures

this makes passing any non-default `batch_id` more or less impossible: using a var to assign a dynamic batch ID at runtime throws an error from `literal_eval`, and setting a static batch ID lets a model run on dataproc only once before it hits a 409 error.
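for illustration, here's a minimal sketch of why a non-literal argument trips `ast.literal_eval`. it only uses the standard library; the `is_python_literal` helper is hypothetical, and this is an approximation of the check implied by the error message above, not dbt's actual parser code.

```python
import ast


def is_python_literal(source: str) -> bool:
    """Hypothetical helper: True if `source` is a literal that ast.literal_eval accepts."""
    try:
        ast.literal_eval(source)
        return True
    except ValueError:
        # raised for names, function calls, f-strings, and other non-literal nodes
        return False


# a quoted string is a literal, so a static batch ID passes the check
print(is_python_literal("'my-static-batch-id'"))

# a bare variable name parses to an ast.Name node, which is rejected
print(is_python_literal("batch_id"))

# a call such as str(uuid.uuid4()) parses to an ast.Call node, which is also rejected
print(is_python_literal("str(uuid.uuid4())"))
```

anything computed at runtime (a variable, a function call, an f-string) fails this kind of check, so only a hard-coded string can be passed as a config arg.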

Describe alternatives you've considered

one alternative would be to amend the default_batch_id config to combine the model name with either a uuid or a non-static dbt env var, maybe invocation_id (unsure if this would only work on dbt cloud)? this would avoid the errors previously seen when using created_at, as mentioned in #1006
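a rough sketch of what such a default could look like, assuming the model name is available as a plain string. the `descriptive_batch_id` helper and the `MAX_BATCH_ID_LENGTH` constant are hypothetical names, and the character and length limits are my understanding of dataproc's batch ID rules, so treat them as assumptions to verify against the dataproc docs.

```python
import re
import uuid

# assumption: dataproc batch IDs are limited to 63 characters
MAX_BATCH_ID_LENGTH = 63


def descriptive_batch_id(model_name: str) -> str:
    """Hypothetical helper: build '<sanitized-model-name>-<uuid4>'."""
    # the uuid suffix (36 characters) keeps every submission unique
    suffix = str(uuid.uuid4())
    # assumption: only lowercase letters, digits, and hyphens are allowed
    prefix = re.sub(r"[^a-z0-9]+", "-", model_name.lower()).strip("-")
    # leave room for the joining hyphen and the uuid suffix
    prefix = prefix[: MAX_BATCH_ID_LENGTH - len(suffix) - 1].strip("-")
    # fall back to the bare uuid if nothing usable is left of the name
    return f"{prefix}-{suffix}" if prefix else suffix


print(descriptive_batch_id("My_Python_Model"))
```

the uuid suffix keeps the ID unique across runs (avoiding the 409 error), while the model-name prefix makes the batch recognizable in the dataproc console.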

Who will this benefit?

everyone who wants to see descriptive batch names in dataproc!

Are you interested in contributing this feature?

yes, I'm a regular dbt user but haven't contributed anything here before :)

Anything else?

I've confirmed this is broken in both dbt-core v1.8.5/dbt-bigquery v1.8.2 and dbt-core v1.7.16/dbt-bigquery v1.7.9

amychen1776 commented 3 months ago

@maxmckittrick Thank you for opening up the issue. What are the use cases for which you use the batch ids? (I assume it's to help you identify the queries?)

maxmckittrick commented 1 month ago

@amychen1776 yes, it'd be very helpful for us to see descriptive batch names when viewing the dataproc console; we typically run a few dozen python models per day in production, and there's no way to easily identify which batch is associated with which dbt model: [image]