Open maxmckittrick opened 2 months ago
amychen1776 commented: @maxmckittrick Thank you for opening the issue. What are the use cases for which you use the batch IDs? (I assume it's to help you identify the queries?)
@amychen1776 yes, it'd be very helpful for us to see descriptive batch names when viewing the Dataproc console; we typically run a few dozen Python models per day in production, and there's no easy way to identify which batch is associated with which dbt model.
Is this your first time submitting a feature request?
Describe the feature
Currently, the default batch ID included for Python models submitted to Dataproc is simply `str(uuid.uuid4())`; this was last changed in #1020. This works and is sufficient to avoid `409 Already exists: Failed to create batch` errors from Dataproc when attempting to submit batches with duplicate names. However, after the test changes included in #1014, attempting to pass any non-literal `batch_id` in the model config will cause a parsing error. This makes passing any non-default `batch_id` more or less impossible: using a var to assign a dynamic batch ID at runtime will throw an error from `literal_eval`, and setting a static batch ID will allow a model to run on Dataproc only once before throwing a 409 error.
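For reference, here's a rough sketch of the kind of Python model config that hits this (the model body, the `my_upstream_model` name, and the batch ID value are just placeholders, not anything from the adapter's docs):

```python
# models/my_python_model.py -- illustrative only
def model(dbt, session):
    dbt.config(
        materialized="table",
        submission_method="serverless",
        # a static string literal parses fine, but the same batch name can only
        # be submitted to Dataproc once before a 409 "Already exists" error:
        batch_id="my-python-model-batch",
        # anything non-literal here (a var, an f-string, a function call)
        # fails at parse time, since Python model configs are read with literal_eval
    )
    return dbt.ref("my_upstream_model")
```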
Describe alternatives you've considered
One alternative would be to amend the `default_batch_id` config to prepend the model name with either a uuid or a non-static dbt env var, maybe `invocation_id` (unsure if this would only work on dbt Cloud)? This would avoid the previous errors when using `created_at`, as mentioned in #1006.
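As a rough sketch of that alternative (the function name and signature here are just illustrative, not the adapter's actual code), the amended default could look something like:

```python
import uuid


def default_batch_id(model_name: str) -> str:
    # Illustrative sketch only: prepend a normalized model name so batches are
    # identifiable in the Dataproc console, then append a uuid so every
    # submission is unique and avoids 409 "Already exists" errors.
    # Dataproc batch IDs are limited to lowercase letters, digits, and hyphens,
    # so normalize and truncate the model name before appending the uuid.
    safe_name = model_name.lower().replace("_", "-")
    return f"{safe_name[:26]}-{uuid.uuid4()}"
```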
Who will this benefit?
Everyone who wants to see descriptive batch names in Dataproc!
Are you interested in contributing this feature?
yes, I'm a regular dbt user but haven't contributed anything here before :)
Anything else?
I've confirmed this is broken in both dbt-core v1.8.5/dbt-bigquery v1.8.2 and dbt-core v1.7.16/dbt-bigquery v1.7.9