dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

Raise a more helpful error if BigQuery job label is too long #3612

Closed · edbizarro closed this issue 3 years ago

edbizarro commented 3 years ago

Describe the bug

After migrating to v0.20.0, my jobs keep failing with the following error:

Database Error in model stg_facebookads__ads (models/staging/facebook_ads/stg_facebookads__ads.sql)
  Label value "__database____REDACTED____schema____dev_edbizarro____identifier____stg_facebookads__ads__" has invalid characters.
  compiled SQL at target/run/insights_lab/models/staging/facebook_ads/stg_facebookads__ads.sql

Steps To Reproduce

Configure job labels in dbt_project.yml:

query-comment:
  comment: "{{ query_comment(node) }}"
  append: false
  job-label: true

Run any model

System information

Which database are you using dbt with?

BigQuery

The output of dbt --version:

❯ dkc exec dbt dbt --version
installed version: 0.20.0
   latest version: 0.20.0

Up to date!

Plugins:
  - redshift: 0.20.0
  - snowflake: 0.20.0
  - bigquery: 0.20.0
  - postgres: 0.20.0

The operating system you're using: Arch Linux

The output of python --version:

Running through the official Docker image

jtcohen6 commented 3 years ago

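@edbizarro I see you switched on query-comment.job-label, which is new in v0.20.0. The way that feature works:

  1. If your query comment is a dictionary, dbt adds each key-value pair as a separate job label.
  2. Otherwise, dbt sanitizes the comment string and uses it as the value of a single job label.

The latter is what's happening here. Unfortunately, the error BigQuery returns is a bit misleading: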

Label value "__database____REDACTED____schema____dev_edbizarro____identifier____stg_facebookads__ads__" has invalid characters.

The real issue here is that label values are limited to 63 characters (see the BigQuery documentation), and this string is 89 characters long. If I shorten the string to 63 characters, everything works just fine.
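To confirm that length, not the character set, is the problem, here is a minimal Python sketch. The `sanitize_label` helper and its regex are illustrative assumptions that approximate BigQuery's rules for label values (lowercase letters, digits, underscores, dashes; simplified to ASCII), not dbt's actual code.

```python
import re

# Assumption: an ASCII-only approximation of BigQuery's label-value rules
# (lowercase letters, digits, underscores, dashes). Not dbt's actual code.
_INVALID_LABEL_CHARS = re.compile(r"[^a-z0-9_-]")
MAX_LABEL_LENGTH = 63  # BigQuery's limit for label values


def sanitize_label(value: str) -> str:
    """Lowercase the value and replace each disallowed character with '_'."""
    return _INVALID_LABEL_CHARS.sub("_", value.strip().lower())


# Label value copied from the error message ("REDACTED" is the reporter's placeholder).
label = "__database____REDACTED____schema____dev_edbizarro____identifier____stg_facebookads__ads__"

# Sanitizing only lowercases it: there are no characters left to replace.
print(sanitize_label(label) == label.lower())  # True
print(len(label))                              # 89
print(len(label) > MAX_LABEL_LENGTH)           # True
```

Every character in the value is already legal once lowercased, so the "invalid characters" wording points away from the actual cause: 89 > 63.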

In the original PR for this feature, we discussed potential approaches for handling too-long labels: https://github.com/dbt-labs/dbt/pull/3145#discussion_r598050639. The options are:

  1. Truncate, hash, or otherwise handle the label length within dbt. This would happen silently, and could result in indistinguishable label values.
  2. Raise an error within dbt.
  3. Do nothing, and return any errors from BigQuery.
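The drawback of option 1 can be made concrete with a hypothetical sketch (not dbt code; the helper names, the `analytics` database name, and the 8-character SHA-256 suffix are illustrative assumptions). Plain truncation makes two different models' labels collide, while a hash suffix keeps them distinct at the cost of readability.

```python
import hashlib

MAX_LABEL_LENGTH = 63


def truncate(value: str) -> str:
    """Option 1, plain truncation: silently cut the value at the limit."""
    return value[:MAX_LABEL_LENGTH]


def truncate_with_hash(value: str, digest_len: int = 8) -> str:
    """Option 1, hashed: keep a prefix and append a short digest of the full value."""
    if len(value) <= MAX_LABEL_LENGTH:
        return value
    digest = hashlib.sha256(value.encode("utf-8")).hexdigest()[:digest_len]
    prefix = value[: MAX_LABEL_LENGTH - digest_len - 1]  # leave room for "-" + digest
    return f"{prefix}-{digest}"


# Two hypothetical models whose labels differ only after character 63.
common = "__database____analytics____schema____dev_edbizarro____identifier____"
ads = common + "stg_facebookads__ads__"
campaigns = common + "stg_facebookads__campaigns__"

print(truncate(ads) == truncate(campaigns))  # True: indistinguishable
print(truncate_with_hash(ads) == truncate_with_hash(campaigns))  # False
print(len(truncate_with_hash(ads)))  # 63
```

Hashing avoids the collision but makes the label harder to read, and in both variants the change happens silently, which is the concern raised for option 1.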

We picked the third option. Given how unclear BigQuery's error message is, and the confusion this issue shows, I think there's good reason to prefer the second: we should raise a compilation error any time query-comment.job-label is switched on and a label value would be longer than 63 characters.

That should be a straightforward change. Is it something you'd be interested in contributing @edbizarro?

In the meantime, you'll need to work around this error by one of the following:

  1. Refactoring your query_comment macro to return a dictionary
  2. Refactoring your query_comment macro to return a shorter string (post-sanitization)
  3. Switching off query-comment.job-label
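The first workaround relies on the dictionary path. The sketch below is a hypothetical Python approximation of how a rendered comment could map to labels, assuming the comment is parsed as JSON and otherwise falls back to one sanitized string label. The function names and the `query_comment` label key are assumptions, not dbt's API.

```python
import json
import re

# Assumption: ASCII-only approximation of BigQuery's label-value rules.
_INVALID_LABEL_CHARS = re.compile(r"[^a-z0-9_-]")


def sanitize_label(value: str) -> str:
    """Lowercase the value and replace each disallowed character with '_'."""
    return _INVALID_LABEL_CHARS.sub("_", value.strip().lower())


def labels_from_query_comment(comment: str) -> dict:
    """Hypothetical mapping from a rendered query comment to job labels."""
    try:
        parsed = json.loads(comment)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        parsed = None
    if isinstance(parsed, dict):
        # A JSON object becomes one short label per key.
        return {sanitize_label(k): sanitize_label(str(v)) for k, v in parsed.items()}
    # Anything else becomes a single, possibly very long, label value.
    return {"query_comment": sanitize_label(comment)}


as_json = '{"database": "analytics", "schema": "dev_edbizarro", "identifier": "stg_facebookads__ads"}'
print(labels_from_query_comment(as_json))
# {'database': 'analytics', 'schema': 'dev_edbizarro', 'identifier': 'stg_facebookads__ads'}

# A Python-style dict repr (single quotes) is not valid JSON, so it is one string.
as_repr = "{'database': 'analytics', 'schema': 'dev_edbizarro'}"
print(labels_from_query_comment(as_repr))
# {'query_comment': '__database____analytics____schema____dev_edbizarro__'}
```

In the dictionary case each value stays short, which is why returning a dictionary (or a shorter string) sidesteps the 63-character limit.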

sungchun12 commented 3 years ago

@jtcohen6 I'm happy to take this issue on if @edbizarro doesn't want to!

jtcohen6 commented 3 years ago

@sungchun12 I'd love that!

sungchun12 commented 3 years ago

@jtcohen6 This is officially in my personal backlog, and I'll spend the next week or so focusing on it!

I'll be diving into #3145. I'm assuming the solution to this problem will involve extending the validations and raising specific error messages in functions like this one, and/or creating another function dedicated to verifying string length after sanitization.

edbizarro commented 3 years ago

@sungchun12 Sure! I'd really like to take on this challenge, but unfortunately I'm a little short on time over the next few weeks, so I'd love for someone else to take it on. Thanks!

jtcohen6 commented 3 years ago

@sungchun12 Yes! I think _sanitize_label raising a ValidationException if passed a string longer than 63 characters will get the job done.