dbt-labs / dbt-external-tables

dbt macros to stage external sources
https://hub.getdbt.com/dbt-labs/dbt_external_tables/latest/
Apache License 2.0
305 stars 122 forks source link

Update from 0.5.0 -> 0.6.0 strips leading 0s in Redshift Spectrum Partitions #69

Closed dm03514 closed 3 years ago

dm03514 commented 3 years ago

Describe the bug

We are using redshift spectrum external tables and we have partitions of the form:

/YYYYmmdd/HH

In v0.5.0 we generate the hourly partitions using:

vals:
 - 00
 - 01
 - ...
 - 23

We then execute:

$ dbt run-operation stage_external_sources

In v0.5.0 this will create 24 partitions per day which include leading 0s, i.e. 00, 01

After upgrading to v0.6.0 the same command generates partitions and strips the leading 0s. This means we are generating partitions that don't map to actual directories, i.e. our partition is

/20200101/01/

But dbt is generating the partition without the leading 0.

/20200101/1/

Steps to reproduce

Expected results

Leading 0s are preserved in 0.6.0, or there is a config option to allow them to be preserved.

Actual results

Leading 0s are stripped from partitions when using 0.6.0.

Screenshots and log output

This shows the leading 0's being stripped as soon as we upgraded to version 0.6.0:

Screen Shot 2021-02-02 at 7 59 02 AM

System information

The contents of your packages.yml file:

packages:
  - package: fishtown-analytics/dbt_utils
    version: 0.6.2 

  - package: fishtown-analytics/dbt_external_tables
    version: 0.5.0

  - package: fivetran/salesforce
    version: 0.3.0

  - package: fivetran/netsuite
    version: 0.2.0

Which database are you using dbt with?

The output of dbt --version:

installed version: 0.18.1
   latest version: 0.19.0

Your version of dbt is out of date! You can find instructions for upgrading here:
https://docs.getdbt.com/docs/installation

Plugins:
  - bigquery: 0.18.1
  - snowflake: 0.18.1
  - postgres: 0.18.1
  - redshift: 0.18.1

The operating system you're using:

# cat /etc/os-release
PRETTY_NAME="Debian GNU/Linux 10 (buster)"
NAME="Debian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

The output of python --version:

# python --version
Python 3.8.6

Thank you!

jtcohen6 commented 3 years ago

Thanks for the detailed bug report @dm03514! I'll have a look at what's gone wrong here

jtcohen6 commented 3 years ago

Ok, it appears the issue here is two things. First, if you define your partitions like so:

vals:
 - 00
 - 01
 - ...
 - 23

There is a risk of YAML interpreting those values as integers instead of strings. It's safer to do:

vals:
 - '00'
 - '01'
 - '...'
 - '23'

Even then, there is a bug with the new way this package handles Redshift partitions that coerces strings like '00' to numeric 0, when returned from the database and loaded by agate. I'll open a PR with a proposed fix.