dbt-labs / dbt-external-tables

dbt macros to stage external sources
https://hub.getdbt.com/dbt-labs/dbt_external_tables/latest/
Apache License 2.0

Struggling to get refreshing table partitions to work #208

Closed TPreece101 closed 1 year ago

TPreece101 commented 1 year ago

Describe the bug

I have managed to set up some unpartitioned sources just fine; however, when I try to set up a partitioned source I get the following error:

14:30:21  Encountered an error while running operation: Compilation Error
  'dict object' has no attribute 'vals'

  > in macro refresh_external_table (macros/common/refresh_external_table.sql)
  > called by macro redshift__get_external_build_plan (macros/plugins/redshift/get_external_build_plan.sql)
  > called by macro get_external_build_plan (macros/common/get_external_build_plan.sql)
  > called by macro stage_external_sources (macros/common/stage_external_sources.sql)
  > called by <Unknown>

It looks like it's failing here: https://github.com/dbt-labs/dbt-external-tables/blob/main/macros/plugins/redshift/refresh_external_table.sql#L17

What kind of object is it meant to be if not a dictionary? It looks like it just parses the partitions key in my __sources.yml.

I'm also interested because I'm trying to write some custom materialisations for external tables (which I'm thinking of contributing once they're a bit more battle-tested): how does the package find the partition paths and values? I haven't found a way to get these from Redshift itself.

Steps to reproduce

Sample __sources.yml

version: 2

sources:
  - name: raw
    schema: spectrum
    tables:
      - name: external_dbt_test_partitioned
        description: >
          This is a test table

        external:
          location: s3://my_lovely_bucket/path_to_partitioned_parquet/
          file_format: parquet
          partitions:
            - name: partition_test_col
              data_type: VARCHAR(256)

        columns:
          - name: test_col_1
            data_type: VARCHAR(256)
          - name: test_col2
            data_type: VARCHAR(256)

I then run dbt run-operation stage_external_sources to get my error message.
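For anyone hitting the same error: the Redshift refresh macro iterates over each partition's vals, so a partitioned Spectrum source appears to need a vals key (and a path_macro) telling the package which partition values to register. A minimal sketch of the partitions block, based on the package README (the values here are placeholders; vals can also be generated by a macro such as dbt.dates_in_range):

```yaml
partitions:
  - name: partition_test_col
    data_type: VARCHAR(256)
    # explicit list of partition values to register (placeholder values)
    vals:
      - value_a
      - value_b
    # maps each value to its S3 path segment,
    # e.g. key_value -> partition_test_col=value_a/
    path_macro: dbt_external_tables.key_value
```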

System information

The contents of your packages.yml file:

packages:
  - package: dbt-labs/dbt_external_tables
    version: 0.8.3
  - package: dbt-labs/dbt_utils
    version: 1.0.0
  - package: dbt-labs/redshift
    version: 0.8.0
  - package: dbt-labs/codegen
    version: 0.9.0

Which database are you using dbt with? Redshift (via Redshift Spectrum, per the error trace).

The output of dbt --version:

Core:
  - installed: 1.5.0
  - latest:    1.5.0 - Up to date!

Plugins:
  - postgres: 1.5.0 - Up to date!

The operating system you're using: Debian GNU/Linux 10 (buster)

The output of python --version: Python 3.8.12

Let me know if you need any more information, I'm looking forward to getting to grips with this library better 😊

TPreece101 commented 1 year ago

I have realised my mistake here: I thought the library was picking up the partitions from S3, but it looks like they're generated based on date ranges.

I've also realised that, for my use case, since I'm unloading a query to S3 and then creating the table, I can get the partition values from the query itself, so I'll give that a try.
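For the unload-then-stage approach described above, Redshift's UNLOAD can write partitioned Parquet directly, so the partition values are known from the query. A hedged sketch (the bucket, IAM role, table, and column names are placeholders, not from this issue):

```sql
-- Write partitioned Parquet to S3; Redshift creates
-- partition_test_col=<value>/ prefixes under the target path
UNLOAD ('SELECT test_col_1, test_col2, partition_test_col FROM my_schema.my_table')
TO 's3://my_lovely_bucket/path_to_partitioned_parquet/'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-role'
FORMAT AS PARQUET
PARTITION BY (partition_test_col);
```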