aws-samples / dbt-glue

This repository contains the dbt-glue adapter
Apache License 2.0

The tmp table is created before pre_hooks run #181

Closed zhangyuan closed 1 year ago

zhangyuan commented 1 year ago

Describe the bug

The tmp table in incremental strategy can't access the temp view created in pre_hooks.

Steps To Reproduce


1) Create a model with incremental strategy.

{{ config(
  materialized='incremental',
  partition_by=['ingestion_date'],
  incremental_strategy='insert_overwrite',
  pre_hook="CREATE OR REPLACE TEMPORARY VIEW mytable_csv \
    USING CSV OPTIONS (header true, ignoreLeadingWhiteSpace true, ignoreTrailingWhiteSpace true,  \
    path 's3://*******/mytable_csv') \
  "
)}}

SELECT a
     , b
     , c
FROM mytable_csv

2) Run dbt run to build the model for the first time. The table is created successfully.

3) Run dbt run again. The command fails with the following error:

AnalysisException: Insufficient Lake Formation permission(s) on mytable_csv (Service: AWSGlue; Status Code: 400; Error Code: AccessDeniedException; Request ID: xxxx; Proxy: null)

Screenshots and log output

N/A

System information

The output of dbt --version:

dbt --version
Core:
  - installed: 1.4.1
  - latest:    1.5.1 - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - glue:  1.4.21 - Up to date!
  - spark: 1.4.1  - Update available!

  At least one plugin is out of date or incompatible with dbt-core.
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

The operating system you're using:

MacOS

The output of python --version:

Python 3.9.6

Additional context

Reading from CSV may not be a common use case. What I want is for dbt-glue to read the CSV directly, rather than using another tool to load the CSV into the data lake.

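After checking the code, I found that in the current implementation, when the table already exists, dbt-glue creates the tmp table (L68) before running the pre_hooks (L74). The temp view defined in the pre_hook therefore does not exist yet when the tmp table is built from the model's SELECT.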

https://github.com/aws-samples/dbt-glue/blob/00f8290f6a1cd26de558fae437b11d267e383e8c/dbt/include/glue/macros/materializations/incremental/incremental.sql#L68-L74
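To make the ordering problem concrete, here is a toy simulation in plain Python. It is only a sketch and does not use dbt-glue or Spark: ToySession, run_pre_hook, create_tmp_table, and incremental_run are hypothetical names invented for illustration, and the session tracks nothing except which temp views have been registered.

```python
class ToySession:
    """Hypothetical stand-in for a Spark session; tracks temp views only."""

    def __init__(self):
        self.temp_views = set()

    def run_pre_hook(self, view_name):
        # Models the pre_hook's CREATE OR REPLACE TEMPORARY VIEW statement.
        self.temp_views.add(view_name)

    def create_tmp_table(self, source_view):
        # Models building the tmp table from the model's SELECT,
        # which reads from the temp view.
        if source_view not in self.temp_views:
            raise LookupError(f"{source_view} is not visible yet")
        return f"tmp table built from {source_view}"


def incremental_run(hooks_first):
    """Run the two steps in the chosen order and report what happens."""
    session = ToySession()
    try:
        if hooks_first:
            # Order used by dbt-spark v1.6.0b3: pre_hooks, then tmp table.
            session.run_pre_hook("mytable_csv")
            return session.create_tmp_table("mytable_csv")
        # Order described in this issue: tmp table, then pre_hooks.
        result = session.create_tmp_table("mytable_csv")
        session.run_pre_hook("mytable_csv")
        return result
    except LookupError as exc:
        return f"failed: {exc}"


print(incremental_run(hooks_first=False))  # failed: mytable_csv is not visible yet
print(incremental_run(hooks_first=True))   # tmp table built from mytable_csv
```

In the real run the failure shows up as the Lake Formation AccessDeniedException above rather than a LookupError. My guess is that this happens because the name is looked up in the Glue catalog once no temp view with that name exists, but I have not verified that.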

zhangyuan commented 1 year ago

In the latest dbt-spark (v1.6.0b3), the pre_hooks run before the tmp table is created:

https://github.com/dbt-labs/dbt-spark/blob/v1.6.0b3/dbt/include/spark/macros/materializations/incremental/incremental.sql#L33-L57