aws-samples / dbt-glue

This repository contains de dbt-glue adapter
Apache License 2.0
94 stars 60 forks source link

Error when creating snapshots #35

Closed OElesin closed 1 year ago

OElesin commented 2 years ago

Describe the bug

When creating snapshots, I get the error that the target table for the snapshots to be loaded into does not exist. This causes a failure in the dbt snapshot execution.

Steps To Reproduce

  1. Create a snapshots directory in the dbt project folder
  2. add a file demosnapshot.sql with the content as seen below:
    {% snapshot demosnapshot %}
    {{
        config(
            strategy='timestamp',
            target_schema='mytargetdatabase',
            target_database='mytargetdatabase',
            unique_key='unique_id',
            updated_at='created_date',
            file_format='hudi'
        )
    }}
    select * from {{ source('data_source', 'events_data') }}
    {% endsnapshot %}
  3. Add IAM Role with Lake Formation permissions used by Glue Interactive Sessions

    DataLakeDPCleanedTablesPermissions:
    Type: AWS::LakeFormation::Permissions
    Properties:
      DataLakePrincipal:
        DataLakePrincipalIdentifier: !GetAtt IamGlueRole.Arn
      Resource:
        TableResource:
          DatabaseName: !Sub mytargetdatabase
          TableWildcard: { }
      Permissions:
        - DESCRIBE
        - SELECT
        - INSERT
        - DROP
    
    DataLakeDPCleanedDatabasePermissions:
    Type: AWS::LakeFormation::Permissions
    Properties:
      DataLakePrincipal:
        DataLakePrincipalIdentifier: !GetAtt IamGlueRole.Arn
      Resource:
        DatabaseResource:
          Name: !Sub mytargetdatabase
      Permissions:
        - CREATE_TABLE
        - ALTER
        - DROP
        - DESCRIBE
  4. Run the command $ dbt snapshot in the terminal

Expected behavior

According to the dbt example on snapshots, no error is expected

Screenshots and log output

dbt.log output below

06:30:57.855066 [debug] [Thread-1  ]: Glue adapter: GlueConnection cursor called
06:30:58.151902 [error] [Thread-1  ]: Glue adapter: An error occurred (AccessDeniedException) when calling the GetTable operation: Insufficient Lake Formation permission(s) on demosnapshot
06:30:58.152336 [error] [Thread-1  ]: Glue adapter: relation mytargetdatabase.demosnapshot not found
06:30:58.152817 [error] [Thread-1  ]: Glue adapter: name 'list_schemas' is not defined
06:30:58.153291 [error] [Thread-1  ]: Glue adapter: check_schema_exists exception
06:30:58.156281 [debug] [Thread-1  ]: finished collecting timing info
06:30:58.156535 [debug] [Thread-1  ]: On snapshot.datalakedbtdemo.demosnapshot: Close
06:30:58.156717 [debug] [Thread-1  ]: Glue adapter: NotImplemented: close

System information

The output of dbt --version:

Core:
  - installed: 1.1.0
  - latest:    1.1.0 - Up to date!

Plugins:
  - bigquery:  1.1.0 - Up to date!
  - snowflake: 1.1.0 - Up to date!
  - redshift:  1.1.0 - Up to date!
  - postgres:  1.1.0 - Up to date!

The operating system you're using: MacOS

The output of python --version: python3

Additional context

Definitely, the snapshot table would not exist and permissions have been given to the Glue Interactive Sessions IAM role to create tables.

menuetb commented 2 years ago

Hi, Thanks for let us know. We are working on implementing the snapshot feature, but it's not yet available, I'll let you know once it will be available.

OElesin commented 2 years ago

@menuetb , any updates on this?

menuetb commented 1 year ago

Hi, I just developed a beta which works for snapshots using Delta.

You can install this beta using the following command: python3 -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ dbt-glue

You can find the documentation for the configuration here: https://github.com/aws-samples/dbt-glue/blob/delta_lake_integration/README.md#delta

You need to install delta glue connectors to make it works, here is the doc for that: https://docs.aws.amazon.com/glue/latest/ug/connectors-chapter.html

Here is an example that works on my account:

{% snapshot demosnapshot %}

{{
    config(
        strategy='timestamp',
        target_schema='dbt_demo_01',
        unique_key='customer_id',
        updated_at='dt',
        file_format='delta'
    )
}}

select * from {{ ref('customer_delta') }}

{% endsnapshot %}
menuetb commented 1 year ago

I just published a new version of the adapter supporting snapshots for both delta and hudi, this feature is now available