cloudera / dbt-impala

A dbt adapter for Apache Impala & Cloudera Data Platform
Apache License 2.0
24 stars 15 forks source link

incremental strategy not inserting #177

Open pmheld opened 1 year ago

pmheld commented 1 year ago

Hi, I have dbt-impala==1.4.0 and dbt-core==1.4.1 installed and experience a problem with a model that should use materialized='incremental'. The issue is that this single model in the project is initially running fine since no table existed in the Cloudera CDP (Impala) but the second time that I'm running dbt it tries to create the table that already exists rather than doing an insert respectively an append to the table.

This is what dbt show while running:

10:27:23  Running with dbt=1.4.1
10:27:23  Found 1 model, 0 tests, 0 snapshots, 0 analyses, 317 macros, 0 operations, 0 seed files, 4 sources, 0 exposures, 0 metrics
10:27:23  
10:27:32  Concurrency: 3 threads (target='impala')
10:27:32  
10:27:32  1 of 1 START sql incremental model econda.a_econda_basis_v2 .................... [RUN]
10:27:33  1 of 1 ERROR creating sql incremental model econda.a_econda_basis_v2 ........... [ERROR in 0.36s]
10:27:33  
10:27:33  Finished running 1 incremental model in 0 hours 0 minutes and 9.36 seconds (9.36s).
10:27:33  
10:27:33  Completed with 1 error and 0 warnings:
10:27:33  
10:27:33  Runtime Error in model a_econda_basis_v2 (models/a_econda_basis_v2.sql)
10:27:33    Unable to establish connection to Impala server: AnalysisException: Table already exists: econda.a_econda_basis_v2
10:27:33    
10:27:33  
10:27:33  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 TOTAL=1

The project.yml looks like this:

name: 'econda'
version: '1.0.0'
config-version: 2
profile: 'monitoring'

model-paths: ["models"]
test-paths: ["tests"]
macro-paths: ["macros"]

target-path: "target"  
clean-targets:         
  - "target"
  - "dbt_packages"

models:
  econda:
    +materialized: table
    +schema: econda

The model looks like this:

{{
    config(
        materialized='incremental',
        incremental_strategy='append',
        partition_by=['sourcemonth']
    )
}}
    SELECT 
        cast(request_id AS VARCHAR(12)) AS request_id,
        `timestamp` AS time_stamp,
        to_date(`timestamp`) AS date_request,
        viewtime_s,
        url,
        oc.icamp,
        CASE
                WHEN oc.icamptype = '2' THEN 'clicked'
                WHEN oc.icamptype = '1' THEN 'seen'
                WHEN oc.icamptype = '0' THEN 'last_seen'
        ELSE NULL
        END AS icamptype,
        RANK () OVER (PARTITION BY v3.visit_id ORDER BY `timestamp` ASC) Request_Rank,
        v3.sourcedate,
        v3.sourcemonth
    FROM {{source('econda', 'co_econda_rohdaten_v3_ohne_events')}} v3
    LEFT JOIN {{source('econda', 'co_econda_rohdaten_lexwared_3')}} e3 ON v3.request_id = e3.requestid
    LEFT JOIN {{source('econda', 'co_econda_rohdaten_onsite_campaign')}} oc ON v3.request_id = oc.requestid

{% if is_incremental() %}

    where v3.`timestamp` >= date_add(current_date(),-1)
    and e3.sourcedate >= 20230629 
    and oc.sourcedate >= 20230629 

{% endif %}

This is converted into:

  create  table
    econda.a_econda_basis_v2
      partitioned by (sourcemonth)
  as
    SELECT 
        cast(request_id AS VARCHAR(12)) AS request_id,
        `timestamp` AS time_stamp,
        to_date(`timestamp`) AS date_request,
        viewtime_s,
        oc.icamp,
        CASE
                WHEN oc.icamptype = '2' THEN 'clicked'
                WHEN oc.icamptype = '1' THEN 'seen'
                WHEN oc.icamptype = '0' THEN 'last_seen'
        ELSE NULL
        END AS icamptype,
        RANK () OVER (PARTITION BY v3.visit_id ORDER BY `timestamp` ASC) Request_Rank,
        v3.sourcedate,
        v3.sourcemonth
    FROM econda.co_econda_rohdaten_v3_ohne_events v3
    LEFT JOIN econda.co_econda_rohdaten_lexwared_3 e3 ON v3.request_id = e3.requestid
    LEFT JOIN econda.co_econda_rohdaten_onsite_campaign oc ON v3.request_id = oc.requestid

From what I've seen in the 860 lines of the log file this part migth be interesting:

[0m10:27:32.699522 [debug] [MainThread]: Impala adapter: NotImplemented: add_begin_query
10:27:32.700014 [debug] [MainThread]: Impala adapter: NotImplemented: commit
10:27:32.700679 [info ] [MainThread]: Concurrency: 3 threads (target='impala')
10:27:32.700964 [info ] [MainThread]: 
10:27:32.722855 [debug] [Thread-145]: Began running node model.econda.a_econda_basis_v2
10:27:32.723378 [info ] [Thread-145]: 1 of 1 START sql incremental model econda.a_econda_basis_v2 .................... [RUN]
10:27:32.724054 [debug] [Thread-145]: Acquiring new impala connection 'model.econda.a_econda_basis_v2'
10:27:32.724702 [debug] [Thread-145]: Began compiling node model.econda.a_econda_basis_v2
10:27:32.735790 [debug] [Thread-145]: Writing injected SQL for node "model.econda.a_econda_basis_v2"
10:27:32.736417 [debug] [Thread-145]: Timing info for model.econda.a_econda_basis_v2 (compile): 2023-06-30 10:27:32.724942 => 2023-06-30 10:27:32.736311
10:27:32.736985 [debug] [Thread-145]: Began executing node model.econda.a_econda_basis_v2
10:27:32.767536 [debug] [Thread-145]: Tracker adapter: Usage tracking flag True. To turn on/off use usage_tracking flag in profiles.yml
10:27:32.768222 [debug] [Thread-151]: Tracker adapter: Sending Event {'data': '{"event_type": "model_access", "model_name": "econda.a_econda_basis_v2", "model_type": "table", "incremental_strategy": "", "auth": "N/A", "connection_state": "N/A", "elapsed_time": "N/A", "permissions": "N/A", "profile_name": "N/A", "sql_type": "N/A", "id": "6f580870-1459-4081-a126-fd1ed5cbeb18", "unique_host_hash": "6684ca3f30d055ca36911ca432eb2347", "unique_user_hash": "68c4283db8074b12df1660b31c0220a9", "unique_session_hash": "60313cc0204ea2f349b15a2fa6f4abd0", "python_version": "3.8.16", "system": "Linux", "machine": "x86_64", "platform": "Linux-5.4.0-107-generic-x86_64-with-glibc2.2.5", "dbt_version": "1.4.1", "dbt_adapter_type": "impala", "dbt_adapter_version": "1.4.0", "dbt_deployment_env": {}, "project_name": "monitoring", "target_name": "impala", "no_of_threads": 3, "warehouse_version": {"version": "impalad version 3.4.0-SNAPSHOT", "build": "impalad version 3.4.0-SNAPSHOT RELEASE (build 75494b221421452d0b928d844e04afa70654d528)\\nBuilt on Wed Feb  8 15:18:37 UTC 2023"}}'}
10:27:32.772748 [debug] [Thread-145]: Tracker adapter: Usage tracking flag True. To turn on/off use usage_tracking flag in profiles.yml
10:27:32.773557 [debug] [Thread-152]: Tracker adapter: Sending Event {'data': '{"event_type": "model_access", "model_name": "econda.a_econda_basis_v2__dbt_tmp", "model_type": "table", "incremental_strategy": "", "auth": "N/A", "connection_state": "N/A", "elapsed_time": "N/A", "permissions": "N/A", "profile_name": "N/A", "sql_type": "N/A", "id": "6f580870-1459-4081-a126-fd1ed5cbeb18", "unique_host_hash": "6684ca3f30d055ca36911ca432eb2347", "unique_user_hash": "68c4283db8074b12df1660b31c0220a9", "unique_session_hash": "60313cc0204ea2f349b15a2fa6f4abd0", "python_version": "3.8.16", "system": "Linux", "machine": "x86_64", "platform": "Linux-5.4.0-107-generic-x86_64-with-glibc2.2.5", "dbt_version": "1.4.1", "dbt_adapter_type": "impala", "dbt_adapter_version": "1.4.0", "dbt_deployment_env": {}, "project_name": "monitoring", "target_name": "impala", "no_of_threads": 3, "warehouse_version": {"version": "impalad version 3.4.0-SNAPSHOT", "build": "impalad version 3.4.0-SNAPSHOT RELEASE (build 75494b221421452d0b928d844e04afa70654d528)\\nBuilt on Wed Feb  8 15:18:37 UTC 2023"}}'}
10:27:32.777193 [debug] [Thread-145]: Tracker adapter: Usage tracking flag True. To turn on/off use usage_tracking flag in profiles.yml
10:27:32.780691 [debug] [Thread-153]: Tracker adapter: Sending Event {'data': '{"event_type": "model_access", "model_name": "econda.a_econda_basis_v2__dbt_tmp", "model_type": "table", "incremental_strategy": "", "auth": "N/A", "connection_state": "N/A", "elapsed_time": "N/A", "permissions": "N/A", "profile_name": "N/A", "sql_type": "N/A", "id": "6f580870-1459-4081-a126-fd1ed5cbeb18", "unique_host_hash": "6684ca3f30d055ca36911ca432eb2347", "unique_user_hash": "68c4283db8074b12df1660b31c0220a9", "unique_session_hash": "60313cc0204ea2f349b15a2fa6f4abd0", "python_version": "3.8.16", "system": "Linux", "machine": "x86_64", "platform": "Linux-5.4.0-107-generic-x86_64-with-glibc2.2.5", "dbt_version": "1.4.1", "dbt_adapter_type": "impala", "dbt_adapter_version": "1.4.0", "dbt_deployment_env": {}, "project_name": "monitoring", "target_name": "impala", "no_of_threads": 3, "warehouse_version": {"version": "impalad version 3.4.0-SNAPSHOT", "build": "impalad version 3.4.0-SNAPSHOT RELEASE (build 75494b221421452d0b928d844e04afa70654d528)\\nBuilt on Wed Feb  8 15:18:37 UTC 2023"}}'}
10:27:32.785485 [debug] [Thread-145]: Tracker adapter: Usage tracking flag True. To turn on/off use usage_tracking flag in profiles.yml
10:27:32.787221 [debug] [Thread-154]: Tracker adapter: Sending Event {'data': '{"event_type": "model_access", "model_name": "econda.a_econda_basis_v2__dbt_backup", "model_type": "table", "incremental_strategy": "", "auth": "N/A", "connection_state": "N/A", "elapsed_time": "N/A", "permissions": "N/A", "profile_name": "N/A", "sql_type": "N/A", "id": "6f580870-1459-4081-a126-fd1ed5cbeb18", "unique_host_hash": "6684ca3f30d055ca36911ca432eb2347", "unique_user_hash": "68c4283db8074b12df1660b31c0220a9", "unique_session_hash": "60313cc0204ea2f349b15a2fa6f4abd0", "python_version": "3.8.16", "system": "Linux", "machine": "x86_64", "platform": "Linux-5.4.0-107-generic-x86_64-with-glibc2.2.5", "dbt_version": "1.4.1", "dbt_adapter_type": "impala", "dbt_adapter_version": "1.4.0", "dbt_deployment_env": {}, "project_name": "monitoring", "target_name": "impala", "no_of_threads": 3, "warehouse_version": {"version": "impalad version 3.4.0-SNAPSHOT", "build": "impalad version 3.4.0-SNAPSHOT RELEASE (build 75494b221421452d0b928d844e04afa70654d528)\\nBuilt on Wed Feb  8 15:18:37 UTC 2023"}}'}
10:27:32.796155 [debug] [Thread-145]: Tracker adapter: Usage tracking flag True. To turn on/off use usage_tracking flag in profiles.yml
10:27:32.806767 [debug] [Thread-155]: Tracker adapter: Sending Event {'data': '{"event_type": "incremental", "model_name": "econda.a_econda_basis_v2", "model_type": "table", "incremental_strategy": "append", "auth": "N/A", "connection_state": "N/A", "elapsed_time": "N/A", "permissions": "N/A", "profile_name": "N/A", "sql_type": "N/A", "id": "6f580870-1459-4081-a126-fd1ed5cbeb18", "unique_host_hash": "6684ca3f30d055ca36911ca432eb2347", "unique_user_hash": "68c4283db8074b12df1660b31c0220a9", "unique_session_hash": "60313cc0204ea2f349b15a2fa6f4abd0", "python_version": "3.8.16", "system": "Linux", "machine": "x86_64", "platform": "Linux-5.4.0-107-generic-x86_64-with-glibc2.2.5", "dbt_version": "1.4.1", "dbt_adapter_type": "impala", "dbt_adapter_version": "1.4.0", "dbt_deployment_env": {}, "project_name": "monitoring", "target_name": "impala", "no_of_threads": 3, "warehouse_version": {"version": "impalad version 3.4.0-SNAPSHOT", "build": "impalad version 3.4.0-SNAPSHOT RELEASE (build 75494b221421452d0b928d844e04afa70654d528)\\nBuilt on Wed Feb  8 15:18:37 UTC 2023"}}'}
10:27:32.836716 [debug] [Thread-145]: Tracker adapter: Usage tracking flag True. To turn on/off use usage_tracking flag in profiles.yml
10:27:32.837476 [debug] [Thread-156]: Tracker adapter: Sending Event {'data': '{"event_type": "model_access", "model_name": "econda.a_econda_basis_v2", "model_type": "table", "incremental_strategy": "", "auth": "N/A", "connection_state": "N/A", "elapsed_time": "N/A", "permissions": "N/A", "profile_name": "N/A", "sql_type": "N/A", "id": "6f580870-1459-4081-a126-fd1ed5cbeb18", "unique_host_hash": "6684ca3f30d055ca36911ca432eb2347", "unique_user_hash": "68c4283db8074b12df1660b31c0220a9", "unique_session_hash": "60313cc0204ea2f349b15a2fa6f4abd0", "python_version": "3.8.16", "system": "Linux", "machine": "x86_64", "platform": "Linux-5.4.0-107-generic-x86_64-with-glibc2.2.5", "dbt_version": "1.4.1", "dbt_adapter_type": "impala", "dbt_adapter_version": "1.4.0", "dbt_deployment_env": {}, "project_name": "monitoring", "target_name": "impala", "no_of_threads": 3, "warehouse_version": {"version": "impalad version 3.4.0-SNAPSHOT", "build": "impalad version 3.4.0-SNAPSHOT RELEASE (build 75494b221421452d0b928d844e04afa70654d528)\\nBuilt on Wed Feb  8 15:18:37 UTC 2023"}}'}
10:27:32.870371 [debug] [Thread-145]: Writing runtime sql for node "model.econda.a_econda_basis_v2"
10:27:32.871312 [debug] [Thread-145]: Impala adapter: NotImplemented: add_begin_query
10:27:32.871649 [debug] [Thread-145]: Using impala connection "model.econda.a_econda_basis_v2"
10:27:32.871863 [debug] [Thread-145]: Tracker adapter: Usage tracking flag True. To turn on/off use usage_tracking flag in profiles.yml
10:27:32.872804 [debug] [Thread-157]: Tracker adapter: Sending Event {'data': '{"event_type": "start_query", "profile_name": "monitoring", "app": "dbt", "dbt_version": "1.4.1", "target_name": "impala", "model_name": "model.econda.a_econda_basis_v2", "sql_type": "create table", "auth": "N/A", "connection_state": "N/A", "elapsed_time": "N/A", "incremental_strategy": "N/A", "model_type": "N/A", "permissions": "N/A", "id": "6f580870-1459-4081-a126-fd1ed5cbeb18", "unique_host_hash": "6684ca3f30d055ca36911ca432eb2347", "unique_user_hash": "68c4283db8074b12df1660b31c0220a9", "unique_session_hash": "60313cc0204ea2f349b15a2fa6f4abd0", "python_version": "3.8.16", "system": "Linux", "machine": "x86_64", "platform": "Linux-5.4.0-107-generic-x86_64-with-glibc2.2.5", "dbt_adapter_type": "impala", "dbt_adapter_version": "1.4.0", "dbt_deployment_env": {}, "project_name": "monitoring", "no_of_threads": 3, "warehouse_version": {"version": "impalad version 3.4.0-SNAPSHOT", "build": "impalad version 3.4.0-SNAPSHOT RELEASE (build 75494b221421452d0b928d844e04afa70654d528)\\nBuilt on Wed Feb  8 15:18:37 UTC 2023"}}'}
10:27:32.873058 [debug] [Thread-145]: On model.econda.a_econda_basis_v2: /* {"app": "dbt", "dbt_version": "1.4.1", "profile_name": "monitoring", "target_name": "impala", "node_id": "model.econda.a_econda_basis_v2"} */

  create  table
    econda.a_econda_basis_v2
      partitioned by (sourcemonth)
....(sql script)

Unfortunately I'm not able to append the data to an existing table, even when I play around with the config options e.g. changing the materialized to incremental in the dbt_project.yml or inside the model config. After posting in the slack channel I was routed to this place and hope to find help here. Thanks.

pmheld commented 1 year ago

to whom it may concern: @cloudera-hudson @myloginid @shreelola @vamshikolanu

I found out what the problem is and I would consider it a bug: in the project.yml the following is specified: models: econda: +materialized: table +schema: econda

In the profiles.yml another user is connecting to impala having all rights on schema econda.

When running the incremental model the first time, the table is created fine. When running subsequently it is failing because it wants to create the table again rather than doing an insert as configured in the models config.

In incremental mode the program is somehow trying to identify the table in the schema of the user that is logged in and not in the schema specified in the project.yml

I tested the behavior by eliminating the "+schema: econda" part from the project.yml and it was working as expected in the schema of the user that is logged in. I want to be able though to use the schema that is not equal to the users schema that I'm logged in with.

BTW I'm using custom schema macro as described at https://docs.getdbt.com/docs/build/custom-schemas

Your feedback is very much appreciated.

niteshy commented 1 year ago

Thanks @pmheld for further information about the issue, We will update the issue soon and provide the fix.

vamshikolanu commented 1 year ago

Hello @pmheld. Can you please share the complete debug logs with us? We are unable to reproduce the issue on our end. We have tested our code with this example to reproduce the scenario you mentioned, and it works fine.

Please notice: https://github.com/cloudera/dbt-impala-example/blob/main/dbt_impala_demo/dbt_project.yml#L50 has a custom schema defined too.

I wonder if show tables isn't working in your case.

pmheld commented 1 year ago

Hi @vamshikolanu,

attached you find the logs and your guess seems tob e right, as far as I interpret the logs.

Hope this helps. Thanks

Von: vamshi @.> Gesendet: Montag, 18. September 2023 19:16 An: cloudera/dbt-impala @.> Cc: Held, Patrick @.>; Mention @.> Betreff: Re: [cloudera/dbt-impala] incremental strategy not inserting (Issue #177)

Hello @pmheldhttps://github.com/pmheld. Can you please share the complete debug logs with us? We are unable to reproduce the issue on our end. We have tested our code with this examplehttps://github.com/cloudera/dbt-impala-example/tree/main/dbt_impala_demo to reproduce the scenario you mentioned, and it works fine.

Please notice: https://github.com/cloudera/dbt-impala-example/blob/main/dbt_impala_demo/dbt_project.yml#L50 has a custom schema defined too.

I wonder if show tables isn't working in your case.

- Reply to this email directly, view it on GitHubhttps://github.com/cloudera/dbt-impala/issues/177#issuecomment-1724030708, or unsubscribehttps://github.com/notifications/unsubscribe-auth/A6VZON55IFAEPAVR7R5ONXDX3B6VLANCNFSM6AAAAAAZZYQKTQ. You are receiving this because you were mentioned.Message ID: @.**@.>>

vamshikolanu commented 1 year ago

Hello @pmheld, I can't find logs here. Did you forget to attach logs?

pmheld commented 1 year ago

Hi @vamshikolanu, sorry, I attached the file in the reply but obviously it doesn't work. You can find it here

pmheld commented 1 year ago

Hi @ vamshikolanu,

any suggestions on this? Do I have to check with impala/cloudera or could it be an issue related to python, kerberos or dbt?

Thanks, Patrick

Von: Held, Patrick Gesendet: Dienstag, 19. September 2023 15:07 An: cloudera/dbt-impala @.***> Betreff: AW: [cloudera/dbt-impala] incremental strategy not inserting (Issue #177)

Hi @vamshikolanu,

attached you find the logs and your guess seems tob e right, as far as I interpret the logs.

Hope this helps. Thanks

Von: vamshi @.**@.>> Gesendet: Montag, 18. September 2023 19:16 An: cloudera/dbt-impala @.**@.>> Cc: Held, Patrick @.**@.>>; Mention @.**@.>> Betreff: Re: [cloudera/dbt-impala] incremental strategy not inserting (Issue #177)

Hello @pmheldhttps://github.com/pmheld. Can you please share the complete debug logs with us? We are unable to reproduce the issue on our end. We have tested our code with this examplehttps://github.com/cloudera/dbt-impala-example/tree/main/dbt_impala_demo to reproduce the scenario you mentioned, and it works fine.

Please notice: https://github.com/cloudera/dbt-impala-example/blob/main/dbt_impala_demo/dbt_project.yml#L50 has a custom schema defined too.

I wonder if show tables isn't working in your case.

- Reply to this email directly, view it on GitHubhttps://github.com/cloudera/dbt-impala/issues/177#issuecomment-1724030708, or unsubscribehttps://github.com/notifications/unsubscribe-auth/A6VZON55IFAEPAVR7R5ONXDX3B6VLANCNFSM6AAAAAAZZYQKTQ. You are receiving this because you were mentioned.Message ID: @.**@.>>

pmheld commented 1 year ago

@cloudera-hudson @myloginid @shreelola @vamshikolanu I've provided the log in a link to my github in the post of sep 20. Were you able to check it out yet? Thanks