fivetran / dbt_netsuite_source

Data models for Fivetran's Netsuite source package, built using dbt.
https://fivetran.github.io/dbt_netsuite_source/
Apache License 2.0
14 stars 20 forks source link

[Bug] table stg_netsuite2__jobs is missing in netsuite2 source connector #20

Closed nbenezri closed 2 years ago

nbenezri commented 2 years ago

Is there an existing issue for this?

Describe the issue

running dbt run

I have two failures on the same missing table: cdp_stg_netsuite.stg_netsuite2__jobs

running for the latest netsuite2

Relevant error log or model output

06:45:47.354552 [debug] [Thread-2  ]: Began running node test.netsuite_source.not_null_stg_netsuite2__jobs_job_id.d7b5b06b33
06:45:47.354834 [info ] [Thread-2  ]: 94 of 166 START test not_null_stg_netsuite2__jobs_job_id ....................... [RUN]
06:45:47.355687 [debug] [Thread-2  ]: Acquiring new redshift connection "test.netsuite_source.not_null_stg_netsuite2__jobs_job_id.d7b5b06b33"
06:45:47.355950 [debug] [Thread-2  ]: Began compiling node test.netsuite_source.not_null_stg_netsuite2__jobs_job_id.d7b5b06b33
06:45:47.356205 [debug] [Thread-2  ]: Compiling test.netsuite_source.not_null_stg_netsuite2__jobs_job_id.d7b5b06b33
06:45:47.364691 [debug] [Thread-2  ]: Writing injected SQL for node "test.netsuite_source.not_null_stg_netsuite2__jobs_job_id.d7b5b06b33"
06:45:47.365156 [debug] [Thread-2  ]: finished collecting timing info
06:45:47.365388 [debug] [Thread-2  ]: Began executing node test.netsuite_source.not_null_stg_netsuite2__jobs_job_id.d7b5b06b33
06:45:47.370281 [debug] [Thread-2  ]: Writing runtime SQL for node "test.netsuite_source.not_null_stg_netsuite2__jobs_job_id.d7b5b06b33"
06:45:47.370726 [debug] [Thread-2  ]: Using redshift connection "test.netsuite_source.not_null_stg_netsuite2__jobs_job_id.d7b5b06b33"
06:45:47.370984 [debug] [Thread-2  ]: On test.netsuite_source.not_null_stg_netsuite2__jobs_job_id.d7b5b06b33: BEGIN
06:45:47.371181 [debug] [Thread-2  ]: Opening a new connection, currently in state closed
06:45:47.371451 [debug] [Thread-2  ]: Redshift adapter: Connecting to Redshift using 'database' credentials
06:45:47.394407 [debug] [Thread-2  ]: SQL status: BEGIN in 0.02 seconds
06:45:47.394730 [debug] [Thread-2  ]: Using redshift connection "test.netsuite_source.not_null_stg_netsuite2__jobs_job_id.d7b5b06b33"
06:45:47.394956 [debug] [Thread-2  ]: On test.netsuite_source.not_null_stg_netsuite2__jobs_job_id.d7b5b06b33: /* {"app": "dbt", "dbt_version": "1.1.0", "profile_name": "fivetran_dbt", "target_name": "prod", "node_id": "test.netsuite_source.n
ot_null_stg_netsuite2__jobs_job_id.d7b5b06b33"} */
select
      count(*) as failures,
      count(*) != 0 as should_warn,
      count(*) != 0 as should_error
    from (

select job_id
from "dkall"."cdp_stg_netsuite"."stg_netsuite2__jobs"
where job_id is null

    ) dbt_internal_test
06:45:47.403544 [debug] [Thread-2  ]: Postgres adapter: Postgres error: relation "cdp_stg_netsuite.stg_netsuite2__jobs" does not exist

06:46:12  Completed with 2 errors and 0 warnings:
06:46:12  
06:46:12  Database Error in test not_null_stg_netsuite2__jobs_job_id (models/stg_netsuite2.yml)
06:46:12    relation "cdp_stg_netsuite.stg_netsuite2__jobs" does not exist
06:46:12    compiled SQL at target/run/netsuite_source/models/stg_netsuite2.yml/not_null_stg_netsuite2__jobs_job_id.sql
06:46:12  
06:46:12  Database Error in test unique_stg_netsuite2__jobs_job_id (models/stg_netsuite2.yml)
06:46:12    relation "cdp_stg_netsuite.stg_netsuite2__jobs" does not exist
06:46:12    compiled SQL at target/run/netsuite_source/models/stg_netsuite2.yml/unique_stg_netsuite2__jobs_job_id.sql
06:46:12  
06:46:12  Done. PASS=164 WARN=0 ERROR=2 SKIP=0 TOTAL=166

Expected behavior

dbt run should run without errors

dbt Project configurations

models: fivetran_dbt: netsuite: +schema: netsuite netsuite_source: +schema: stg_netsuite

vars: netsuite_data_model: netsuite2 netsuite_source: netsuite_database: dkall netsuite_schema: netsuite_suiteanalytics netsuite: netsuite_database: dkall netsuite_schema: netsuite_suiteanalytics

Package versions

packages:

What database are you using dbt with?

redshift

dbt Version

Core:

Plugins:

Additional Context

No response

Are you willing to open a PR to help address this issue?

fivetran-joemarkiewicz commented 2 years ago

Hi @nbenezri thanks for opening this issue. Since this jobs model is not used in downstream transformations, I would recommend disabling the model from your runs. You can do so by adding the following to your dbt_project.yml:

models:
  netsuite_source:
    netsuite2:
      tmp:
        stg_netsuite2__jobs_tmp:
          +enabled: false
      stg_netsuite2__jobs:
          +enabled: false

For a more permanent solution, we should be able to add a disable config to the jobs models. I noticed you were interested in opening a PR. If you would like, I would recommend creating a new variable for the *_jobs models that is able to be disabled and therefore ignore the respective models if users do not have them.

You can see how we already include some logic to disable the jobs model if the user is not leveraging the Netsuite2 models. I think a good solution would be to create a new variable (maybe netsuite2__using_jobs) that a user can disable in their vars config to not run the models. My thought is something like this at the top of the stg_netsuite2__jobs and stg_netsuite2__jobs_tmp models:

{{ config(enabled=var('netsuite_data_model', 'netsuite') == var('netsuite_data_model_override','netsuite2') and var('netsuite2__using_jobs',true)) }}

Let me know if the models solution works for you and if you would be interested in opening a PR. Otherwise, my team and I can implement this in the next patch release.

nbenezri commented 2 years ago

Adding it removes the error! BTW dbt run takes about 40 minutes (schema is about 400gb) - is that normal?

As for the PR, I will take a look. I never looked at your source code but I'm absolutely willing to try!

fivetran-joemarkiewicz commented 2 years ago

That's great to hear!

We have received reports from the previous build of the Netsuite models that sometimes took longer than usual. Do you know if there was one model in particular that seemed to take the longest for you?

nbenezri commented 2 years ago

The majority of the time spent here:

14:51:06  100 of 182 OK created table model cdp_stg_netsuite.stg_netsuite2__transaction_accounting_lines  [SELECT in 391.86s]
14:51:45  101 of 182 OK created table model cdp_stg_netsuite.stg_netsuite2__transaction_lines  [SELECT in 427.82s]
15:07:22  160 of 182 OK created table model cdp_netsuite.netsuite2__income_statement ..... [SELECT in 202.94s]
15:08:30  159 of 182 OK created table model cdp_netsuite.netsuite2__balance_sheet ........ [SELECT in 278.08s]
15:22:06  161 of 182 OK created table model cdp_netsuite.netsuite2__transaction_details .. [SELECT in 884.25s]
nbenezri commented 2 years ago

to add to it, sometimes the SQL for income is too heavy (most of the time it's ok):

00:45:24  Database Error in model netsuite2__income_statement (models/netsuite2/netsuite2__income_statement.sql)
00:45:24    Query (7212024) cancelled by WLM abort action of Query Monitoring Rule "Max_Query_Mem_On_Disk".
00:45:24    DETAIL:  
00:45:24      -----------------------------------------------
00:45:24      error:  Query (7212024) cancelled by WLM abort action of Query Monitoring Rule "Max_Query_Mem_On_Disk".
00:45:24      code:      1078
00:45:24      context:   Query (7212024) cancelled by WLM abort action of Query Monitoring Rule "Max_Query_Mem_On_Disk".
00:45:24      query:     0
00:45:24      location:  wlm_query_action.cpp:156
00:45:24      process:   wlm [pid=836]
00:45:24      -----------------------------------------------
00:45:24    compiled SQL at target/run/netsuite/models/netsuite2/netsuite2__income_statement.sql

I've set Max_Query_Mem_On_Disk to 150gb.

nbenezri commented 2 years ago

Closing as the initial bug for this case was solved. Plus we stopped using it as many tables require full resync.