dbt-labs / dbt-core


[CT-3236] [Bug] When adding a new version of model `foo` - partial parsing runs into a Compilation error 'model.my_dbt_project.bar' depends on 'model.my_dbt_project.foo' which is not in the graph! #8872

Open jeremyyeo opened 1 year ago

jeremyyeo commented 1 year ago

Is this a new bug in dbt-core?

Current Behavior

Looks similar to #8859

When adding a new model version (i.e. foo_v2.sql), partial parsing appears unable to find the previous version of the model even though the foo.sql file exists. Things work correctly when doing a full parse (i.e. after running dbt clean or deleting the target/ folder first).
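To illustrate how an error like this can arise, here is a minimal sketch of a dependency check over a set of node ids. It is not dbt's implementation. It assumes (this is not confirmed anywhere in this issue) that once versions are declared, foo is registered under versioned ids such as `model.my_dbt_project.foo.v1`, while a partially parsed `bar` still carries an edge to the old unversioned id. The function name `check_dependencies` and both data structures are hypothetical.

```python
# Illustrative only: NOT dbt's implementation.
def check_dependencies(nodes, depends_on):
    """Return one message per dependency whose target is missing from `nodes`."""
    errors = []
    for node_id, deps in depends_on.items():
        for dep in deps:
            if dep not in nodes:
                errors.append(
                    f"'{node_id}' depends on '{dep}' which is not in the graph!"
                )
    return errors


# Assumption: after versions are added, foo exists only under versioned ids.
nodes = {
    "model.my_dbt_project.bar",
    "model.my_dbt_project.foo.v1",
    "model.my_dbt_project.foo.v2",
}

# Assumption: bar was not re-parsed, so its edge still names the unversioned id.
stale_depends_on = {"model.my_dbt_project.bar": ["model.my_dbt_project.foo"]}
# What a full parse would presumably produce instead.
fresh_depends_on = {"model.my_dbt_project.bar": ["model.my_dbt_project.foo.v1"]}

stale_errors = check_dependencies(nodes, stale_depends_on)
fresh_errors = check_dependencies(nodes, fresh_depends_on)

if __name__ == "__main__":
    print(stale_errors)
    print(fresh_errors)
```

Under those assumptions the stale edge produces a message of the same shape as the Compilation Error in this report, and the re-parsed edge produces none. That would be consistent with a full parse succeeding.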

Expected Behavior

Partial parsing should be able to detect the previous model file.

Steps To Reproduce

  1. Project setup.
# dbt_project.yml
name: my_dbt_project
profile: all
config-version: 2
version: "1.0.0"

models:
  my_dbt_project:
    +materialized: table

# models/schema.yml
version: 2
models:
  - name: bar
  - name: foo
-- models/bar.sql
select * from {{ ref('foo') }}

-- models/foo.sql
select 1 as id
  2. Build project to create initial target/partial_parse.msgpack file:
$ ls target

ls: target: No such file or directory

$ dbt clean && dbt build

21:27:02  Running with dbt=1.6.6
21:27:02  Checking target/*
21:27:02  Cleaned target/*
21:27:02  Finished cleaning all paths.
21:27:07  Running with dbt=1.6.6
21:27:07  Registered adapter: postgres=1.6.6
21:27:07  Unable to do partial parsing because saved manifest not found. Starting full parse.
21:27:08  Found 2 models, 0 sources, 0 exposures, 0 metrics, 352 macros, 0 groups, 0 semantic models
21:27:08  
21:27:08  Concurrency: 1 threads (target='pg-local')
21:27:08  
21:27:08  1 of 2 START sql table model public.foo ........................................ [RUN]
21:27:08  1 of 2 OK created sql table model public.foo ................................... [SELECT 1 in 0.17s]
21:27:08  2 of 2 START sql table model public.bar ........................................ [RUN]
21:27:08  2 of 2 OK created sql table model public.bar ................................... [SELECT 1 in 0.07s]
21:27:08  
21:27:08  Finished running 2 table models in 0 hours 0 minutes and 0.47 seconds (0.47s).
21:27:08  
21:27:08  Completed successfully
21:27:08  
21:27:08  Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
  3. Add a new version of foo (new .sql file + changes to schema yml file):
-- models/bar.sql
select * from {{ ref('foo') }}

-- models/foo.sql
select 1 as id

-- models/foo_v2.sql
select 1 as id
# models/schema.yml
version: 2
models:
  - name: bar
  - name: foo
    latest_version: 1
    versions:
      - v: 1
      - v: 2
$ ls target
compiled               graph.gpickle          graph_summary.json     manifest.json          partial_parse.msgpack  run                    run_results.json       semantic_manifest.json

$ dbt build
21:36:51  Running with dbt=1.6.6
21:36:51  Registered adapter: postgres=1.6.6
21:36:51  Encountered an error:
Compilation Error
  'model.my_dbt_project.bar' depends on 'model.my_dbt_project.foo' which is not in the graph!

$ dbt clean && dbt build
21:37:37  Running with dbt=1.6.6
21:37:37  Checking target/*
21:37:37  Cleaned target/*
21:37:37  Finished cleaning all paths.
21:37:41  Running with dbt=1.6.6
21:37:41  Registered adapter: postgres=1.6.6
21:37:41  Unable to do partial parsing because saved manifest not found. Starting full parse.
21:37:42  Found 3 models, 0 sources, 0 exposures, 0 metrics, 352 macros, 0 groups, 0 semantic models
21:37:42  
21:37:42  Concurrency: 1 threads (target='pg-local')
21:37:42  
21:37:42  1 of 3 START sql table model public.foo_v1 ..................................... [RUN]
21:37:42  1 of 3 OK created sql table model public.foo_v1 ................................ [SELECT 1 in 0.14s]
21:37:42  2 of 3 START sql table model public.foo_v2 ..................................... [RUN]
21:37:42  2 of 3 OK created sql table model public.foo_v2 ................................ [SELECT 1 in 0.06s]
21:37:42  3 of 3 START sql table model public.bar ........................................ [RUN]
21:37:42  While compiling 'bar':
Found an unpinned reference to versioned model 'foo' in project 'my_dbt_project'.
Resolving to latest version: foo.v1
A prerelease version 2 is available. It has not yet been marked 'latest' by its maintainer.
When that happens, this reference will resolve to foo.v2 instead.

  Try out v2: {{ ref('my_dbt_project', 'foo', v='2') }}
  Pin to  v1: {{ ref('my_dbt_project', 'foo', v='1') }}

21:37:43  3 of 3 OK created sql table model public.bar ................................... [SELECT 1 in 0.07s]
21:37:43  
21:37:43  Finished running 3 table models in 0 hours 0 minutes and 0.47 seconds (0.47s).
21:37:43  
21:37:43  Completed successfully
21:37:43  
21:37:43  Done. PASS=3 WARN=0 ERROR=0 SKIP=0 TOTAL=3
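
The message printed while compiling 'bar' in the full-parse run describes how an unpinned ref is resolved: it goes to the declared latest version, and any higher version is reported as a prerelease. A rough sketch of that rule follows, assuming integer versions as in this reprex. The helpers `resolve_ref` and `prerelease_versions` are hypothetical names, not dbt functions, and dbt's real version handling is likely more involved.

```python
from typing import List, Optional, Sequence


def resolve_ref(
    name: str,
    versions: Sequence[int],
    latest_version: int,
    v: Optional[int] = None,
) -> str:
    """Resolve a ref to 'name.vN'; an unpinned ref (v=None) uses latest_version."""
    if v is None:
        v = latest_version
    if v not in versions:
        raise ValueError(f"{name} has no version {v}")
    return f"{name}.v{v}"


def prerelease_versions(versions: Sequence[int], latest_version: int) -> List[int]:
    """Versions newer than the one marked latest (reported as prereleases)."""
    return [x for x in versions if x > latest_version]


# Mirrors models/schema.yml from step 3: versions 1 and 2, latest_version: 1.
unpinned = resolve_ref("foo", [1, 2], latest_version=1)
pinned = resolve_ref("foo", [1, 2], latest_version=1, v=2)
newer = prerelease_versions([1, 2], latest_version=1)

if __name__ == "__main__":
    print(unpinned, pinned, newer)
```

With the schema from the reprex, the unpinned ref lands on foo.v1 and v2 is the only prerelease, which matches the log above.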

Relevant log output

10:36:51.372585 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'start', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10d7e5880>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x110412e50>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x11044f1c0>]}

============================== 10:36:51.381508 | 8427ce64-1254-4331-a412-042bf9e9e24f ==============================
10:36:51.381508 [info ] [MainThread]: Running with dbt=1.6.6
10:36:51.382438 [debug] [MainThread]: running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'log_cache_events': 'False', 'write_json': 'True', 'partial_parse': 'True', 'cache_selected_only': 'False', 'profiles_dir': '/Users/jeremy/.dbt', 'version_check': 'True', 'debug': 'False', 'log_path': '/Users/jeremy/src/dbt-basic/logs', 'fail_fast': 'False', 'warn_error': 'None', 'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'False', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'static_parser': 'True', 'introspect': 'True', 'log_format': 'default', 'target_path': 'None', 'invocation_command': 'dbt build', 'send_anonymous_usage_stats': 'True'}
10:36:51.513586 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'project_id', 'label': '8427ce64-1254-4331-a412-042bf9e9e24f', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x110412340>]}
10:36:51.526609 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'adapter_info', 'label': '8427ce64-1254-4331-a412-042bf9e9e24f', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x110753c40>]}
10:36:51.527728 [info ] [MainThread]: Registered adapter: postgres=1.6.6
10:36:51.550994 [debug] [MainThread]: checksum: 546b81fb56652c304d87abd676e84d4737d8a0c6b62160f4a6e79dcddbc842bb, vars: {}, profile: , target: , version: 1.6.6
10:36:51.590249 [debug] [MainThread]: Partial parsing enabled: 0 files deleted, 1 files added, 1 files changed.
10:36:51.591157 [debug] [MainThread]: Partial parsing: added file: my_dbt_project://models/foo_v2.sql
10:36:51.592056 [debug] [MainThread]: Partial parsing: updated file: my_dbt_project://models/schema.yml
10:36:51.686018 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'load_project', 'label': '8427ce64-1254-4331-a412-042bf9e9e24f', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1109e2220>]}
10:36:51.699615 [error] [MainThread]: Encountered an error:
Compilation Error
  'model.my_dbt_project.bar' depends on 'model.my_dbt_project.foo' which is not in the graph!
10:36:51.701016 [debug] [MainThread]: Command `dbt build` failed at 10:36:51.700731 after 0.36 seconds
10:36:51.702035 [debug] [MainThread]: Sending event: {'category': 'dbt', 'action': 'invocation', 'label': 'end', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x10d7e5880>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1108e8100>, <snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x1108e8250>]}
10:36:51.702804 [debug] [MainThread]: Flushing usage events

Environment

- OS: macOS
- Python: Python 3.9.13
- dbt:

Core:
  - installed: 1.6.6
  - latest:    1.6.6 - Up to date!

Plugins:
  - databricks: 1.6.4 - Update available!
  - bigquery:   1.6.7 - Up to date!
  - snowflake:  1.6.4 - Up to date!
  - postgres:   1.6.6 - Up to date!
  - spark:      1.6.0 - Up to date!

  At least one plugin is out of date or incompatible with dbt-core.
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Which database adapter are you using with dbt?

postgres

Additional Context

No response

dbeatty10 commented 1 year ago

Thanks for reporting this @jeremyyeo! And thank you for the nice reprex above 🤩

Indeed, this does look similar to https://github.com/dbt-labs/dbt-core/issues/8859.

I'm going to leave this as a stand-alone issue because your example looks unique: it is triggered by a new model version (rather than a property change like in #8859).

We may consolidate these into the same issue in the future though.

dbeatty10 commented 1 year ago

We should check if this is resolved by https://github.com/dbt-labs/dbt-core/pull/8865

karenderer commented 7 months ago

@dbeatty10 This appears not to be resolved. I followed the repro steps above on dbt 1.7.13 and I'm still seeing the same error.

$ dbt run -m foo bar
20:00:16  Running with dbt=1.7.13
20:00:17  Registered adapter: postgres=1.7.13
20:00:17  Encountered an error:
Compilation Error
  'model.my_dbt_project.bar' depends on 'model.my_dbt_project.foo' which is not in the graph!

dbeatty10 commented 7 months ago

Thanks for checking this and sharing the result @karenderer ! 🏆

Workaround

There's a handful of options for workarounds in the meantime -- all of which should only need to be done a single time.

  1. Disable partial parsing for a single build
dbt build --no-partial-parse
  2. Clean out the target folder with the dedicated dbt clean command
dbt clean && dbt build
  3. Manually delete the entire target folder that contains the partial_parse.msgpack file
rm -rf target
  4. Manually delete just the partial_parse.msgpack file within the target folder
rm target/partial_parse.msgpack
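
For anyone who wants to script the last option (for example as a pre-step in a scheduled job), a small helper along these lines might do. It is only a sketch: the function name `drop_partial_parse_cache` is made up, and it assumes the project uses the default `target/` folder seen in the reprex above.

```python
from pathlib import Path


def drop_partial_parse_cache(project_dir: str = ".") -> bool:
    """Delete target/partial_parse.msgpack under project_dir.

    Returns True if a file was removed, False if there was nothing to remove.
    Assumes the default target/ location; adjust if your project differs.
    """
    cache = Path(project_dir) / "target" / "partial_parse.msgpack"
    if cache.is_file():
        cache.unlink()
        return True
    return False


if __name__ == "__main__":
    removed = drop_partial_parse_cache(".")
    print("removed partial parse cache" if removed else "no partial parse cache found")
```

Running it before `dbt build` should have the same effect as the `rm` command above, without failing when the file is already gone.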

karenderer commented 7 months ago

Thank you! Disabling partial parsing seems to do the trick for now - appreciate the fast reply!

pmartincalvo commented 2 months ago

Just wanted to chime in to say that my team is currently encountering this behaviour.

The proposed workarounds do work and implementing them in our production executions has been trivial (so many thanks for that, you saved my Friday afternoon), but it's been quite unpleasant having to communicate them to everyone in our dbt project, and will continue to be.

Looking forward to a fix that will free us from having to remind everyone that they need to clean their target every time they run anything, just in case some other colleague has introduced a new version.