dbt-labs / dbt-core

dbt enables data analysts and engineers to transform their data using the same practices that software engineers use to build applications.
https://getdbt.com
Apache License 2.0

get_manifest_artifacts: Compilation Error in downstream project when upstream project adds versioning to public model referenced in intermediate project #11032

Open taylorterwin opened 3 days ago

taylorterwin commented 3 days ago

Is this a new bug in dbt-core?

Current Behavior

Project A defines a public model, Project B references that model via a cross-project ref, and Project C lists both Project A and Project B in its dependencies.yml. After Project A adds versioning to that model and deploys the change to production, Project C errors until Project B reruns the cross-project ref in its own production environment. Downstream production runs fail even though their jobs do not include those dependencies, because the publication artifact has not been updated with the latest node.

We try to inject dependencies among public models in upstream projects, so that downstream projects can see whether one of their upstream parents depends on another of their upstream parents. As a result, a project that depends on two other projects (e.g. Project A + Project B) could see this error even if it doesn't depend on either of those models directly. The error might occur when a public model that is an upstream dependency of another public model has its access changed (no longer public) or is disabled, deleted, or missing.
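The check described above can be pictured with a small toy model. This is a sketch under stated assumptions, not dbt-core's actual code: the function name `check_injected_dependencies`, the `CompilationError` stand-in class, and the dictionary layout are all hypothetical, and only the two node identifiers and the message wording are taken from the log output in this report.

```python
# Toy model of the dependency check; NOT dbt-core's real implementation.
# The function name, exception class, and data layout are hypothetical.


class CompilationError(Exception):
    """Stand-in for the error reported in the log output."""


def check_injected_dependencies(public_nodes: dict[str, list[str]]) -> None:
    """Raise if any public model lists a parent that is absent from the graph.

    public_nodes maps a node's unique_id to the unique_ids it depends on.
    """
    for unique_id, parents in public_nodes.items():
        for parent in parents:
            if parent not in public_nodes:
                raise CompilationError(
                    f"'{unique_id}' depends on '{parent}' which is not in the graph!"
                )


# What a downstream project might see: an upstream public model still points
# at a parent that is no longer published under that identifier.
graph = {
    "model.mesh_test.meshy": ["model.my_new_project.my_first_dbt_model"],
}

try:
    check_injected_dependencies(graph)
    error_message = None
except CompilationError as exc:
    error_message = str(exc)

print(error_message)
```

The point of the sketch is that the failing project never has to reference either model itself: the missing parent is detected while wiring together the public models of its upstream projects.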

Expected Behavior

Options:

Steps To Reproduce

  1. Set up dependencies.yml in Project B as:
    `# dependencies.yml
    projects:
    - name: project_a`
    • Create an xref model: select * from {{ ref('project_a', 'model_x') }}
    • Ensure the xref model's access is public.
  2. Set up dependencies.yml in Project C as:
    `# dependencies.yml
    projects:
    - name: project_a
    - name: project_b`
  3. Ensure the project_a model being xref'd has public access, then run the project_a production build job and confirm it succeeds.
  4. Run the project_b production build job, including the xref model.
  5. In project_a, change the xref'd model_x to specify a version:
`# schema.yml example

version: 2

models:
    - name: my_first_dbt_model
      access: public
      latest_version: 1
      description: "A starter dbt model"
      columns:
          - name: id
            description: "The primary key for this table"
            tests:
                - unique
                - not_null
      versions:
        - v: 1`
  6. Commit, merge, and run the project_a production build job to ensure the versioned model is picked up.
  7. Run the project_c production build job now; it will fail.

Note: Running the project_b production build job updates the manifest and picks up the latest version of the node, after which the project_c production build completes without error.
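The reproduction timeline can be simulated roughly as follows. This is only an illustrative sketch: the helper names `versioned_id` and `missing_parents` are hypothetical, the dictionaries do not mirror the real publication artifact format, and it assumes that adding a version appends a `.v<version>` suffix to the model's identifier and that `model.mesh_test.meshy` in the log is Project B's xref model.

```python
# Hypothetical simulation of the reproduction timeline. Names and structures
# are illustrative and do not mirror dbt's real publication artifact format.


def versioned_id(unique_id: str, version: int) -> str:
    # Assumption: a versioned model's identifier gains a ".v<version>" suffix.
    return f"{unique_id}.v{version}"


def missing_parents(*publications: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Merge upstream publications and list (child, parent) pairs whose parent is absent."""
    graph: dict[str, list[str]] = {}
    for publication in publications:
        graph.update(publication)
    return [
        (child, parent)
        for child, parents in graph.items()
        for parent in parents
        if parent not in graph
    ]


OLD_ID = "model.my_new_project.my_first_dbt_model"
NEW_ID = versioned_id(OLD_ID, 1)
XREF = "model.mesh_test.meshy"  # assumed to be Project B's xref model

# Both upstream projects have run before versioning; Project C resolves cleanly.
pub_a = {OLD_ID: []}
pub_b = {XREF: [OLD_ID]}
before_versioning = missing_parents(pub_a, pub_b)

# Project A reruns with the versioned model; Project B's artifact is now stale.
pub_a = {NEW_ID: []}
stale = missing_parents(pub_a, pub_b)

# Project B reruns and records the new identifier; Project C resolves again.
pub_b = {XREF: [NEW_ID]}
healed = missing_parents(pub_a, pub_b)

print(before_versioning, stale, healed)
```

In this toy version, only the middle state reports a dangling parent, which matches the observation that Project C fails between Project A's versioned deploy and Project B's next successful run.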

Relevant log output

2024-11-20 21:56:03.149660 (MainThread): 21:56:03  Encountered an error:
Runtime Error
  get_manifest_artifacts: Compilation Error
    'model.mesh_test.meshy' depends on 'model.my_new_project.my_first_dbt_model' which is not in the graph!
2024-11-20 21:56:03.150532 (MainThread): 21:56:03  Resource report: {"command_name": "build", "command_wall_clock_time": 2.1111817, "process_user_time": 3.218299, "process_kernel_time": 0.367934, "process_mem_max_rss": "211344", "process_out_blocks": "1976", "command_success": false, "process_in_blocks": "0"}
2024-11-20 21:56:03.151034 (MainThread): 21:56:03  Observability Metric: command_success=0.0
2024-11-20 21:56:03.151577 (MainThread): 21:56:03  Observability Metric: command_wall_clock_time=2.1111817359924316
2024-11-20 21:56:03.152051 (MainThread): 21:56:03  Observability Metric: process_user_time=3.21829891204834
2024-11-20 21:56:03.152459 (MainThread): 21:56:03  Observability Metric: process_kernel_time=0.367933988571167
2024-11-20 21:56:03.152850 (MainThread): 21:56:03  Observability Metric: process_mem_max_rss=211344.0
2024-11-20 21:56:03.153441 (MainThread): 21:56:03  Command `dbt build` failed at 21:56:03.153347 after 2.11 seconds

Environment

- dbt: versionless

Which database adapter are you using with dbt?

No response

Additional Context

No response

schicks commented 2 days ago

I'm missing it in the description: is it necessary for there to be bidirectional dependencies to see the bug, or are transitive dependencies enough? The project graph you describe sounds like this to me:

flowchart LR
A --> B --> C

Where a model has versioning added in A, and the error appears in the build of C (until B has been rebuilt).

taylorterwin commented 2 days ago

@schicks For the purpose of reproduction this was just A > B > C; however, in the customer case where this was first reported, several projects all depend on each other. So it occurs in both.

jtcohen6 commented 2 days ago

@taylorterwin Thanks for opening!

@schicks Good callout. I don't think this is exclusive to bidirectional dependencies, but that setup does make it trickier to debug and "self-heal." In the A → B → C setup, project C stops seeing this error once project B has a successful run (after A introduces versioning in the model).