dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

KeyError when loading dbt semantic models as Dagster assets #17471

Closed. bolinzzz closed this issue 8 months ago.

bolinzzz commented 1 year ago

Dagster version

1.5.5

What's the issue?

I am trying to load dbt models (dbt-core 1.6.6) as Dagster assets, specifically the models from the fivetran/ad_reporting package.

I encountered KeyError: 'semantic_model.ad_reporting.ad_report' (see the attached screenshot).

What did you expect to happen?

The code location should load, and the dbt models should be loaded as Dagster assets.

How to reproduce?

You can reproduce the error with the jaffle_shop project from the official dbt + Dagster tutorial. Simply add this dbt dependency to the packages.yml file:

packages:
  - package: fivetran/ad_reporting
    version: [">=1.7.0", "<1.8.0"]
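To check whether a compiled manifest is affected (for example after dbt deps and dbt parse have written target/manifest.json), the minimal sketch below lists every semantic model ID that appears in child_map, either as a key or as a child. It relies only on dbt unique IDs starting with their resource type, as in "semantic_model.ad_reporting.ad_report"; the function name is hypothetical and not part of dagster-dbt.

```python
from typing import Any, Dict, List


def semantic_model_ids_in_child_map(manifest: Dict[str, Any]) -> List[str]:
    """List semantic model unique IDs referenced anywhere in child_map.

    dbt unique IDs begin with the resource type, so a semantic model
    appears as "semantic_model.<package>.<name>".
    """
    found = set()
    for parent_id, children in manifest.get("child_map", {}).items():
        # Check the key itself and every child listed under it
        for unique_id in [parent_id, *children]:
            if unique_id.startswith("semantic_model."):
                found.add(unique_id)
    return sorted(found)
```

Loading target/manifest.json with json.load and passing the resulting dict to this function should list semantic_model.ad_reporting.ad_report for this project, if the explanation in the workaround comment later in this thread holds.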

Deployment type

Local

Deployment details

No response

Additional information

No response

Message from the maintainers

Impacted by this issue? Give it a 👍! We factor engagement into prioritization.

tacastillo commented 1 year ago

Adding a link to a 🧵 where a couple of other folks have also experienced this: https://dagster.slack.com/archives/C04CW71AGBW/p1692125606193579

andreqaugusto commented 11 months ago

Any news on this bug?

ShahBinoy commented 11 months ago

This issue is critical for us: we need it resolved to make enhancements to our platform.

bolinzzz commented 11 months ago

I am using the following as a workaround. Does anyone see any concerns with this approach?

import json
from pathlib import Path

from dagster import AssetExecutionContext, Config
from dagster_dbt import DbtCliResource, dbt_assets
from dagster_dbt.dagster_dbt_translator import DagsterDbtTranslator, DagsterDbtTranslatorSettings

from .constants import dbt_manifest_path, is_production_environment

class DbtConfig(Config):
    full_refresh: bool = False

translator = DagsterDbtTranslator(settings=DagsterDbtTranslatorSettings(
    enable_asset_checks=True))

# manifest.json has the following top-level keys: ["metadata", "nodes", "sources", "macros", "docs",
# "exposures", "metrics", "groups", "selectors", "disabled", "parent_map", "child_map", "group_map",
# "semantic_models"]. When calling `select_unique_ids_from_manifest()`, Dagster iterates over the keys of
# manifest_json["child_map"] and looks up each one in manifest_json["nodes"]. The key
# "semantic_model.ad_reporting.ad_report" exists in manifest_json["child_map"] but not in
# manifest_json["nodes"] (semantic models are stored under "semantic_models"), so Dagster raises a KeyError
# while loading the dbt models from manifest.json as assets. The code below removes
# "semantic_model.ad_reporting.ad_report" from child_map, both as a key and as a child of the model it is
# built on, and writes the result to a new manifest file.
SEMANTIC_MODEL_ID = "semantic_model.ad_reporting.ad_report"
PARENT_MODEL_ID = "model.ad_reporting.ad_reporting__ad_report"

with open(dbt_manifest_path, "r") as file:
    data = json.load(file)

# pop() and the membership check keep this working if the entries are already absent
data["child_map"].pop(SEMANTIC_MODEL_ID, None)
parent_children = data["child_map"].get(PARENT_MODEL_ID, [])
if SEMANTIC_MODEL_ID in parent_children:
    parent_children.remove(SEMANTIC_MODEL_ID)

# Write the modified manifest next to the original (manifest.json -> manifest_modified.json)
manifest_path = Path(dbt_manifest_path)
modified_dbt_manifest_path = manifest_path.with_name(f"{manifest_path.stem}_modified.json")
with open(modified_dbt_manifest_path, "w") as output_file:
    json.dump(data, output_file, indent=2)

@dbt_assets(manifest=modified_dbt_manifest_path,
            dagster_dbt_translator=translator)
def bi_data_models_dbt_assets(context: AssetExecutionContext,
                              dbt: DbtCliResource, config: DbtConfig):
    dbt_build_args = ["build", "--indirect-selection", "cautious"]
    if is_production_environment:
        dbt_build_args += ["--target", "no_prefix"]
    if config.full_refresh:
        dbt_build_args += ["--full-refresh"]
    yield from dbt.cli(dbt_build_args, context=context).stream()
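A more general variant of the workaround above is sketched here, assuming that every ID beginning with "semantic_model." should be dropped rather than only the single ad_reporting entry. It works on the parsed manifest dict and returns a cleaned copy, leaving the input untouched. It also filters parent_map for consistency, which the original workaround does not do, so treat that part as an assumption. The function name strip_semantic_models is hypothetical.

```python
import copy
from typing import Any, Dict


def strip_semantic_models(manifest: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a parsed manifest.json with every semantic model
    removed from child_map and parent_map, both as keys and as list entries."""

    def is_semantic_model(unique_id: str) -> bool:
        # dbt unique IDs begin with the resource type
        return unique_id.startswith("semantic_model.")

    cleaned = copy.deepcopy(manifest)
    for map_name in ("child_map", "parent_map"):
        graph = cleaned.get(map_name)
        if graph is None:
            continue  # map not present in this manifest; nothing to clean
        cleaned[map_name] = {
            unique_id: [n for n in neighbors if not is_semantic_model(n)]
            for unique_id, neighbors in graph.items()
            if not is_semantic_model(unique_id)
        }
    return cleaned
```

The dagster-dbt API reference describes the manifest argument of @dbt_assets as accepting either a path or the manifest contents, so the cleaned dict could likely be passed directly (manifest=strip_semantic_models(data)) instead of writing a _modified.json file; verify this against the installed dagster-dbt version.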