dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.
https://dagster.io
Apache License 2.0

DBT project path is not found #9583

Closed fannywiryana closed 11 months ago

fannywiryana commented 2 years ago

Dagster version

0.15.8

What's the issue?

The dbt project path is not found when I deploy the repo, although everything works locally, so I don't know how to reproduce the issue. This is the error message:

Error loading dbt_repository.py. Try reloading the repository location after resolving the issue.
FileNotFoundError: [Errno 2] No such file or directory: 'dbt'
  File "/home/ubuntu/.local/lib/python3.8/site-packages/dagster/_grpc/server.py", line 227, in __init__
    self._loaded_repositories = LoadedRepositories(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/dagster/_grpc/server.py", line 101, in __init__
    loadable_targets = get_loadable_targets(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/dagster/_grpc/utils.py", line 33, in get_loadable_targets
    else loadable_targets_from_python_file(python_file, working_directory)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/dagster/_core/workspace/autodiscovery.py", line 26, in loadable_targets_from_python_file
    loaded_module = load_python_file(python_file, working_directory)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/dagster/_core/code_pointer.py", line 86, in load_python_file
    return import_module_from_path(module_name, python_file)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/dagster/_seven/__init__.py", line 51, in import_module_from_path
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 848, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/opt/dagster/repo_sync/data_engineering/dbt_repository.py", line 22, in <module>
    dbt_assets = load_assets_from_dbt_project(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/dagster_dbt/asset_defs.py", line 424, in load_assets_from_dbt_project
    manifest_json, cli_output = _load_manifest_for_project(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/dagster_dbt/asset_defs.py", line 41, in _load_manifest_for_project
    cli_output = execute_cli(
  File "/home/ubuntu/.local/lib/python3.8/site-packages/dagster_dbt/cli/utils.py", line 102, in execute_cli
    process = subprocess.Popen(
  File "/usr/lib/python3.8/subprocess.py", line 858, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/lib/python3.8/subprocess.py", line 1704, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)

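I have this in my code:

from dagster import repository, with_resources
from dagster_dbt import dbt_cli_resource, load_assets_from_dbt_project

ROOT_DIR = "/opt/dagster/repo_sync"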
DBT_PROJECT_DIR = f"{ROOT_DIR}/dbt_transform"
DBT_PROFILES_DIR = f"{ROOT_DIR}/dbt_transform/config"

dbt_assets = load_assets_from_dbt_project(
    DBT_PROJECT_DIR,
    DBT_PROFILES_DIR,
)

@repository
def transformation_layer():
    return with_resources(
        dbt_assets,
        resource_defs={
            "dbt": dbt_cli_resource.configured(
                {"project_dir": DBT_PROJECT_DIR, "profiles_dir": DBT_PROFILES_DIR}
            ),
        },
    )

What did you expect to happen?

dbt_repository.py should load successfully.

How to reproduce?

No response

Deployment type

Other

Deployment details

We deploy it to a GCP Compute Engine instance by running a shell script.

Additional information

These packages are installed from the requirements.txt file:

dagit==0.15.8
dagster==0.15.8
dagster-airbyte==0.15.8
dagster-cron==0.11.16
dagster-dbt==0.15.8
dagster-graphql==0.15.8
dagster-pandas==0.15.8
dagster-postgres==0.15.8
dagster-slack==0.15.8
db-dtypes==1.0.2
dbt-bigquery==1.2.0
dbt-core==1.2.0
dbt-extractor==0.4.1
dbt-postgres==1.2.0

yuhan commented 2 years ago

Conversation continued in Slack: https://dagster.slack.com/archives/C01U954MEER/p1662587709737289?thread_ts=1662463253.943799&cid=C01U954MEER

MeganBeckett commented 1 year ago

Hi, I am also getting this error and can't figure it out as it works locally when running dagit but not when I deploy to Dagster Cloud.

I can't open the Slack message. Is there any follow-up or resolution to this?

Bonnevie commented 1 year ago

@yuhan I don't know if this helps, but I was running a multi-repo local setup, with a workspace.yaml in one repo pointing to another repo that holds the dbt assets, and I got this error. It seems that the dbt command is expected to be present in the virtual environment where dagit is running, not in the one that holds the dbt assets. This is not a problem for me, but I could imagine it becoming one if you need to manage multiple repos with different dbt versions.

Edit: to be clear, my fix was to install dbt in the project that contains the workspace.yaml.
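
A quick way to test this hypothesis is to check how `dbt` resolves on the `PATH` of the process that loads the repository. The following is only a diagnostic sketch using the standard library; the helper name `describe_dbt_lookup` is made up for illustration and is not part of dagster or dagster-dbt.

```python
import os
import shutil
import sys


def describe_dbt_lookup(command="dbt", path=None):
    """Report which interpreter is running and where `command` resolves on PATH.

    `resolved` is None when the command cannot be found, which is the
    situation that surfaces as "No such file or directory: 'dbt'".
    """
    return {
        "interpreter": sys.executable,
        "interpreter_bin_dir": os.path.dirname(sys.executable),
        "resolved": shutil.which(command, path=path),
    }


if __name__ == "__main__":
    for key, value in describe_dbt_lookup().items():
        print(f"{key}: {value}")
```

Running this from the same environment and service account that starts dagit should show whether the executable is visible there, independently of the dbt project path.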

hypr-platform commented 1 year ago

@Bonnevie can you provide a workspace.yaml example with the dbt project inserted?

chumbert2 commented 1 year ago

Hello,

I have observed the same issue (dagster 1.3.14) when trying to run the assets_dbt_python example on an EC2 instance.

Several code locations are defined in a workspace.yml file. dagster-webserver and dagster-daemon are run as systemd units.

/lib/systemd/system/dagster-webserver.service:

[Unit]
Description=Dagster Webserver (Dagster Web UI)
After=network.target

[Service]
Type=simple
User=dagster
EnvironmentFile=/etc/sysconfig/dagster
ExecStart=/opt/dagster/venv/bin/dagster-webserver -h 0.0.0.0 -p 3000 -w /opt/dagster/app/workspace.yml
Restart=always
WorkingDirectory=/opt/dagster/app/

[Install]
WantedBy=multi-user.target

/lib/systemd/system/dagster-daemon.service:

[Unit]
Description=dagster-daemon (handle Dagster schedules, sensors and run queueing)
After=network.target

[Service]
Type=simple
User=dagster
EnvironmentFile=/etc/sysconfig/dagster
ExecStart=/opt/dagster/venv/bin/dagster-daemon run -w /opt/dagster/app/workspace.yml

Restart=always
WorkingDirectory=/opt/dagster/app/

[Install]
WantedBy=multi-user.target

My workspace.yml is as follows:

# /opt/dagster/app/workspace.yml
load_from:
  - python_file:
      relative_path: assets_s3_sensor.py
  - python_file:
      relative_path: assets_schedule.py
  - python_module:
      module_name: xyz_demo
      working_directory: xyz-demo
  - python_module:
      module_name: assets_dbt_python       # as in dagster Git repo examples/assets_dbt_python
      working_directory: assets_dbt_python
      executable_path: assets_dbt_python/venv/bin/python

My code locations are in /opt/dagster/app/:

# tree  -L 2 /opt/dagster/app/
/opt/dagster/app/
|-- __pycache__
|-- assets_dbt_python
|   |-- README.md
|   |-- assets_dbt_python
|   |-- assets_dbt_python_tests
|   |-- dagster_cloud.yaml
|   |-- dbt_project
|   |-- pyproject.toml
|   |-- requirements.in
|   |-- requirements.txt
|   |-- setup.cfg
|   |-- setup.py
|   |-- tox.ini
|   `-- venv             # <-- dbt is installed in this venv!
|-- assets_s3_sensor.py
|-- assets_schedule.py
|-- xyz-demo
|   |-- xyz_demo
|   |-- xyz_demo_tests
|   |-- pyproject.toml
|   |-- setup.cfg
|   `-- setup.py
`-- workspace.yml

For assets_dbt_python, I expected the dbt executable in /opt/dagster/app/assets_dbt_python/venv/bin/ to be used, but this is not the case.

As explained by @Bonnevie, a workaround is to add the dbt command to the PATH of dagster-webserver, but this is somewhat misleading and may cause issues if other code locations use different dbt versions.
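
A narrower variation on this workaround, sketched under assumptions: when `executable_path` points at a venv's Python, the code location can prepend its own interpreter's directory to `PATH` before anything shells out to dbt. This assumes dagster-dbt launches a bare `dbt` through `subprocess` with the environment of the current process, which is what the traceback earlier in this thread suggests but is not verified here. The helper name is hypothetical.

```python
import os
import sys


def prepend_interpreter_dir_to_path(environ=None, interpreter=None):
    """Put the directory of the running interpreter first on PATH.

    In a venv this directory is usually where console scripts such as
    `dbt` are installed, even when the venv was never activated.
    Symlinks are deliberately not resolved, so a venv's bin directory is
    kept rather than the system interpreter's directory.
    """
    environ = os.environ if environ is None else environ
    interpreter = sys.executable if interpreter is None else interpreter
    bin_dir = os.path.dirname(os.path.abspath(interpreter))
    entries = [
        entry
        for entry in environ.get("PATH", "").split(os.pathsep)
        if entry and entry != bin_dir
    ]
    environ["PATH"] = os.pathsep.join([bin_dir] + entries)
    return bin_dir


# Intended to run at the top of the module that defines the dbt assets,
# before any call that invokes the dbt CLI.
prepend_interpreter_dir_to_path()
```

Because the change is made inside the code location's own process, other code locations keep their own `PATH` and can, in principle, use a different dbt version.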

yuhan commented 1 year ago

cc @rexledesma getting this on your radar.

rexledesma commented 11 months ago

Resolved by https://github.com/dagster-io/dagster/pull/17171, for users who want to fully configure the path to their dbt executable.
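
For readers who want to pass an explicit path, here is a hedged sketch of how one might compute it: it looks for `dbt` next to the interpreter that runs the code location and fails early if it is missing. The function name is hypothetical, and the way the path is handed to dagster-dbt (shown only in a comment) is an assumption to check against the installed version.

```python
import os
import sys


def dbt_executable_next_to(interpreter=None, name="dbt"):
    """Return the absolute path of `name` in the interpreter's directory.

    Raises FileNotFoundError if no executable file exists at that path.
    """
    interpreter = sys.executable if interpreter is None else interpreter
    bin_dir = os.path.dirname(os.path.abspath(interpreter))
    candidate = os.path.join(bin_dir, name)
    if not (os.path.isfile(candidate) and os.access(candidate, os.X_OK)):
        raise FileNotFoundError(f"no executable {name!r} at {candidate}")
    return candidate


# Assumption: a dagster-dbt release that includes the linked PR accepts this
# path on its dbt CLI resource, along the lines of
#   DbtCliResource(project_dir=..., dbt_executable=dbt_executable_next_to())
# Verify the parameter name against your installed version before relying on it.
```

Failing at import time with a clear message makes a missing dbt installation easier to tell apart from a wrong project path.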