astronomer / astronomer-cosmos

Run your dbt Core projects as Apache Airflow DAGs and Task Groups with a few lines of code
https://astronomer.github.io/astronomer-cosmos/
Apache License 2.0
782 stars 172 forks

[Bug] Docker Execution doesn't work with Task Groups #1346

Open andrewhlui opened 5 days ago

andrewhlui commented 5 days ago

Astronomer Cosmos Version

1.7

dbt-core version

1.8

Versions of dbt adapters

No response

LoadMode

AUTOMATIC

ExecutionMode

DOCKER

InvocationMode

None

airflow version

2.10

Operating System

Similar to #493 but for Docker.

macOS 14.6.1 (23G93)

If you think it's a UI issue, what browsers are you seeing the problem on?

No response

Deployment

Docker-Compose

Deployment details

No response

What happened?

Docker operators don't accept profile_config in kwargs. This is an issue when trying to use Docker execution mode with DbtTaskGroup, which automatically includes it.

Error:

Invalid arguments were passed to DbtRunDockerOperator Invalid arguments were: **kwargs: {'profile_config': None}

Relevant log output

Invalid arguments were passed to DbtRunDockerOperator Invalid arguments were: **kwargs: {'profile_config': None}

How to reproduce

from datetime import datetime

from airflow.models import DAG
from airflow.operators.empty import EmptyOperator
from cosmos import DbtTaskGroup
from cosmos import ExecutionConfig
from cosmos import ExecutionMode
from cosmos import ProfileConfig
from cosmos import ProjectConfig
from cosmos.profiles import RedshiftUserPasswordProfileMapping

DBT_IMAGE = "[image]:latest"
MANIFEST_PATH = "[path]/manifest.json"
PROJECT_DIR = "dbt"

profile_config = ProfileConfig(
    profile_name="default",
    target_name="dev",
    profile_mapping=RedshiftUserPasswordProfileMapping(
        conn_id="airflow_db",
        profile_args={"schema": "public"},
    ),
)

project_config = ProjectConfig(
    manifest_path=MANIFEST_PATH,
    project_name="dbt",
)
execution_config = ExecutionConfig(execution_mode=ExecutionMode.DOCKER)

operator_args = {"image": DBT_IMAGE, "conn_id": "airflow_db"}

with DAG(
    dag_id="cosmos_docker",
    start_date=datetime(2024, 1, 1),
    schedule=None,
) as dag:
    tg = DbtTaskGroup(
        group_id="task_group",
        project_config=project_config,
        execution_config=execution_config,
        profile_config=profile_config,
        operator_args=operator_args,
        default_args={"retries": 2},
    )

    tg

Anything else :)?

No response

Are you willing to submit PR?

Contact Details

No response

dosubot[bot] commented 5 days ago

Hey @andrewhlui! I'm here to help you with your Docker Execution issue. I'm a bot that can assist with bugs, questions, and becoming a contributor. Let's get this sorted out together!

A similar issue was previously reported and resolved in the Cosmos repository. The issue was related to Kubernetes execution mode and was closed by a pull request that fixed the problem [1]. However, the specific details of the changes made in that pull request are not available in the provided context.

For the current issue with Docker execution mode, DbtRunDockerOperator does not accept profile_config as a keyword argument. DbtRunDockerOperator inherits from DbtDockerBaseOperator, which in turn inherits from DockerOperator. Its constructor forwards any additional keyword arguments to its superclasses, and because none of them recognizes profile_config, the argument is rejected with the "Invalid arguments were passed to DbtRunDockerOperator" error quoted in the report [2].
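The forwarding chain described above can be illustrated without Airflow or Cosmos installed. The sketch below is only an illustration: every class name in it is a hypothetical stand-in, not the real operator, and it raises a plain `TypeError` where the real base operator raises its own exception type. It shows how a keyword argument that no class in the chain consumes reaches the base class and is rejected there.

```python
class BaseOperatorStandIn:
    """Hypothetical stand-in for a base operator that rejects unknown kwargs."""

    def __init__(self, task_id, **kwargs):
        if kwargs:
            # Nothing above this class consumed these arguments, so fail loudly.
            raise TypeError(
                f"Invalid arguments were passed to {type(self).__name__}. "
                f"Invalid arguments were: **kwargs: {kwargs}"
            )
        self.task_id = task_id


class DockerOperatorStandIn(BaseOperatorStandIn):
    """Hypothetical stand-in for a Docker operator: consumes only `image`."""

    def __init__(self, image, **kwargs):
        super().__init__(**kwargs)
        self.image = image


class DbtRunDockerStandIn(DockerOperatorStandIn):
    """Hypothetical stand-in for the dbt Docker operator: forwards the rest."""

    def __init__(self, project_dir=None, **kwargs):
        super().__init__(**kwargs)
        self.project_dir = project_dir


def try_build(**kwargs):
    """Return the error message if construction fails, otherwise None."""
    try:
        DbtRunDockerStandIn(**kwargs)
        return None
    except TypeError as exc:
        return str(exc)


# profile_config is not consumed by any class in the chain, so it is rejected.
error = try_build(task_id="run", image="dbt:latest", profile_config=None)
print(error)
```

Without `profile_config`, the same call constructs the operator normally, which matches the report: the failure appears only when the task group injects the extra argument.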

To work around this issue, you might consider modifying the DbtTaskGroup or the way operator_args are constructed to exclude profile_config when using Docker execution mode. Alternatively, you could explore customizing the DbtRunDockerOperator to handle profile_config appropriately if you are willing to submit a pull request.
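Both suggestions can be sketched in a few lines. This is a rough illustration under stated assumptions, not the Cosmos implementation: `StrictOperatorStandIn`, `ProfileAwareStandIn`, `drop_unsupported`, and `UNSUPPORTED_DOCKER_KWARGS` are hypothetical names invented for the example. The first approach filters `profile_config` out before the operator is built; the second has the operator accept and store it instead of forwarding it.

```python
# Hypothetical list of arguments the Docker operators do not understand.
UNSUPPORTED_DOCKER_KWARGS = ("profile_config",)


def drop_unsupported(kwargs, unsupported=UNSUPPORTED_DOCKER_KWARGS):
    """Option 1: strip unsupported arguments before building the operator."""
    return {key: value for key, value in kwargs.items() if key not in unsupported}


class StrictOperatorStandIn:
    """Hypothetical stand-in for an operator that rejects unknown kwargs."""

    def __init__(self, task_id, image, **kwargs):
        if kwargs:
            raise TypeError(f"Invalid arguments were: **kwargs: {kwargs}")
        self.task_id = task_id
        self.image = image


class ProfileAwareStandIn(StrictOperatorStandIn):
    """Option 2: accept profile_config explicitly instead of forwarding it."""

    def __init__(self, profile_config=None, **kwargs):
        # Keep the value on the instance; the strict parent never sees it.
        self.profile_config = profile_config
        super().__init__(**kwargs)


task_kwargs = {"task_id": "run", "image": "dbt:latest", "profile_config": None}

# Option 1: the caller filters the arguments.
filtered = StrictOperatorStandIn(**drop_unsupported(task_kwargs))

# Option 2: the operator itself tolerates the argument.
aware = ProfileAwareStandIn(**task_kwargs)
```

Filtering at the call site keeps the operator unchanged, while accepting the argument in the operator leaves room to use the profile later, for example to pass connection details into the container. Which trade-off fits Cosmos is a decision for the maintainers.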
