astronomer / astronomer-cosmos

Run your dbt Core projects as Apache Airflow DAGs and Task Groups with a few lines of code
https://astronomer.github.io/astronomer-cosmos/
Apache License 2.0
615 stars, 154 forks

[Bug] Test task doesn't accept `full_refresh` kwarg #1062

CamposContentful closed this issue 1 week ago

CamposContentful commented 3 months ago

Astronomer Cosmos Version

main (development)

If "Other Astronomer Cosmos version" selected, which one?

No response

dbt-core version

1.8.2

Versions of dbt adapters

dbt-redshift = "^1.8.1"

LoadMode

DBT_LS_MANIFEST

ExecutionMode

AWS_EKS

InvocationMode

None

airflow version

MWAA 2.7.2

Operating System

Fedora

If you think it's a UI issue, what browsers are you seeing the problem on?

No response

Deployment

Amazon (AWS) MWAA

Deployment details

No response

What happened?

Hello everyone, when I pass the `full_refresh` parameter in `operator_args` for a DbtTaskGroup, I get the following error:

    airflow.exceptions.AirflowException: Invalid arguments were passed to DbtTestAwsEksOperator (task_id: test). Invalid arguments were:
    **kwargs: {'full_refresh': True}

The task is defined as:

    example_calculation = DbtTaskGroup(
        group_id="example_calculation_models",
        project_config=ProjectConfig(
            manifest_path=f"{os.path.dirname(os.path.realpath(__file__))}/manifest.json",
            project_name="example_calculation",
        ),
        profile_config=ProfileConfig(
            profile_name="default_profile", target_name="airflow", profiles_yml_filepath="./dbt_profiles/profiles.yml"
        ),
        render_config=RenderConfig(
            select=["tag:example_calculation"],
            load_method=LoadMode.DBT_MANIFEST,
            node_converters={
                DbtResourceType("source"): convert_source,  # known dbt node type to Cosmos (part of DbtResourceType)
            },
            test_behavior=TestBehavior.AFTER_ALL,
        ),
        execution_config=ExecutionConfig(execution_mode=ExecutionMode.AWS_EKS, dbt_project_path="/usr/app/dbt/"),
        operator_args={
            "in_cluster": False,
            "image_pull_policy": "Always",
            "get_logs": True,
            "is_delete_operator_pod": True,
            "on_finish_action": "delete_pod",
            "full_refresh": True,
        },
    )

My understanding is that the `full_refresh` argument should be ignored by `dbt test`. I couldn't find any similar issues in the repo. Do you have an idea whether I'm just setting the argument wrong or whether it's something else?
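The behavior the reporter expected, an operator quietly ignoring a shared argument it has no use for, can be sketched as a filter over the shared dict. This is a hypothetical illustration, not how Cosmos handles `operator_args`: `filter_supported_kwargs`, `ToyRunOperator`, and `ToyTestOperator` are invented names, and the filter only looks at the parameters a class's own `__init__` declares.

```python
import inspect


def filter_supported_kwargs(operator_cls, operator_args):
    """Keep only the entries of operator_args that operator_cls.__init__ names explicitly."""
    params = inspect.signature(operator_cls.__init__).parameters
    accepted = {
        name
        for name, param in params.items()
        if param.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    return {key: value for key, value in operator_args.items() if key in accepted}


class ToyRunOperator:
    # Hypothetical stand-in for a "run" operator that understands full_refresh.
    def __init__(self, task_id, get_logs=False, full_refresh=False):
        self.task_id = task_id
        self.get_logs = get_logs
        self.full_refresh = full_refresh


class ToyTestOperator:
    # Hypothetical stand-in for a "test" operator: no full_refresh parameter.
    def __init__(self, task_id, get_logs=False):
        self.task_id = task_id
        self.get_logs = get_logs


# One dict shared by every task, as with operator_args on a DbtTaskGroup.
shared_operator_args = {"get_logs": True, "full_refresh": True}

print(filter_supported_kwargs(ToyRunOperator, shared_operator_args))
# {'get_logs': True, 'full_refresh': True}
print(filter_supported_kwargs(ToyTestOperator, shared_operator_args))
# {'get_logs': True}

test_task = ToyTestOperator(task_id="test", **filter_supported_kwargs(ToyTestOperator, shared_operator_args))
```

Real Airflow operators usually accept `**kwargs` and forward them up the class hierarchy, so a filter like this would have to walk the MRO or rely on an explicit per-command allow-list; the toy classes sidestep that by listing every parameter.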

Thanks!

Relevant log output

No response

How to reproduce

With `execution_mode=ExecutionMode.AWS_EKS`, set up a dbt project that has tests and pass the `full_refresh` option in `operator_args`.

Anything else :)?

No response

Are you willing to submit a PR?

Contact Details

jaime.campos@contentful.com

dosubot[bot] commented 3 months ago

Hey @CamposContentful! 👋 I'm here to help you with your bug report. I'm a bot that can assist with bug fixes, answering questions, and even help you become a contributor. Let's tackle this together!

The DbtTestAwsEksOperator does not support the full_refresh argument. Because full_refresh does not apply to the dbt test command, it is not part of the DbtTestAwsEksOperator's initialization or command flags [1].

To resolve this issue, you should remove the full_refresh argument from the operator_args when using the DbtTestAwsEksOperator. Here is the corrected task definition:

    import os

    from cosmos import DbtTaskGroup, ExecutionConfig, ProfileConfig, ProjectConfig, RenderConfig
    from cosmos.constants import DbtResourceType, ExecutionMode, LoadMode, TestBehavior

    # convert_source is a user-defined node converter that is not shown in the report.
    example_calculation = DbtTaskGroup(
        group_id="example_calculation_models",
        project_config=ProjectConfig(
            manifest_path=f"{os.path.dirname(os.path.realpath(__file__))}/manifest.json",
            project_name="example_calculation",
        ),
        profile_config=ProfileConfig(
            profile_name="default_profile", target_name="airflow", profiles_yml_filepath="./dbt_profiles/profiles.yml"
        ),
        render_config=RenderConfig(
            select=["tag:example_calculation"],
            load_method=LoadMode.DBT_MANIFEST,
            node_converters={
                DbtResourceType("source"): convert_source,  # known dbt node type to Cosmos (part of DbtResourceType)
            },
            test_behavior=TestBehavior.AFTER_ALL,
        ),
        execution_config=ExecutionConfig(execution_mode=ExecutionMode.AWS_EKS, dbt_project_path="/usr/app/dbt/"),
        operator_args={
            "in_cluster": False,
            "image_pull_policy": "Always",
            "get_logs": True,
            "is_delete_operator_pod": True,
            "on_finish_action": "delete_pod",
            # "full_refresh" removed: DbtTestAwsEksOperator does not accept it
        },
    )

By removing the full_refresh argument, the DbtTestAwsEksOperator should no longer raise an AirflowException for invalid arguments.
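The error itself comes from Airflow rather than from dbt: the keyword reaches an operator class that does not declare it, and Airflow's `BaseOperator` rejects keyword arguments it does not recognize with the `Invalid arguments were passed to ...` message seen in the report. The sketch below reproduces that pattern with toy classes; `InvalidArgumentsError`, `ToyBaseOperator`, `ToyRunOperator`, and `ToyTestOperator` are hypothetical stand-ins, not Cosmos or Airflow classes.

```python
class InvalidArgumentsError(Exception):
    """Hypothetical stand-in for airflow.exceptions.AirflowException."""


class ToyBaseOperator:
    """Mimics the base-class check: any leftover kwargs are rejected."""

    def __init__(self, task_id, **kwargs):
        if kwargs:
            raise InvalidArgumentsError(
                f"Invalid arguments were passed to {type(self).__name__} "
                f"(task_id: {task_id}). Invalid arguments were:\n**kwargs: {kwargs}"
            )
        self.task_id = task_id


class ToyRunOperator(ToyBaseOperator):
    """Declares full_refresh, so it is consumed before reaching the base class."""

    def __init__(self, full_refresh=False, **kwargs):
        super().__init__(**kwargs)
        self.full_refresh = full_refresh


class ToyTestOperator(ToyBaseOperator):
    """Declares no extra parameters, so full_refresh falls through to the base class."""


# The same operator_args dict is applied to every task in the group.
shared_operator_args = {"full_refresh": True}

run_task = ToyRunOperator(task_id="run", **shared_operator_args)
print(run_task.full_refresh)  # True

try:
    ToyTestOperator(task_id="test", **shared_operator_args)
except InvalidArgumentsError as err:
    print(err)
```

In this toy model, dropping the key from `shared_operator_args` avoids the exception, but `ToyRunOperator` then falls back to `full_refresh=False` as well.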


tatiana commented 1 week ago

Hi @CamposContentful, thank you for reporting and submitting a fix for this! Based on the dbt docs (latest == 1.8), https://docs.getdbt.com/reference/resource-configs/full_refresh, my understanding is also that full-refresh is only a parameter for `dbt run` and `dbt seed`.
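Following that reading of the docs, the mapping from a `full_refresh` argument to the `--full-refresh` CLI flag can be made command-aware, so commands that do not take the flag never receive it. This is a minimal sketch under assumptions: `build_dbt_args` and `FULL_REFRESH_COMMANDS` are hypothetical names, and the command set mirrors the comment above (`dbt run` and `dbt seed`) rather than a full audit of the dbt CLI, which also documents the flag for other commands such as `dbt build`.

```python
# Hypothetical set of dbt commands that this sketch lets --full-refresh reach.
FULL_REFRESH_COMMANDS = {"run", "seed"}


def build_dbt_args(command: str, full_refresh: bool = False) -> list[str]:
    """Build a dbt CLI argument list, adding --full-refresh only where it applies."""
    args = ["dbt", command]
    if full_refresh and command in FULL_REFRESH_COMMANDS:
        args.append("--full-refresh")
    return args


print(build_dbt_args("run", full_refresh=True))   # ['dbt', 'run', '--full-refresh']
print(build_dbt_args("test", full_refresh=True))  # ['dbt', 'test']
```

With this shape, a single shared `full_refresh=True` can be passed to every task: the run and seed commands pick it up, and the test command silently drops it instead of failing.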