Closed vatsrahul1001 closed 3 months ago
Looks like it's related to https://github.com/apache/airflow/pull/40471
cc @boraberke
Hi @vatsrahul1001,
The job_id field of DatabricksRunNowOperator is not a templated field, which might be the cause of the issue. Before #40471, the constructor added job_id to the json parameter before the templated fields were rendered, which is what let job_id behave as a template field.
A workaround could be to set the json parameter as
json={"job_id": "{{ task_instance.xcom_pull(task_ids='submit_run', dag_id='example_async_databricks', key='job_id') }}"},
instead of specifying an explicit job_id parameter:
job_id="{{ task_instance.xcom_pull(task_ids='submit_run', dag_id='example_async_databricks', key='job_id') }}",
I do not have a test environment for Databricks to check whether my assumption holds. Let me know if this fixes the problem.
Before 6.7.0, even though the named parameters were not templated themselves, they were placed into a templated field named json in the init function. By the time execute is called, the template field json has been resolved.
In 6.7.0, the change made it so that the named parameters are saved to a non-templated field, overridden_json_params, which is used later in the execute function by calling _setup_and_validate_json. This means that named parameters that would previously have been templated are no longer resolved.
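To make the difference concrete, here is a minimal toy sketch of the two shapes described above. It does not use Airflow or the Databricks provider: the `render` helper is a simplified stand-in for Jinja rendering, and the class names `MergeInInitOperator` and `MergeInExecuteOperator` are hypothetical, chosen only for illustration. Only the attribute names `json`, `template_fields`, and `overridden_json_params` come from the discussion.

```python
import re

# Toy stand-in for Jinja rendering: replaces "{{ name }}" with context[name].
_PATTERN = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(value, context):
    """Recursively render strings inside dicts; leave other values untouched."""
    if isinstance(value, str):
        return _PATTERN.sub(lambda m: str(context[m.group(1)]), value)
    if isinstance(value, dict):
        return {k: render(v, context) for k, v in value.items()}
    return value


def render_template_fields(op, context):
    """Mimic the rendering step: only attributes named in template_fields are touched."""
    for name in op.template_fields:
        setattr(op, name, render(getattr(op, name), context))


class MergeInInitOperator:
    """Hypothetical pre-6.7.0 shape: named params are folded into the templated json in __init__."""

    template_fields = ("json",)

    def __init__(self, job_id=None, json=None):
        self.json = dict(json or {})
        if job_id is not None:
            self.json["job_id"] = job_id

    def execute(self):
        return self.json


class MergeInExecuteOperator:
    """Hypothetical 6.7.0 shape: named params wait in a non-templated attribute until execute."""

    template_fields = ("json",)

    def __init__(self, job_id=None, json=None):
        self.json = dict(json or {})
        self.overridden_json_params = {}
        if job_id is not None:
            self.overridden_json_params["job_id"] = job_id

    def execute(self):
        # The merge happens after rendering, so the raw template string survives.
        return {**self.json, **self.overridden_json_params}


context = {"job_id": 42}
old = MergeInInitOperator(job_id="{{ job_id }}")
new = MergeInExecuteOperator(job_id="{{ job_id }}")
for op in (old, new):
    render_template_fields(op, context)

old_payload = old.execute()
new_payload = new.execute()
print(old_payload)  # {'job_id': '42'}
print(new_payload)  # {'job_id': '{{ job_id }}'}
```

In this toy model the second operator sends the unrendered template string as job_id, which matches the kind of failure reported in this issue.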
@boraberke Was this an intended change?
@boraberke
I tried templating the json param as expected. However, existing example DAGs that use named params with templating should not be broken by this new change. As mentioned here, using only named params instead of json is very common.
json parameter to be templated as it should be. However, as you stated, named parameters that were implicitly templated (i.e. not in the template_fields but merged with json) are no longer resolved correctly. This affected all of the below operators:
I agree @vatsrahul1001. Apparently the docs mention some of the params, including job_id, as templated here, but I had not seen that before.
Adding the necessary named params to template_fields may be a way to fix it. WDYT @wolfier @vatsrahul1001?
Additionally, @potiuk should we revert #40471 or add a new commit that fixes this issue?
@boraberke As per the documentation, "Template substitution occurs just before the pre_execute function of your operator is called."
I don't think adding named params to template_fields will resolve this.
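The ordering question here can be explored with a toy model. The sketch below is not Airflow code: `ToyRunNowOperator`, `run_task`, and the simplified `render` helper are hypothetical names, and the lifecycle (render, then pre_execute, then execute) is assumed from the documentation sentence quoted above. It only shows that, under that assumption, attributes listed in template_fields are already rendered by the time a merge inside execute reads them. It says nothing about whether the real provider behaves the same way.

```python
import re

# Toy stand-in for Jinja rendering: replaces "{{ name }}" with context[name].
_PATTERN = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(value, context):
    """Recursively render strings inside dicts; leave other values untouched."""
    if isinstance(value, str):
        return _PATTERN.sub(lambda m: str(context[m.group(1)]), value)
    if isinstance(value, dict):
        return {k: render(v, context) for k, v in value.items()}
    return value


class ToyRunNowOperator:
    # Assumption for this sketch: the named params are listed next to json.
    template_fields = ("json", "job_id", "notebook_params")

    def __init__(self, job_id=None, notebook_params=None, json=None):
        self.json = dict(json or {})
        self.job_id = job_id
        self.notebook_params = notebook_params
        self.events = []  # records the order of lifecycle steps

    def pre_execute(self):
        self.events.append("pre_execute")

    def execute(self):
        self.events.append("execute")
        # Merge named params into the payload at execute time.
        payload = dict(self.json)
        if self.job_id is not None:
            payload["job_id"] = self.job_id
        if self.notebook_params is not None:
            payload["notebook_params"] = self.notebook_params
        return payload


def run_task(op, context):
    """Assumed lifecycle: render template_fields, then pre_execute, then execute."""
    for name in op.template_fields:
        setattr(op, name, render(getattr(op, name), context))
    op.events.append("render")
    op.pre_execute()
    return op.execute()


context = {"job_id": 42, "ds": "2024-07-01"}
op = ToyRunNowOperator(
    job_id="{{ job_id }}",
    notebook_params={"run_date": "{{ ds }}"},
)
payload = run_task(op, context)
print(op.events)
print(payload)
```

The design point in this sketch is that the merge reads the operator's own attributes at execute time rather than a copy captured in the constructor. A copy taken before rendering would still hold the raw template strings.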
Fix will be best
Hello, these changes also broke my code, where I'm using Jinja templating in the notebook_params for DatabricksRunNowOperator.
@boraberke are you working on a fix?
Apache Airflow version
main (development)
If "Other Airflow 2 version" selected, which one?
No response
What happened?
DatabricksRunNowOperator started failing with the below error after upgrading to version 6.7.0. I have verified that it works well with version 6.6.0.
What you think should happen instead?
No response
How to reproduce
import json
import os
from datetime import timedelta
from typing import Dict, Optional

from airflow.models.dag import DAG
from airflow.utils.timezone import datetime

from airflow.providers.databricks.operators.databricks import (
    DatabricksRunNowOperator,
    DatabricksSubmitRunOperator,
)
Operating System
Linux
Versions of Apache Airflow Providers
databricks 6.7.0
Deployment
Astronomer
Deployment details
No response
Anything else?
No response
Are you willing to submit PR?
Code of Conduct