apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

ExternalPythonOperator Jinja template rendering exception when op_kwargs dictionary values contain special characters #39584

Closed samodelkinas closed 1 month ago

samodelkinas commented 6 months ago

Apache Airflow version

2.9.1

If "Other Airflow 2 version" selected, which one?

No response

What happened?

Airflow DAG throws an exception when calling ExternalPythonOperator with an op_kwargs dictionary containing a value with certain { and % character combinations:

[2024-05-13 03:50:17,424] {abstractoperator.py:708} ERROR - Exception rendering Jinja template for task 'redacted', field 'op_kwargs'. Template: {redacted}
Traceback (most recent call last):
  File "/opt/airflow/venv/airflow/lib64/python3.11/site-packages/airflow/models/abstractoperator.py", line 700, in _do_render_template_fields
    rendered_content = self.render_template(
  File "/data/user/airflow/venv/airflow/lib64/python3.11/site-packages/airflow/template/templater.py", line 186, in render_template
    return {k: self.render_template(v, context, jinja_env, oids) for k, v in value.items()}
  File "/opt/airflow/venv/airflow/lib64/python3.11/site-packages/airflow/template/templater.py", line 186, in
    return {k: self.render_template(v, context, jinja_env, oids) for k, v in value.items()}
  File "/opt/airflow/venv/airflow/lib64/python3.11/site-packages/airflow/template/templater.py", line 173, in render_template
    template = jinja_env.from_string(value)
  File "/opt/airflow/venv/airflow/lib64/python3.11/site-packages/jinja2/environment.py", line 1105, in from_string
    return cls.from_code(self, self.compile(source), gs, None)
  File "/opt/airflow/venv/airflow/lib64/python3.11/site-packages/jinja2/environment.py", line 768, in compile
    self.handle_exception(source=source_hint)
  File "/opt/airflow/venv/airflow/lib64/python3.11/site-packages/jinja2/environment.py", line 936, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "", line 1, in template
jinja2.exceptions.TemplateSyntaxError: tag name expected

What you think should happen instead?

Ideally, all op_kwargs dictionary values should be sent to operators as base64-encoded strings and decoded by the operator, to avoid templating errors for entries such as passwords containing special characters. As a workaround, the value was encoded by the user and decoded in the callable.
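
For anyone landing here, the workaround could look roughly like the sketch below. The helper names (`encode_kwarg`, `decode_kwarg`, `my_callable`) and the sample value are made up for illustration and are not part of Airflow. The base64 alphabet contains no `{`, `%`, or `#`, so the encoded value cannot form a Jinja delimiter. Note that base64 only sidesteps templating; it does not protect the secret.

```python
import base64


def encode_kwarg(value: str) -> str:
    """Encode a string so it contains only base64 characters (hypothetical helper)."""
    return base64.urlsafe_b64encode(value.encode("utf-8")).decode("ascii")


def decode_kwarg(encoded: str) -> str:
    """Reverse encode_kwarg inside the task callable (hypothetical helper)."""
    return base64.urlsafe_b64decode(encoded.encode("ascii")).decode("utf-8")


def my_callable(secret_b64: str) -> str:
    # The callable receives the encoded value and decodes it before use.
    secret = decode_kwarg(secret_b64)
    return secret


# At DAG-definition time: encode the problematic value before passing it on.
encoded = encode_kwarg("pa{%ss")
op_kwargs = {"secret_b64": encoded}  # would be passed as op_kwargs=... to the operator
```

Calling `my_callable(**op_kwargs)` outside Airflow returns the original string, which is a quick way to check the round trip.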

How to reproduce

Create a DAG using ExternalPythonOperator and pass '{%blah', or a similar string containing characters that Jinja treats as delimiters, as the value of one of the op_kwargs entries. Running the DAG throws a Jinja rendering exception.

Operating System

Linux RedHat 8

Versions of Apache Airflow Providers

apache-airflow-providers-common-sql==1.10.0
apache-airflow-providers-ftp==3.7.0
apache-airflow-providers-http==4.8.0
apache-airflow-providers-imap==3.5.0
apache-airflow-providers-sqlite==3.7.0

Deployment

Virtualenv installation

Deployment details

No response

Anything else?

No response

Are you willing to submit PR?

Code of Conduct

boring-cyborg[bot] commented 6 months ago

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

rawwar commented 6 months ago

Still trying to understand the exact root issue. Hence hiding my comment till I have something concrete to discuss

rawwar commented 6 months ago

I looked into this, and the following is my analysis.

This error occurs whenever a value is parsed with jinja_env.from_string(value) and the string is not valid Jinja template syntax.

In this case, {% is the statement delimiter, so jinja_env.from_string tries to parse what follows as a tag and fails. Hence, the string must adhere to Jinja's syntax rules and must not contain unrecognized or improperly formatted tags.
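
The behaviour can be checked outside Airflow with a small sketch like the one below. It assumes the `jinja2` package is installed and uses a plain `jinja2.Environment` as a stand-in for the environment Airflow builds. The function name `compiles_as_template` is made up for this example.

```python
from jinja2 import Environment, TemplateSyntaxError


def compiles_as_template(value: str) -> bool:
    """Return True if Jinja can compile the string, False on a syntax error."""
    env = Environment()  # stand-in only; Airflow configures its own environment
    try:
        env.from_string(value)
    except TemplateSyntaxError:
        return False
    return True


plain_ok = compiles_as_template("plain-password")
# "{%" opens a statement, so Jinja expects a valid tag name to follow.
broken = compiles_as_template("{%blah")
# The same text inside a raw block is treated as literal data.
raw_ok = compiles_as_template("{% raw %}{%blah{% endraw %}")
```

The exact error message depends on what follows `{%`, so the sketch only checks whether compilation succeeds.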

rawwar commented 6 months ago

@potiuk, I think the solution here is not a code fix but a documentation update, informing users that values passed to op_args and op_kwargs must follow Jinja syntax rules. If they want a string containing characters that Jinja uses, they can do one of the following, as described in the Jinja docs:

  1. Use Escaping Delimiters
  2. Use raw blocks
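
As a rough illustration of the raw-block option, a value could be wrapped before it is placed in op_kwargs. The helper name `jinja_raw` is hypothetical and not an Airflow or Jinja API. The check for an embedded `endraw` tag is a best-effort assumption about which spellings Jinja would recognise.

```python
import re

# Best-effort pattern for an "endraw" tag, which would end the raw block early.
_ENDRAW = re.compile(r"\{%[-+]?\s*endraw\s*[-+]?%\}")


def jinja_raw(value: str) -> str:
    """Wrap a string in a Jinja raw block so it is emitted literally (hypothetical helper)."""
    if _ENDRAW.search(value):
        raise ValueError("value contains an endraw tag and cannot be wrapped safely")
    return "{% raw %}" + value + "{% endraw %}"


wrapped = jinja_raw("{%blah")
```

Here `wrapped` is `{% raw %}{%blah{% endraw %}`, which Jinja should render back to the original `{%blah`.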

potiuk commented 6 months ago

Yes. Feel free to update the docs

rawwar commented 1 month ago

@potiuk, I completely forgot about this issue and have only just started working on it. However, I feel this is definitely not a bug, nor does it require a specific note about Jinja templating for this operator. PythonOperator has template_fields set to ("templates_dict", "op_args", "op_kwargs") https://github.com/apache/airflow/blob/f1664674d859a262e93fb3110557a1e71138ca8b/airflow/operators/python.py#L193

So, any operator extending PythonOperator expects template_fields to follow Jinja syntax rules. In the documentation, we do not explicitly mention the list of template fields for any operator.

A thought: we only mention that templating is possible. Like here

Is it worth adding all possible template fields to all Operator documentation?

Just realised that the documentation says templates_dict is the only templated field. I'll update this documentation. I also noticed that the BashOperator documentation does not mention which fields are templated. I'll update that as well.