apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

param object do not accept array #43433

Open raphaelauv opened 4 weeks ago

raphaelauv commented 4 weeks ago

Apache Airflow version

2.10.2

What happened?

I can't trigger a DAG with [ ] or [{"toto": 5}] (both are valid JSON).


with DAG(
        dag_id="a",
        params={
            "x": Param(None, type="object"),
        }):
    ...

It logs:

Invalid input for param batch: [{'toto': 5}] is not of type 'object' Failed validating 'type' in schema: {'type': 'object'} On instance: [{'toto': 5}]


jscheffl commented 3 weeks ago

I assume this is correct.

The term "object" in the JSON schema validation we run refers to "dict" object types. See: https://json-schema.org/understanding-json-schema/reference/object.

So in your case you need to define it as an array. Note that empty arrays in the UI form are always triggered with a None value; there was also a ticket open to discuss how the form should handle empty fields.
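To illustrate the distinction, here is a minimal stdlib-only sketch of how JSON Schema type names map onto Python values. It is a simplified stand-in, not the jsonschema library that Airflow runs; `JSON_TO_PYTHON` and `matches_type` are hypothetical names used only for this illustration.

```python
# Simplified stand-in for JSON Schema's "type" keyword (not the jsonschema library).
# Only three type names are covered here, for illustration.
JSON_TO_PYTHON = {
    "object": dict,   # a JSON object is a key/value mapping
    "array": list,    # a JSON array is an ordered list
    "string": str,
}


def matches_type(value, json_type):
    """Return True if value is an instance of the given JSON Schema type name."""
    return isinstance(value, JSON_TO_PYTHON[json_type])


payload = [{"toto": 5}]
print(matches_type(payload, "object"))     # False: a list is not a JSON object
print(matches_type(payload, "array"))      # True
print(matches_type(payload[0], "object"))  # True: each element is a dict
```

So `[{"toto": 5}]` is an array whose items are objects, which is why validation against `{'type': 'object'}` rejects it.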

raphaelauv commented 3 weeks ago

Hey, using "array" turns my value (an array) into a string wrapped in another array:

['[{"toto": 5}]']

from airflow.models import Param
from airflow.operators.python import PythonOperator
from airflow.utils.dates import days_ago
from airflow import DAG

with DAG(
    dag_id="a",
    schedule_interval=None,
    start_date=days_ago(1),
    params={
        "x": Param([], type="array"),
    }
):
    def a(toto):
        print(toto)
        print(type(toto))

    PythonOperator(task_id="toto", python_callable=a, op_kwargs={"toto": "{{params.x}}"})

So I must use a string until there is better support:

import json

with DAG(
    dag_id="a",
    schedule_interval=None,
    start_date=days_ago(1),
    params={
        "x": Param(None, type="string"),
    }
):
    def a(toto):
        toto = json.loads(toto)
        print(toto)

    PythonOperator(task_id="toto", python_callable=a, op_kwargs={"toto": "{{params.x}}"})
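
If you go with the string workaround, it may be worth checking the decoded value inside the callable, since a string param accepts any text. A minimal sketch, assuming the expected payload is a JSON array of objects; `parse_list_of_dicts` is a hypothetical helper, not part of Airflow:

```python
import json


def parse_list_of_dicts(raw):
    """Parse a JSON string and check that it decodes to a list of dicts."""
    value = json.loads(raw)
    if not isinstance(value, list) or not all(isinstance(item, dict) for item in value):
        raise ValueError(f"expected a JSON array of objects, got: {raw!r}")
    return value


print(parse_list_of_dicts('[{"toto": 5}]'))  # [{'toto': 5}]
```

The callable could then call `parse_list_of_dicts(toto)` instead of a bare `json.loads(toto)`, so a malformed trigger payload fails with a clear message.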

jscheffl commented 3 weeks ago

Yes... and no. Please check the details in https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/params.html#use-params-to-provide-a-trigger-ui-form . It might not be obvious, but a feature was added for exactly this case:

If you add the attribute items with a dictionary that contains a field type with a value other than “string”, a JSON entry field will be generated for more array types and additional type validation as described in JSON Schema Array Items.

...meaning that if you want non-string items, you define an item type, and a JSON-style input box is then rendered, just not one that validates as an object.

One example is contained in the example DAGs: https://github.com/apache/airflow/blob/main/airflow/example_dags/example_params_ui_tutorial.py#L136 - it uses numbers rather than dicts.

raphaelauv commented 3 weeks ago

Thanks for your help, but I don't want to trigger the DAG with an array of strings or integers; I want an array of dicts/maps:

[{"toto": 5,"tata":"hello"}]

I tried:

"x": Param(None, type="array",items={"type": "object"}),

but the frontend fails with:

An invalid form control with name='element_x' is not focusable.
<textarea class="form-control" name="element_x" id="element_x" valuetype="advancedarray" rows="6" required="" style="display: none;"></textarea>

jscheffl commented 2 weeks ago

Okay, @raphaelauv, maybe you were hitting a side effect... I checked on my side by adding the following to airflow/example_dags/example_params_ui_tutorial.py:

        # An array of dicts
        "array_of_dicts": Param(
            [{}, {"key": "value"}, {"one": "two"}],
            "Only dicts are accepted in this array",
            type="array",
            title="Array of dicts",
            items={"type": "object"},
        ),

...alongside the other parameters.

This renders correctly on my side: [image]

Can you check by adding a "valid" default value? You define None as the default, which is not valid according to the spec, so maybe this special case with an invalid/empty default was not tested. If you want to permit None as a valid value, then you need to set type=["none", "array"].

raphaelauv commented 2 weeks ago

So with type=["none", "array"], DAG parsing fails:

Broken DAG: [/opt/airflow/dags/dags/aaaa.py] Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.12/site-packages/jsonschema/validators.py", line 1328, in validate
    cls.check_schema(schema)
  File "/home/airflow/.local/lib/python3.12/site-packages/jsonschema/validators.py", line 317, in check_schema
    raise exceptions.SchemaError.create_from(error)
jsonschema.exceptions.SchemaError: ['none', 'array'] is not valid under any of the given schemas

But using a non-None default value works:

params={
            "x": Param([{}], type="array", items={"type": "object"}),
        }

It lets me trigger the DAG with the argument [{"toto": 5,"tata":"hello"}].

jscheffl commented 2 weeks ago

So with type=["none", "array"], DAG parsing fails:

Aaah, sorry, RTFM... https://airflow.apache.org/docs/apache-airflow/stable/core-concepts/params.html --> type=["null", "array"] is the right typing in JSON Schema.
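
For anyone hitting the same error, here is a minimal stdlib-only sketch of why "none" is rejected while "null" works. It is a simplified stand-in for the "type" keyword, not the jsonschema library itself; `TYPE_CHECKS` and `check_type` are hypothetical names, and the number/integer checks are deliberately simplified.

```python
# The seven type names JSON Schema defines; "none" is not one of them.
TYPE_CHECKS = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
}


def check_type(value, allowed):
    """Return True if value matches any of the allowed JSON Schema type names.

    Raises ValueError for an unknown type name, loosely mirroring the
    SchemaError raised at DAG parsing time.
    """
    names = [allowed] if isinstance(allowed, str) else list(allowed)
    unknown = [name for name in names if name not in TYPE_CHECKS]
    if unknown:
        raise ValueError(f"unknown JSON Schema type name(s): {unknown}")
    return any(TYPE_CHECKS[name](value) for name in names)


print(check_type(None, ["null", "array"]))             # True
print(check_type([{"toto": 5}], ["null", "array"]))    # True
print(check_type("not a list", ["null", "array"]))     # False
```

Calling `check_type(None, ["none", "array"])` raises `ValueError`, because the schema itself is invalid before any value is even looked at.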

raphaelauv commented 2 weeks ago

Yeah, it works:

params={
    "x": Param(None, type=["null", "array"], items={"type": "object"}),
}

So, in conclusion, it's not obvious. If we decide not to accept arrays for the type object, then I will add a small example to the docs.

jscheffl commented 2 weeks ago

Yeah - a small hint/clarification in the docs would be good - there might be others running into the same pitfall.

Also, if the default is None as you initially had, that might be a small bug. For DAGs that are not scheduled, "non-valid defaults" are possible for all other fields, so this is a small bug in the form generation.