astronomer / airflow-provider-great-expectations

Great Expectations Airflow operator
http://greatexpectations.io
Apache License 2.0

Feature Request: pass parameters from Airflow to GE Checkpoint #108

Open · kujaska opened this issue 1 year ago

kujaska commented 1 year ago

We need to run a GE Checkpoint from Airflow. The Checkpoint is based on a SQL query, and that query must get the values for its parameters from Airflow - e.g. a datamart should be checked for data quality for a particular date and region right after that date and region have been refreshed by another Airflow task.

Part of checkpoint.yml looks like:

validations:
  - batch_request:
      datasource_name: snowflake
      data_connector_name: default_runtime_data_connector_name
      data_asset_name: db1.table1
      runtime_parameters:
        query: "SELECT *
            from db1.table1
            WHERE fld1 > $DATE_PARAM_FROM_AIRFLOW and fld2 = $REGION_PARAM_FROM_AIRFLOW
"

How can we do this properly with the GreatExpectationsOperator?

It looks like the operator cannot pass just the parameters, while using query_to_validate or checkpoint_config instead would break unit tests (you would need Airflow just to test your checkpoint!).
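
For illustration, this is roughly what we would like to be able to write from the DAG (an untested sketch: it assumes checkpoint_kwargs is forwarded to Checkpoint.run() and that its values get Jinja-rendered; the paths, checkpoint name and suite name are placeholders):

from great_expectations_provider.operators.great_expectations import (
    GreatExpectationsOperator,
)

# Sketch only: keep checkpoint.yml free of the query and inject it at run time.
validate_table1 = GreatExpectationsOperator(
    task_id="validate_table1",
    data_context_root_dir="/opt/airflow/great_expectations",  # placeholder path
    checkpoint_name="table1_checkpoint",                       # placeholder name
    checkpoint_kwargs={
        "validations": [
            {
                "batch_request": {
                    "datasource_name": "snowflake",
                    "data_connector_name": "default_runtime_data_connector_name",
                    "data_asset_name": "db1.table1",
                    "runtime_parameters": {
                        # parameter values coming from the Airflow run
                        "query": (
                            "SELECT * FROM db1.table1 "
                            "WHERE fld1 > '{{ ds }}' "
                            "AND fld2 = '{{ dag_run.conf[\"region\"] }}'"
                        ),
                    },
                    # runtime batch requests usually need batch identifiers
                    "batch_identifiers": {"default_identifier_name": "{{ run_id }}"},
                },
                "expectation_suite_name": "table1_suite",      # placeholder name
            }
        ]
    },
)

The point is that the checkpoint itself would stay parameter-free, so it could still be tested without Airflow.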

Our current workaround: use environment variables.
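
Rough sketch of that workaround (assuming GE's config-variable substitution is applied to the checkpoint config; the variable values here are just examples):

import os

# Example only: the variables must be visible to the process that actually
# runs the checkpoint (e.g. exported on the worker, or set in-process right
# before the validation runs).
os.environ["DATE_PARAM_FROM_AIRFLOW"] = "'2024-01-01'"
os.environ["REGION_PARAM_FROM_AIRFLOW"] = "'EMEA'"

# checkpoint.yml then references the variables instead of hard-coded values:
#
#   runtime_parameters:
#     query: >
#       SELECT * FROM db1.table1
#       WHERE fld1 > ${DATE_PARAM_FROM_AIRFLOW} AND fld2 = ${REGION_PARAM_FROM_AIRFLOW}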

Thanks!