getindata / kedro-airflow-k8s

Kedro Plugin to support running pipelines on Kubernetes using Airflow.
https://kedro-airflow-k8s.readthedocs.io
Apache License 2.0

ValueError: Failed to format pattern '${xxx}': no config value found, no default provided #129

Open stephanecollot opened 2 years ago

stephanecollot commented 2 years ago

Hello

With: kedro 0.17.4, kedro-airflow-k8s 0.7.3, Python 3.8.12

I have a templated catalog:

training_data:
  type: spark.SparkDataSet
  filepath: data/${folders.intermediate}/training_data
  file_format: parquet
  save_args:
    mode: 'overwrite'
  layer: intermediate

with the value set in my globals.yml:

folders:
    intermediate: 02_intermediate
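
(For context, in Kedro 0.17.x these ${...} placeholders are only resolved when the project registers a TemplatedConfigLoader, typically via a register_config_loader hook. A minimal sketch is below, assuming the default hooks.py layout; the exact hook signature can differ slightly between Kedro versions:)

# src/<package_name>/hooks.py (sketch for Kedro 0.17.x)
from kedro.config import TemplatedConfigLoader
from kedro.framework.hooks import hook_impl

class ProjectHooks:
    @hook_impl
    def register_config_loader(self, conf_paths):
        # Resolve ${...} placeholders (e.g. ${folders.intermediate})
        # against the values read from conf/<env>/globals.yml
        return TemplatedConfigLoader(
            conf_paths,
            globals_pattern="*globals.yml",
        )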

And when I run: kedro airflow-k8s compile

I get the following error:

Traceback (most recent call last):
  File "/Users/user/miniconda3/envs/kedro/bin/kedro", line 8, in <module>
    sys.exit(main())
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/framework/cli/cli.py", line 265, in main
    cli_collection()
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/framework/cli/cli.py", line 210, in main
    super().main(
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/click/decorators.py", line 21, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro_airflow_k8s/cli.py", line 64, in compile
    ) = get_dag_filename_and_template_stream(
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro_airflow_k8s/template.py", line 170, in get_dag_filename_and_template_stream
    template_stream = _create_template_stream(
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro_airflow_k8s/template.py", line 92, in _create_template_stream
    pipeline_grouped=context_helper.pipeline_grouped,
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro_airflow_k8s/context_helper.py", line 46, in pipeline_grouped
    return TaskGroupFactory().create(self.pipeline, self.context.catalog)
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/framework/context/context.py", line 329, in catalog
    return self._get_catalog()
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/framework/context/context.py", line 365, in _get_catalog
    conf_catalog = self.config_loader.get("catalog*", "catalog*/**", "**/catalog*")
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/config/templated_config.py", line 191, in get
    return _format_object(config_raw, self._arg_dict)
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/config/templated_config.py", line 264, in _format_object
    new_dict[key] = _format_object(value, format_dict)
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/config/templated_config.py", line 264, in _format_object
    new_dict[key] = _format_object(value, format_dict)
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/config/templated_config.py", line 279, in _format_object
    return IDENTIFIER_PATTERN.sub(lambda m: str(_format_string(m)), val)
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/config/templated_config.py", line 279, in <lambda>
    return IDENTIFIER_PATTERN.sub(lambda m: str(_format_string(m)), val)
  File "/Users/user/miniconda3/envs/kedro/lib/python3.8/site-packages/kedro/config/templated_config.py", line 242, in _format_string
    raise ValueError(
ValueError: Failed to format pattern '${folders.intermediate}': no config value found, no default provided

With this conf/base/airflow-k8s.yaml:

host: https://airflow.url
output: dags
run_config:
  image: spark_image
  image_pull_policy: Always
  startup_timeout: 600
  namespace: namespace
  experiment_name: experiment
  run_name: experiment
  cron_expression: "@daily"
  description: "experiment Pipeline"
  service_account_name: namespace-vault
  volume:
      disabled: True
  macro_params: [ds, prev_ds]
  variables_params: []

I should add that kedro run works.

Do you have any hints?

stephanecollot commented 2 years ago

Sorry, actually kedro run doesn't work. So it is not coming from kedro-airflow-k8s.

stephanecollot commented 2 years ago

Actually, when I uninstall kedro-airflow-k8s, kedro run works again.

stephanecollot commented 2 years ago

It seems that now, with kedro-airflow-k8s 0.6.7 and with this conf/base/airflow-k8s.yaml:

# Base url of the Apache Airflow, should include the schema (http/https)
host: https://airflow.url

# Directory from where Apache Airflow is reading DAGs definitions
output: dags

# Configuration used to run the pipeline
run_config:

    # Name of the image to run as the pipeline steps
    image: experiment

    # Pull policy to be used for the steps. Use Always if you push the images
    # on the same tag, or Never if you use only local images
    image_pull_policy: IfNotPresent

    # Pod startup timeout in seconds
    startup_timeout: 600

    # Namespace for Airflow pods to be created
    namespace: airflow

    # Name of the Airflow experiment to be created
    experiment_name: experiment

    # Name of the dag as it's presented in Airflow
    run_name: experiment

    # Apache Airflow cron expression for scheduled runs
    cron_expression: "@daily"

    # Optional start date in format YYYYMMDD
    #start_date: "20210721"

    # Optional pipeline description
    #description: "Very Important Pipeline"

    # Comma separated list of image pull secret names
    #image_pull_secrets: my-registry-credentials

    # Service account name to execute nodes with
    #service_account_name: default

    # Optional volume specification
    volume:
        # Storage class - use null (or no value) to use the default storage
        # class deployed on the Kubernetes cluster
        storageclass: # default
        # The size of the volume that is created. Applicable for some storage
        # classes
        size: 1Gi
        # Access mode of the volume used to exchange data. ReadWriteMany is
        # preferred, but it is not supported on some environments (like GKE)
        # Default value: ReadWriteOnce
        #access_modes: [ReadWriteMany]
        # Flag indicating if the data-volume-init step (copying raw data to the
        # fresh volume) should be skipped
        skip_init: False
        # Allows specifying the fsGroup that executes pipelines within containers
        # Default: root user group (to avoid issues with volumes in GKE)
        owner: 0
        # Tells if volume should not be used at all, false by default
        disabled: False

    # List of optional secrets specification
    secrets:
            # deploy_type: How the secret is deployed in Kubernetes, either `env` or
            # `volume`
        -   deploy_type: "env"
            # deploy_target: (Optional) The environment variable (for `deploy_type` `env`)
            # or the file path (for `deploy_type` `volume`) where the secret is exposed.
            # If `key` is not provided, deploy_target should be None.
            deploy_target: "SQL_CONN"
            # secret: Name of the secrets object in Kubernetes
            secret: "airflow-secrets"
            # key: (Optional) Key of the secret within the Kubernetes Secret. If not
            # provided with `deploy_type` `env`, all secrets in the object are mounted.
            key: "sql_alchemy_conn"

    # Apache Airflow macros to be exposed for the parameters
    # List of macros can be found here:
    # https://airflow.apache.org/docs/apache-airflow/stable/macros-ref.html
    macro_params: [ds, prev_ds]

    # Apache Airflow variables to be exposed for the parameters
    variables_params: [env]

    # Optional resources specification
    #resources:
        # Default configuration used by all nodes that do not declare the
        # resource configuration. It's optional. If node does not declare the resource
        # configuration, __default__ is assigned by default, otherwise cluster defaults
        # will be used.
        #__default__:
            # Optional labels to be put into pod node selector
            #node_selectors:
              #Labels are user provided key value pairs
              #node_pool_label/k8s.io: example_value
            # Optional labels to apply on pods
            #labels:
              #running: airflow
            # Optional annotations to apply on pods
            #annotations:
              #iam.amazonaws.com/role: airflow
            # Optional list of kubernetes tolerations
            #tolerations:
                #- key: "group"
                  #value: "data-processing"
                  #effect: "NoExecute"
                #- key: "group"
                  #operator: "Equal",
                  #value: "data-processing",
                  #effect: "NoSchedule"
            #requests:
                #Optional amount of cpu resources requested from k8s
                #cpu: "1"
                #Optional amount of memory resource requested from k8s
                #memory: "1Gi"
            #limits:
                #Optional amount of cpu resources limit on k8s
                #cpu: "1"
                #Optional amount of memory resource limit on k8s
                #memory: "1Gi"
        # Other arbitrary configurations to use
        #custom_resource_config_name:
            # Optional labels to be put into pod node selector
            #labels:
                #Labels are user provided key value pairs
                #label_key: label_value
            #requests:
                #Optional amount of cpu resources requested from k8s
                #cpu: "1"
                #Optional amount of memory resource requested from k8s
                #memory: "1Gi"
            #limits:
                #Optional amount of cpu resources limit on k8s
                #cpu: "1"
                #Optional amount of memory resource limit on k8s
                #memory: "1Gi"

    # Optional external dependencies configuration
    #external_dependencies:
        # Can just select dag as a whole
        #- dag_id: upstream-dag
        # or detailed
        #- dag_id: another-upstream-dag
        # with specific task to wait on
        #  task_id: with-precise-task
        # Maximum time (minutes) to wait for the external dag to finish before this
        # pipeline fails, the default is 1440 == 1 day
        #  timeout: 2
        # Checks if the external dag exists before waiting for it to finish. If it
        # does not exist, fail this pipeline. By default it is set to true.
        #  check_existence: False
        # Time difference with the previous execution to look at (minutes),
        # the default is 0 meaning no difference
        #  execution_delta: 10
    # Optional authentication to MLflow API
    #authentication:
      # Strategy that generates the credentials, supported values are:
      # - Null
      # - GoogleOAuth2 (generating OAuth2 tokens for service account provided by
      # GOOGLE_APPLICATION_CREDENTIALS)
      # - Vars (credentials fetched from airflow Variable.get - specify variable keys,
      # matching MLflow authentication env variable names, in `params`,
      # e.g. ["MLFLOW_TRACKING_USERNAME", "MLFLOW_TRACKING_PASSWORD"])
      #type: GoogleOAuth2
      #params: []

I can run kedro airflow-k8s compile and it works, but kedro run still gives the same error.

em-pe commented 2 years ago

@stephanecollot Thanks for reporting the issue. If I'm not wrong, it's related to getindata/kedro-kubeflow#72 - @szczeles, can you confirm?

szczeles commented 2 years ago

@em-pe Yep, it seems so. If we apply the same trick here, the issue should be gone.

@stephanecollot As a temporary workaround, you can try adding these lines into your project's settings.py:

import sys

# Disable the plugin's auto-registered hooks for regular commands (e.g. kedro run),
# but keep them when the airflow-k8s CLI is being invoked
if 'airflow-k8s' not in sys.argv:
    DISABLE_HOOKS_FOR_PLUGINS = ("kedro-airflow-k8s",)
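
(settings.py here refers to the project's settings module, typically src/<package_name>/settings.py in Kedro 0.17.x. DISABLE_HOOKS_FOR_PLUGINS is a standard Kedro setting that stops Kedro from auto-registering hooks from the listed plugin distributions, so with the sys.argv check above the plugin's hooks stay active only for the kedro airflow-k8s commands.)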

If your code works with this hack, it's definitely the same issue as https://github.com/getindata/kedro-kubeflow/issues/72

stephanecollot commented 2 years ago

Thanks for your reply. Yes, I tried DISABLE_HOOKS_FOR_PLUGINS and kedro run works again.