etsy / boundary-layer

Builds Airflow DAGs from configuration files. Powers all DAGs on the Etsy Data Platform
Apache License 2.0
262 stars 58 forks source link

Support for Airflow HTTP operator #23

Open moryakub opened 5 years ago

moryakub commented 5 years ago

Airflow HTTP operator has an optional parameter named response_check that gets a function (docs)

Adding a corresponding schema named http.yaml for a boundary-layer http operator resulted in the next error: boundary_layer.exceptions.InvalidConfig: Invalid config spec in file c:\path_to_venv\site-packages\boundary_layer_default_plugin\config\operators\http.yaml: {'parameters_jsonschema': ["Invalid JSON schema: 'function' is not valid under any of the given schemas\n\nFailed validating 'anyOf' in schema['properties']['properties']['additionalProperties']['properties']['type']:\n {'anyOf': [{'$ref': '#/definitions/simpleTypes'},\n {'items': {'$ref': '#/definitions/simpleTypes'},\n 'minItems': 1,\n 'type': 'array',\n 'uniqueItems': True}]}\n\nOn instance['properties']['response_check']['type']:\n 'function'"]}

http.yaml content is: (with problematic parameter #'d out)

name: http
operator_class: SimpleHttpOperator
operator_class_module: airflow.operators.http_operator
schema_extends: base
parameters_jsonschema:
  properties:
    http_conn_id:
      type: string
    endpoint:
      type: string
    method:
      type: object
    data:
      type: object
    headers:
      type: object
    # response_check:
    #   type: function
    extra_options:
      type: object
    xcom_push:
      type: boolean
    log_response:
      type: boolean
  required:
    - http_conn_id
    - endpoint
  additionalProperties: false
mchalek commented 5 years ago

hi @moryakub thanks for the question and for working to add the http operator. The only way to define functions in boundary-layer at this time is by using string-valued parameters, and supplying those parameters as verbatim strings in the dag (meaning strings that are enclosed by the delimiters << and >>, which get pasted directly into the generated python code).

Examples of other function-valued parameters that work this way are the on_failure_callback and other callbacks that are supported by the base operator config.

To use this in your YAML workflow, you would have to do something like:

name: my-dag
imports:
  objects:
  - module: my.package.name
    objects:
    - my_response_check_function
operators:
- name: my-http-operator
  type: http
  properties:
    endpoint: http://whatever/
    response_check: <<my_response_check_function>>

or you could even define an inline lambda, if the function is simple enough. Something like:

name: my-dag
operators:
- name: my-http-operator
  type: http
  properties:
    endpoint: http://whatever/
    response_check: <<lambda response: response.code == 200>>

I hope this helps! Feel free to submit your HTTP operator config in a pull request, once you've got it working. We'd appreciate it!