etsy / boundary-layer

Builds Airflow DAGs from configuration files. Powers all DAGs on the Etsy Data Platform.
Apache License 2.0

Allow plugins to be used within generator sub-workflows #58

Closed by jmchen28 4 years ago

jmchen28 commented 4 years ago

Allow plugins to be evaluated within generator sub-workflows so any operators that may be added by the plugin are added to the correct subgraph.

Usage:

```yaml
name: my-dag

resources:
- name: dataproc-cluster
  type: dataproc_cluster
  properties:
    cluster_name: my-cluster-{{ execution_date.strftime('%s') }}
    num_workers: 10
    region: us-central1

default_task_args:
  start_date: '2020-05-17'

generators:
- name: my-generator
  type: list_generator
  target: my-sub-workflow
  properties:
    items:
      - item_a
      - item_b
---
name: my-sub-workflow

operators:
- name: my-job
  plugin_config:
    default:
      my_plugin: test
  type: dataproc_hadoop
  requires_resources:
  - dataproc-cluster
  properties:
    main_class: com.etsy.my.job.ClassName
    dataproc_hadoop_properties:
      mapreduce.map.output.compress: 'true'
    arguments: [ '--date', '{{ ds }}', '<<item_name>>' ]
```

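
For readers unfamiliar with generators, the fan-out in the config above can be pictured with a small toy model. This is only a sketch and not boundary-layer's implementation: the function `expand_list_generator`, the `my-job-item_a` naming scheme, and the assumption that `<<item_name>>` stands for the current list item are all hypothetical and chosen for illustration.

```python
import copy


def expand_list_generator(items, sub_workflow, placeholder="<<item_name>>"):
    """Toy model: clone every sub-workflow operator once per generator item.

    Hypothetical helper, not part of boundary-layer. It assumes the
    placeholder is replaced by the current item and that clones are
    named "<operator>-<item>".
    """
    expanded = []
    for item in items:
        for op in sub_workflow["operators"]:
            clone = copy.deepcopy(op)  # leave the template untouched
            clone["name"] = f"{op['name']}-{item}"
            props = clone.setdefault("properties", {})
            props["arguments"] = [
                item if arg == placeholder else arg
                for arg in props.get("arguments", [])
            ]
            expanded.append(clone)
    return expanded


# Mirrors the relevant parts of the YAML above.
sub_workflow = {
    "operators": [
        {
            "name": "my-job",
            "type": "dataproc_hadoop",
            "properties": {"arguments": ["--date", "{{ ds }}", "<<item_name>>"]},
        }
    ]
}

expanded = expand_list_generator(["item_a", "item_b"], sub_workflow)
for op in expanded:
    print(op["name"], op["properties"]["arguments"])
```

The point of the sketch is that the operators of `my-sub-workflow` only exist once per item inside the generated subgraph, which is why a plugin that wants to attach to `my-job` has to be evaluated there.
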
mchalek commented 4 years ago

Oh interesting, thanks for this addition, @jmchen28! I will have to take some time later this week to give it a closer look, but on the surface I think I follow the reasoning, and I think it makes sense.

jmchen28 commented 4 years ago

Thanks! For more context: we have a custom plugin that adds 3 operators downstream of a given task ID. Right now, when we add the plugin at the top of the DAG definition and the plugin's upstream task lives in a generator sub-workflow, boundary-layer throws an error at the build step, because the upstream node is not part of the primary graph and has not yet been added. Adding the plugin to the sub-graph instead results in a parse error, because plugin_config is not recognized there. This change allows plugins to be used in sub-graphs.
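
The failure mode described in this comment can be illustrated with a minimal sketch. It is a toy model and not boundary-layer's code: the `Graph` class, `apply_plugin`, the `plugin-op-*` names, and the choice to chain the three operators one after another are all assumptions made for illustration.

```python
class Graph:
    """Toy graph: maps each node name to its list of downstream nodes."""

    def __init__(self, name, nodes):
        self.name = name
        self.edges = {node: [] for node in nodes}

    def add_downstream(self, upstream, node):
        # A node can only be attached to an upstream that this graph knows.
        if upstream not in self.edges:
            raise LookupError(
                f"{upstream!r} is not a node of graph {self.name!r}"
            )
        self.edges[upstream].append(node)
        self.edges[node] = []


def apply_plugin(graph, upstream, new_ops):
    """Hypothetical plugin: chain new_ops downstream of `upstream`."""
    previous = upstream
    for op in new_ops:
        graph.add_downstream(previous, op)
        previous = op


PLUGIN_OPS = ["plugin-op-1", "plugin-op-2", "plugin-op-3"]

# The primary graph only contains the generator node; "my-job" is
# defined inside the sub-workflow's own graph.
primary = Graph("my-dag", ["my-generator"])
sub = Graph("my-sub-workflow", ["my-job"])

error = None
try:
    apply_plugin(primary, "my-job", PLUGIN_OPS)  # top-level evaluation
except LookupError as exc:
    error = str(exc)
print("top level:", error)

apply_plugin(sub, "my-job", PLUGIN_OPS)  # evaluation inside the sub-graph
print("sub-graph:", sub.edges)
```

In this model, evaluating the plugin against the primary graph fails because `my-job` is not one of its nodes, while evaluating it against the sub-workflow's graph attaches the new operators to the subgraph that actually contains the upstream task.
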

coveralls commented 4 years ago

Pull Request Test Coverage Report for Build 199


| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| boundary_layer/workflow.py | 12 | 14 | 85.71% |
| boundary_layer/schemas/dag.py | 5 | 11 | 45.45% |
| Total | 17 | 25 | 68.0% |

Totals:
Change from base Build 189: 0.06%
Covered Lines: 2103
Relevant Lines: 2404

💛 - Coveralls