etsy / boundary-layer

Builds Airflow DAGs from configuration files. Powers all DAGs on the Etsy Data Platform
Apache License 2.0

Sort operator arguments on DAG generation #83

Open estherbester opened 3 years ago

estherbester commented 3 years ago

First off, thank you so much for making this tool available! I am very happy using this library at my work.

For historical reasons, I version-control my generated DAGs and have found it easier to parse the diffs when I sort the operator arguments by name. I have tried this out locally by modifying the DagBuilderBase.render_operator method. Would there be any reason not to have the builder sort the arguments for each operator?

dossett commented 3 years ago

@estherbester Glad you enjoy boundary layer, we do too! :-)

Adding an option to sort the arguments that way seems like a fine idea. I wouldn't say I rely on the current approach, but sometimes the current ordering makes it easier to compare how we call an operator with the Airflow docs, assuming I understand your suggestion correctly.

I'd be happy to look at any PRs to support that functionality.

jdimatteo commented 1 year ago

FWIW, you can monkey-patch in sorted behavior with the following (based on the incomplete PR https://github.com/etsy/boundary-layer/pull/92):

import boundary_layer.registry.types.operator

# Keep the original property reachable under a new name.
boundary_layer.registry.types.operator.OperatorNode.unpatched_operator_args = boundary_layer.registry.types.operator.OperatorNode.operator_args

def monkey_patched_operator_args(self):
    # Rebuild the argument dict with its keys in alphabetical order.
    args = self.unpatched_operator_args
    sorted_operator_args = {k: args[k] for k in sorted(args)}
    return sorted_operator_args

# Replace the original property with the sorted version.
boundary_layer.registry.types.operator.OperatorNode.operator_args = property(monkey_patched_operator_args)
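For anyone who wants to see the patching pattern in isolation, here is a minimal sketch that runs without boundary-layer installed. `FakeOperatorNode` and its sample arguments are hypothetical stand-ins, not part of the library; the sketch only assumes the patched attribute is a property that returns a dict.

```python
class FakeOperatorNode:
    """Hypothetical stand-in for a class that exposes operator_args as a property."""

    def __init__(self, args):
        self._args = args

    @property
    def operator_args(self):
        return self._args


# Keep the original property reachable under a new name.
FakeOperatorNode.unpatched_operator_args = FakeOperatorNode.operator_args


def sorted_operator_args(self):
    # Rebuild the dict with keys in alphabetical order; dicts keep
    # insertion order on Python 3.7+, so iteration follows that order.
    args = self.unpatched_operator_args
    return {k: args[k] for k in sorted(args)}


# Replace the original property with the sorted version.
FakeOperatorNode.operator_args = property(sorted_operator_args)

node = FakeOperatorNode({"task_id": "t", "retries": 3, "bash_command": "echo hi"})
keys = list(node.operator_args)
print(keys)  # ['bash_command', 'retries', 'task_id']
```

The values are untouched; only the order in which the keys are iterated changes, which is what keeps the generated diffs stable.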

Also, in my testing, setting PYTHONHASHSEED=0 is another option for deterministic DAG generation.
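A rough way to see what the environment variable does, independent of boundary-layer: the sketch below starts two fresh interpreters with the same PYTHONHASHSEED and compares how each iterates a set of strings. The helper name and the sample strings are made up for illustration.

```python
import os
import subprocess
import sys


def set_order_with_seed(seed):
    """Start a fresh interpreter with the given PYTHONHASHSEED and
    return the order in which it iterates a small set of strings."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    snippet = "print(list({'task_id', 'retries', 'queue', 'owner'}))"
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# With a fixed seed, string hashing (and so set iteration order) is
# the same in every new process.
first = set_order_with_seed("0")
second = set_order_with_seed("0")
print(first == second)  # True
```

Without a fixed seed, string hashes are randomized per process, so anything that iterates over a set of strings can come out in a different order from one run to the next.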