binpash / pash

PaSh: Light-touch Data-Parallel Shell Processing
MIT License
552 stars 39 forks source link

Refactor preprocessor architecture to enhance extensibility #693

Closed Forthoney closed 1 year ago

Forthoney commented 1 year ago

Addresses parts of #672 Preparing to add a airflow as a preprocessor target, I noticed that the current architecture is very rigid and difficult to expand upon. This refactoring/reorganization hopefully addresses some of these. Notable changes (and absence of changes) are

No AST traversal or transformation logic has been rewritten

Files outside of shell_ast have not been changed*

preprocess/preprocess.py has been changed, but simply to import new files (some new modules were made, as I discuss below).

Differing behavior depending on target (vanilla, spec, airflow) is no longer handled by an if statement in ast_to_ast.

Previously, replace_df_regions was a function that ran a if statement on a property of TransformationState to check which specific TransformationState implementation we were dealing with (this would tell us the output target). Now, replace_df_regions is an instance method of AbstractTransformationState. This means that any variation in AST transformation between different output targets can be addressed in a single class that implements of AbstractTransformationState.

The location of the AST Transformation logic has been moved from replace_df_regions() to a method on TransformationState.

For example, this pull request created three implementations of AbstractTransformationState: TransformationState for the vanilla pash behavior, SpeculativeTransformationState for pash spec, and AirflowTransformationState for airflow (this one is empty). The ast_to_ast package contains no checks in its logic for checking how to handle AST transformations - they only need to call replace_df_regions of whatever trans_options argument they have at hand.

The TransformationState code has been moved to a separate file transformation_options.py. I do suggest renaming "State" with some more appropriate name, but this is a minor nitpick

preprocessnode functions have been

github-actions[bot] commented 1 year ago

OS:ubuntu-20.04 Thu Oct 26 20:29:25 UTC 2023 intro: 2/2 tests passed. interface: 41/41 tests passed. compiler: 54/54 tests passed.