binpash / pash

PaSh: Light-touch Data-Parallel Shell Processing
MIT License
552 stars 39 forks source link

[Renewed] Refactor preprocessor architecture to enhance extensibility #694

Closed Forthoney closed 1 year ago

Forthoney commented 1 year ago

Addresses parts of #672 Preparing to add a airflow as a preprocessor target, I noticed that the current architecture is very rigid and difficult to expand upon. This refactoring/reorganization hopefully addresses some of these. Notable changes (and absence of changes) are

No AST traversal or transformation logic has been rewritten

Files outside of shell_ast have not been changed*

preprocess/preprocess.py has been changed, but simply to import new files (some new modules were made, as I discuss below).

Differing behavior depending on target (vanilla, spec, airflow) is no longer handled by an if statement in ast_to_ast.

Previously, replace_df_regions was a function that ran a if statement on a property of TransformationState to check which specific TransformationState implementation we were dealing with (this would tell us the output target). Now, replace_df_regions is an instance method of AbstractTransformationState. This means that any variation in AST transformation between different output targets can be addressed in a single class that implements of AbstractTransformationState.

The location of the AST Transformation logic has been moved from replace_df_regions() to a method on TransformationState.

For example, this pull request created three implementations of AbstractTransformationState: TransformationState for the vanilla pash behavior, SpeculativeTransformationState for pash spec, and AirflowTransformationState for airflow (this one is empty). The ast_to_ast package contains no checks in its logic for checking how to handle AST transformations - they only need to call replace_df_regions of whatever trans_options argument they have at hand.

The TransformationState code has been moved to a separate file transformation_options.py. I do suggest renaming "State" with some more appropriate name, but this is a minor nitpick

preprocessnode functions have been

github-actions[bot] commented 1 year ago

OS:ubuntu-20.04 Sat Oct 28 17:17:55 UTC 2023 intro: 2/2 tests passed. interface: 41/41 tests passed. compiler: 54/54 tests passed.

github-actions[bot] commented 1 year ago

OS:ubuntu-20.04 Sat Oct 28 17:19:14 UTC 2023 intro: 2/2 tests passed. interface: 41/41 tests passed. compiler: 54/54 tests passed.

github-actions[bot] commented 1 year ago
OS = Debian 10 CPU = Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz Ram = 15752 Hash = 4ab14e5c Kernel= Linux 4.15.0-197-generic x86_64 benchmark tests passed failed untested unresolved unsupported not_in_use other_status
posix 494 375 41 31 6 40 1 0
intro 2 2 0 0 0 0 0 0
interface 41 41 0 0 0 0 0 0
compiler 54 54 0 0 0 0 0 0
github-actions[bot] commented 1 year ago
OS = Debian 10 CPU = Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz Ram = 15752 Hash = 4436c962 Kernel= Linux 4.15.0-197-generic x86_64 benchmark tests passed failed untested unresolved unsupported not_in_use other_status
posix 494 375 41 31 6 40 1 0
intro 2 2 0 0 0 0 0 0
interface 41 41 0 0 0 0 0 0
compiler 54 54 0 0 0 0 0 0
angelhof commented 1 year ago

This looks great to me!

Forthoney commented 11 months ago

Closes #672