LineaLabs / lineapy

Move fast from data science prototype to pipeline. Capture, analyze, and transform messy notebooks into data pipelines with just two lines of code.
https://lineapy.org
Apache License 2.0
662 stars 58 forks source link

LIN-685, LIN-640 Pipeline Writer refactor #827

Closed andycui97 closed 1 year ago

andycui97 commented 1 year ago

Description

Refactor pieces around PipelineWriters. Changes:

Fixes LIN-685, LIN-640

Type of change

How Has This Been Tested?

Existing pytests.

yoonspark commented 1 year ago

Looks good overall. A couple of thoughts for further refactoring we can do to enhance contributor experience:

  1. Currently, the "compositional" logic for DAG file is scattered between the framework-specific pipeline writer (e.g., AirflowPipelineWriter) and the Jinja template (e.g., airflow_dag_PythonOperator.jinja). We may consider moving as much compositional logic as possible into the Jinja template so that the contributor can focus on adapting/updating it. For instance, compositional logic in AirflowPipelineWriter.get_task_params_args() and tasks can be factored into Jinja template which may actually make it more clear as to what is being written. This migration of compositional logic to Jinja template may also make lighter and consistent the code for each framework-specific pipeline writer.

  2. Current refactoring uses TaskDefinition as a contributor-facing abstraction, which hides unnecessary implementation details. I think we can expand this abstraction to even hide a lot of internal data structures still exposed in BasePipelineWriter. For instance, BasePipelineWriter's methods still use SessionArtifacts to extract information necessary for code generation. The proposal here is to instead use an expanded/augmented version of TaskDefinition so that such information extraction from internal data structures is not necessary; and BasePipelineWriter's methods simply use this contributor-friendly abstraction to carry on their compositional jobs.

Note that these are suggestions for future work, and they do not block merging of this PR.