LineaLabs / lineapy

Move fast from data science prototype to pipeline. Capture, analyze, and transform messy notebooks into data pipelines with just two lines of code.
https://lineapy.org
Apache License 2.0
663 stars, 58 forks

LIN-547: Implement parameterization for framework Airflow pipeline with params setup #773

Closed: mingjerli closed this pull request 2 years ago

mingjerli commented 2 years ago

Description

Building on the LIN-566 refactor of the Airflow Jinja template, this PR implements parameterized pipelines for Airflow, covering both the PythonOperatorPerArtifact and PythonOperatorPerSession flavors.

During implementation, we also found and fixed an issue where argparse cast every user-supplied parameter to a string in the Python module; we caught it because Airflow params showed a similar problem.
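For context, here is a minimal sketch of the argparse pitfall and one way around it, assuming each parameter's type can be inferred from its default value. The parameter names and the `build_naive_parser` / `build_typed_parser` / `str_to_bool` helpers are hypothetical illustrations, not lineapy's actual generated code. Without `type=`, argparse leaves every value supplied on the command line as a `str`; deriving a converter from each default keeps the types intact.

```python
import argparse

# Hypothetical pipeline parameters; each default value also fixes the type
# that the corresponding command-line argument should be parsed into.
DEFAULT_PARAMS = {
    "n_estimators": 100,
    "learning_rate": 0.1,
    "dataset": "iris.csv",
    "verbose": False,
}


def str_to_bool(value: str) -> bool:
    """Parse common spellings of booleans; bool("False") would be True."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_naive_parser(defaults: dict) -> argparse.ArgumentParser:
    """No `type=`: every value supplied on the command line stays a str."""
    parser = argparse.ArgumentParser()
    for name, default in defaults.items():
        parser.add_argument(f"--{name}", default=default)
    return parser


def build_typed_parser(defaults: dict) -> argparse.ArgumentParser:
    """Pass `type=` derived from each default so parsed values keep their type."""
    parser = argparse.ArgumentParser()
    for name, default in defaults.items():
        # Check bool before using type(default): type(False)("False") is True.
        converter = str_to_bool if isinstance(default, bool) else type(default)
        parser.add_argument(f"--{name}", type=converter, default=default)
    return parser


if __name__ == "__main__":
    argv = ["--n_estimators", "200", "--learning_rate", "0.05", "--verbose", "true"]
    naive = build_naive_parser(DEFAULT_PARAMS).parse_args(argv)
    typed = build_typed_parser(DEFAULT_PARAMS).parse_args(argv)
    print(repr(naive.n_estimators), repr(naive.verbose))  # '200' 'true'
    print(repr(typed.n_estimators), repr(typed.verbose))  # 200 True
```

With the typed parser, an unparseable value such as `--n_estimators abc` is rejected at parse time instead of surfacing as a type error deep inside the pipeline. Booleans need the special case because `bool()` treats any non-empty string as true.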

Fixes: LIN-547, LIN-566

Type of change


How Has This Been Tested?

yoonspark commented 2 years ago

Many of my comments are stylistic, so I agree with Shardul that we can fast-track merging this to unblock the other tickets needed for the upcoming launch.

One overarching Q: Are there going to be only two flavors of the PythonOperator (i.e., per artifact and per session)? Is the new Airflow template general enough to accommodate new flavors, if any are added?

mingjerli commented 2 years ago

Well... I personally think there should also be DockerOperator and KubernetesPodOperator flavors, but I don't think we need to worry about that integration problem right now.