LineaLabs / lineapy

Move fast from data science prototype to pipeline. Capture, analyze, and transform messy notebooks into data pipelines with just two lines of code.
https://lineapy.org
Apache License 2.0
660 stars 56 forks

Support the use case where the user wants to include `assert`s in the slice. #449

Open marov opened 2 years ago

marov commented 2 years ago

What: We need to support including assert statements in slices. This is a "cheap" way to add sanity checks to the "productionalized" pipelines.

Why: I just saw this cell in one of my PR reviews:

# check whether all last trading days is in date column of equity dataframe
set(df_monthly_last_trading_day["trading_day"]).issubset(set(df_equity["date"]))

The output is "True" on the test sample, but this does not help, because a) the expression will be omitted from the extracted DAG and b) without an assert it does not stop execution or raise alerts on upcoming data ...
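To make the difference concrete, here is a minimal sketch contrasting the bare expression with an assert. The data is made up, and plain dicts of lists stand in for the DataFrames so the snippet has no dependencies; `check_trading_days` is a hypothetical helper name, not something from the PR or from LineaPy.

```python
# Toy stand-ins for the DataFrames in the cell above (hypothetical data).
df_monthly_last_trading_day = {"trading_day": ["2021-01-29", "2021-02-26"]}
df_equity = {"date": ["2021-01-29", "2021-02-25"]}  # 2021-02-26 is missing


def check_trading_days(monthly, equity):
    """Raise AssertionError if a last trading day is absent from the equity dates."""
    missing = set(monthly["trading_day"]) - set(equity["date"])
    assert not missing, f"trading days missing from equity data: {sorted(missing)}"


# The bare expression just evaluates to a value; execution continues regardless.
bare_result = set(df_monthly_last_trading_day["trading_day"]).issubset(
    set(df_equity["date"])
)

# The assert version stops execution with a readable message.
try:
    check_trading_days(df_monthly_last_trading_day, df_equity)
    halted = False
except AssertionError as exc:
    halted = True
    message = str(exc)
```

Note that Python strips assert statements when run with the `-O` flag, so a pipeline relying on them has to be run without optimization.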

Possible Approaches:

Ideal scenario: Linea catches cells or print statements that produce boolean output and converts them to try/except blocks (better code practice than asserts).
Middle scenario: Linea adds an "assert" in such cases.
MVP: Users write the asserts themselves, and Linea includes them in the output.
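One possible reading of the exception-based "ideal scenario" is a check that raises an explicit exception instead of relying on assert. This is only a sketch: `DataCheckError`, `require`, and `run_step` are hypothetical names for illustration, not LineaPy APIs.

```python
class DataCheckError(Exception):
    """Hypothetical exception type for a failed pipeline sanity check."""


def require(condition, message):
    """Raise DataCheckError when `condition` is falsy.

    Unlike an assert statement, this check is not removed by `python -O`.
    """
    if not condition:
        raise DataCheckError(message)


def run_step(trading_days, equity_dates):
    """Hypothetical pipeline step that validates its inputs before continuing."""
    require(
        set(trading_days).issubset(set(equity_dates)),
        "not all last trading days appear in the equity dates",
    )
    return len(trading_days)
```

A caller (for example an orchestrator task) could catch `DataCheckError` to raise an alert, or let it propagate to fail the run.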

yifanwu commented 2 years ago

Hi @marov, just to be sure: you want a way to include all the assert ... statements in the slice, right?

Thanks!

marov commented 2 years ago

The ones connected to the artifact we slice

yifanwu commented 2 years ago

Thanks Mike for the input! As a next step, we should consider putting this on the roadmap for January. We need to solicit feedback from a few other DEs to help with prioritization.

yifanwu commented 2 years ago

> The ones connected to the artifact we slice

What if there is an assertion on an upstream dependency? E.g.,

import lineapy
a = foo()
assert a > 10
b = bar(a)
lineapy.save(b, 'b val')

I assume you want assert a > 10 to be included as well, right?

marov commented 2 years ago

Yes. If it fails, we shouldn't continue.
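The behavior agreed on in this thread (keep asserts on the artifact and on its upstream dependencies, drop unrelated ones) can be illustrated with a toy static slicer built on Python's `ast` module. This is a sketch under strong assumptions: it only handles top-level statements with simple name assignments, and it makes no claim about how LineaPy computes slices internally. `slice_with_asserts` is a hypothetical name.

```python
import ast
import builtins

BUILTIN_NAMES = set(dir(builtins))


def _names(node, ctx_type):
    """Collect variable names used in `node` with the given context (Load/Store)."""
    return {
        n.id
        for n in ast.walk(node)
        if isinstance(n, ast.Name) and isinstance(n.ctx, ctx_type)
    }


def slice_with_asserts(source, target):
    """Return the top-level statements needed to compute `target`, plus any
    assert that reads only variables defined earlier in the slice."""
    stmts = ast.parse(source).body
    needed, keep = {target}, set()

    # Backward pass: keep statements that define a name we still need.
    for i in range(len(stmts) - 1, -1, -1):
        stmt = stmts[i]
        if isinstance(stmt, ast.Assert):
            continue
        defined = _names(stmt, ast.Store)
        if defined & needed:
            keep.add(i)
            needed = (needed - defined) | _names(stmt, ast.Load)

    # Forward pass: keep asserts whose variables all come from the slice.
    defined_so_far = set()
    for i, stmt in enumerate(stmts):
        if i in keep:
            defined_so_far |= _names(stmt, ast.Store)
        elif isinstance(stmt, ast.Assert):
            used = _names(stmt, ast.Load) - BUILTIN_NAMES
            if used and used <= defined_so_far:
                keep.add(i)

    return "\n".join(ast.get_source_segment(source, stmts[i]) for i in sorted(keep))


example = """a = foo()
assert a > 10
unrelated = 5
assert unrelated == 5
b = bar(a)
"""
sliced = slice_with_asserts(example, "b")
```

Here `sliced` keeps `a = foo()`, `assert a > 10`, and `b = bar(a)`, while the assert on `unrelated` is dropped because that variable is not part of the slice for `b`.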