LineaLabs / lineapy

Move fast from data science prototype to pipeline. Capture, analyze, and transform messy notebooks into data pipelines with just two lines of code.
https://lineapy.org
Apache License 2.0

LIN-619 Fix imports in requirements.txt #807

Closed andycui97 closed 1 year ago

andycui97 commented 1 year ago

Description

Currently, requirements.txt is generated from the entire session graph (or graphs, if there are multiple sessions). This includes more libraries than the pipeline needs.

This is fixed by giving session artifacts a get_libraries method. It matches the session libraries against the import nodes that form the ImportNodeCollection, so that libraries not in the slice are excluded.
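The filtering idea can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Library` and `ImportNode` classes, the `to_requirements` helper, the function signature, and the version numbers are all hypothetical stand-ins, and the real get_libraries works on lineapy's own graph types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Library:
    """Hypothetical stand-in for a library recorded in a session."""
    name: str
    version: str


@dataclass(frozen=True)
class ImportNode:
    """Hypothetical stand-in for an import node that survived slicing."""
    node_id: int
    library_name: str


def get_libraries(session_libraries, sliced_import_nodes):
    """Keep only the session libraries imported by a node in the slice."""
    needed = {node.library_name for node in sliced_import_nodes}
    return [lib for lib in session_libraries if lib.name in needed]


def to_requirements(libraries):
    """Render libraries as pinned requirements.txt lines, sorted by name."""
    ordered = sorted(libraries, key=lambda lib: lib.name)
    return "\n".join(f"{lib.name}=={lib.version}" for lib in ordered)


# Illustrative data: seaborn and altair were imported in the session
# (for example during EDA), but no import node for them is in the slice.
session_libraries = [
    Library("pandas", "1.4.3"),
    Library("scikit-learn", "1.1.1"),
    Library("seaborn", "0.11.2"),
    Library("altair", "4.2.0"),
]
sliced_import_nodes = [ImportNode(1, "pandas"), ImportNode(2, "scikit-learn")]

requirements = to_requirements(
    get_libraries(session_libraries, sliced_import_nodes)
)
print(requirements)
```

With this input, only pandas and scikit-learn end up in the rendered requirements; the EDA-only libraries are dropped because no import node in the slice refers to them.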

Fixes LIN-619

Type of change

How Has This Been Tested?

The tests/unit/plugins/expected/<framework>_pipeline_housing * .ipynb test cases contain examples of this. Specifically, in these examples lineapy, seaborn, and altair were used only during EDA and are not needed for the artifact.

andycui97 commented 1 year ago

Yeah, the stuff you keep in session_artifacts.py makes sense to me.

One out-of-scope issue I found is related to library versions across multiple sessions. No need to act on it here; I just opened a ticket for it with a description: https://linea.atlassian.net/browse/LIN-639

Yeah I decided to ignore that for now ... 👍

andycui97 commented 1 year ago

Looks good, assuming the position args check that you have is not required for all import node checks.

Yes, the position arg checks are looking for specific import nodes. I've documented exactly what type of nodes we are looking for, both from a high-level perspective and in terms of how that translates to our code's node checks.

If those assumptions are wrong and create bugs in the future, at least it should be very clear what our intentions were and how to fix the mistake.