LineaLabs / lineapy

Move fast from data science prototype to pipeline. Capture, analyze, and transform messy notebooks into data pipelines with just two lines of code.
https://lineapy.org
Apache License 2.0

LIN-624 Add first DVC pipeline flavor, run all modules in a single stage #801

Closed: andycui97 closed this 1 year ago

andycui97 commented 1 year ago

Description

First commit toward full DVC pipeline support. It adds the first DVC pipeline flavor, which simply runs all the modules in a single stage.
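For readers unfamiliar with the `dvc.yaml` layout, here is a minimal sketch of what a "single stage" flavor could render. It is not the templating code in this PR: the function name, the stage name, and the module file names are all hypothetical. The only things taken from DVC itself are the `stages`, `cmd`, and `deps` keys.

```python
from typing import List


def render_single_stage_dvc_yaml(stage_name: str, module_files: List[str]) -> str:
    """Render a dvc.yaml with one stage that runs every module in order.

    Hypothetical helper for illustration only; not lineapy's implementation.
    """
    # Chain the module scripts so a single stage executes all of them in sequence.
    cmd = " && ".join(f"python {path}" for path in module_files)
    lines = [
        "stages:",
        f"  {stage_name}:",
        f"    cmd: {cmd}",
        "    deps:",
    ]
    # List each module file as a dependency of the stage.
    lines += [f"      - {path}" for path in module_files]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical module file names.
    print(render_single_stage_dvc_yaml("run_all_modules", ["module_a.py", "module_b.py"]))
```

The planned stage-per-session and stage-per-module flavors would presumably emit one entry under `stages` per session or module instead of a single chained command.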

Upcoming changes: flavors for a stage per session and stage per module.

How Has This Been Tested?

Added unit tests; also verified that the generated Dockerfile runs correctly.

yoonspark commented 1 year ago

Looks good to me. I am assuming that the test case has actually been run on DVC to confirm that it works (i.e., that the templating produces a valid pipeline).

andycui97 commented 1 year ago

> Looks good to me. I am assuming that the test case has been actually run on DVC to ensure it runs (i.e., templating works).

Yeah, I ran the Docker container to make sure the generated files actually run. @yoonspark can you try it as well? https://docs.lineapy.org/en/latest/guide/build_pipelines/pipeline_basics.html#testing-locally is the exact setup I tried to emulate.

lionsardesai commented 1 year ago

@yoonspark had an interesting question: can we rename dvc.yaml to something else, or does DVC not support that?

andycui97 commented 1 year ago

> @yoonspark had an interesting question. can we rename the dvc.yaml to something else or does dvc not follow that?

Great question. So far we haven't found any evidence that the dvc.yaml file name is configurable. The closest functionality DVC offers is support for more than one dvc.yaml file, each in a different directory: https://dvc.org/doc/user-guide/project-structure/dvcyaml-files

So for now we're stuck with hard-coding this file name and can't use something like dvc_pipeline.yaml.
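One way to live with the fixed name, sketched below, is to tell pipelines apart by directory rather than by file name. This is only an illustration of the constraint discussed above: the helper and the directory layout are hypothetical, and it assumes (as we currently believe) that the file must be named `dvc.yaml`.

```python
from pathlib import Path

# Assumption from the discussion above: the file name cannot be changed,
# so it is hard-coded here.
DVC_FILE_NAME = "dvc.yaml"


def write_pipeline_file(root: Path, pipeline_name: str, content: str) -> Path:
    """Write a pipeline definition to <root>/<pipeline_name>/dvc.yaml.

    Hypothetical helper: the pipeline name goes into the directory name,
    since it cannot go into the file name.
    """
    pipeline_dir = root / pipeline_name
    pipeline_dir.mkdir(parents=True, exist_ok=True)
    target = pipeline_dir / DVC_FILE_NAME
    target.write_text(content)
    return target
```

For example, `write_pipeline_file(Path("output"), "dvc_pipeline", yaml_text)` would produce `output/dvc_pipeline/dvc.yaml` instead of the `dvc_pipeline.yaml` we can't have.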