LineaLabs / lineapy

Move fast from data science prototype to pipeline. Capture, analyze, and transform messy notebooks into data pipelines with just two lines of code.
https://lineapy.org
Apache License 2.0
664 stars, 58 forks

LIN-632, LIN-618 Ray Integration #862

Closed: andycui97 closed this 1 year ago

andycui97 commented 1 year ago

Description

Adds support for Ray DAGs. Setting framework=RAY in to_pipeline now produces a DAG file that can be run with Ray.

Ray DAGs currently use .remote() instead of the new alpha workflow API with .bind().

No setup or teardown tasks are added, because Ray has no mechanism for specifying execution order that would let them run in the correct places.

All sink nodes in the graph are materialized, since Ray does not have a "run whole DAG" API.
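To illustrate why materializing only the sink nodes is enough to run the whole graph, here is a minimal pure-Python sketch. It is an analogy, not the generated DAG file and not Ray's API: the `graph` layout and the helpers `find_sinks`, `materialize`, and `run_all` are hypothetical names made up for this example. In a Ray DAG built with .remote(), calling `ray.get` on the sink object refs would play the role that `materialize` plays here.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical task graph: task name -> (function, names of upstream tasks).
Graph = Dict[str, Tuple[Callable[..., int], List[str]]]


def find_sinks(graph: Graph) -> List[str]:
    """Return the tasks that no other task depends on, sorted by name."""
    upstream = {dep for _, deps in graph.values() for dep in deps}
    return sorted(name for name in graph if name not in upstream)


def materialize(graph: Graph, name: str, cache: Dict[str, int], order: List[str]) -> int:
    """Compute one task, first computing any upstream task not yet cached."""
    if name not in cache:
        func, deps = graph[name]
        args = [materialize(graph, dep, cache, order) for dep in deps]
        cache[name] = func(*args)
        order.append(name)  # record the order in which tasks actually ran
    return cache[name]


def run_all(graph: Graph) -> Tuple[Dict[str, int], List[str]]:
    """Run the whole graph by materializing every sink node."""
    cache: Dict[str, int] = {}
    order: List[str] = []
    for sink in find_sinks(graph):
        materialize(graph, sink, cache, order)
    return cache, order


# Illustrative four-task pipeline with two sinks ("train" and "report").
graph: Graph = {
    "load": (lambda: 10, []),
    "clean": (lambda x: x + 1, ["load"]),
    "train": (lambda x: x * 2, ["clean"]),
    "report": (lambda x: x - 3, ["clean"]),
}

results, order = run_all(graph)
```

Because every task is upstream of at least one sink, requesting the sinks forces every task to run exactly once. Execution order in this sketch comes only from data dependencies, which is also why there is no natural slot for a setup or teardown task that takes no inputs and produces no outputs.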

Fixes LIN-632, LIN-618

How Has This Been Tested?

Added snapshot and e2e tests following the Airflow example, plus a new GitHub Action that runs the Ray tests.

andycui97 commented 1 year ago

Looks good. Please run the housing example with inter-task communication as a sanity check, then merge.

Done. Synced up and confirmed that the housing example with inter-task communication works.