Netflix / metaflow

Open Source Platform for developing, scaling and deploying serious ML, AI, and data science systems
https://metaflow.org
Apache License 2.0

Composable Flows and Steps #245

Open talebzeghmi opened 4 years ago

talebzeghmi commented 4 years ago

Large ML projects that span teams reuse pipelines and models (e.g., ensembles, feature engineering).

There are two aspects of reuse:

  1. Reuse a whole Flow, to be able to compose a Flow of other Flows.
  2. Reuse a step (imagine it to be a feature engineering step). Steps currently have no input parameters or return values, which makes reuse more difficult.

A Use Case:

related: https://github.com/Netflix/metaflow/issues/144

crk-codaio commented 4 years ago

We have been thinking about (1) as graph composition, and we hope to publish more details on our thinking. cc @tuulos

For (2), you could still share logic, especially feature engineering transforms, as a library of functions (instead of steps) that is simply imported within your step. Some of our teams inside Netflix take this route to share such business logic.

Also, for a relatively common collection of transformations, you could still use (1) to avoid repeating even the step boilerplate.
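The library-of-functions route for (2) can be sketched roughly as follows. This is a minimal illustration, not the approach any Netflix team actually uses: the helper names (`scale_min_max`, `build_features`), the flow name `TrainFlow`, and the sample records are all hypothetical, and only `FlowSpec`, `step`, and `self.next` come from Metaflow itself. The point is that the helpers take explicit inputs and return values, which steps do not, so several flows can import the same logic.

```python
"""Sketch: share feature engineering as plain functions instead of steps.

All names here (scale_min_max, build_features, TrainFlow) are hypothetical.
"""


# --- Shared helpers; in practice a separate module such as features.py ---

def scale_min_max(values):
    """Rescale numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def build_features(rows):
    """Take raw records in, return feature records out (no flow state)."""
    prices = scale_min_max([row["price"] for row in rows])
    return [
        {"price_scaled": price, "is_new": row["age_days"] < 30}
        for price, row in zip(prices, rows)
    ]


# --- A flow that reuses the helpers; other flows can import them too ---

try:
    from metaflow import FlowSpec, step
except ImportError:
    # Assumption: the helpers should stay usable where Metaflow is absent,
    # for example in unit tests or notebooks.
    FlowSpec = None

if FlowSpec is not None:

    class TrainFlow(FlowSpec):
        @step
        def start(self):
            # Hypothetical sample data standing in for a real data load.
            self.rows = [
                {"price": 10, "age_days": 5},
                {"price": 30, "age_days": 90},
            ]
            self.next(self.featurize)

        @step
        def featurize(self):
            # The step stays thin: it only wires artifacts to the helper.
            self.features = build_features(self.rows)
            self.next(self.end)

        @step
        def end(self):
            print(self.features)
```

To run this as a flow, the flow file would also need the usual `if __name__ == "__main__": TrainFlow()` guard at the bottom. Keeping the helpers free of any Metaflow dependency means they can be tested on their own and versioned as an ordinary Python package.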

dpatschke commented 4 years ago

@talebzeghmi Thank you for opening this issue! Your issue has articulated some of the exact metaflow architectural questions that our team has been having around productionizing/pipelining metaflow ... especially around the reusability of feature engineering code within multiple flows.

I don't want to have to copy and paste scikit-learn Transformer code into each new modeling flow, especially when I've written a lot of boilerplate/utility code around: 1) leveraging pandas to protect against differing columns being passed in, and 2) pulling in a tagged 'production' model from a Run, which is then reloaded for just the data 'transform' and not the 'fit' as well.

@seeravikiran Thanks for the recommendations on structuring and code reusability to address some of the items presented in this issue. I will continue to investigate what that would look like on our end. In the meantime, I would like to point you to this post made on the metaflow community page, which proposes a pretty interesting idea for the issue. I'm curious as to your thoughts on this (or something like this).

tuulos commented 4 years ago

As @seeravikiran pointed out above, we have plans for graph composition. Meanwhile, this form of subclassing is supported https://github.com/Netflix/metaflow/issues/144#issuecomment-592245062

dpatschke commented 4 years ago

@tuulos Thanks for the response and the reference. This is extremely helpful and greatly appreciated!

talebzeghmi commented 4 years ago

@tuulos, would you be able to share an RFC-style document on how Metaflow would support composition? That way we can give feedback from our Applied Scientists on its usability before the code is written.

thank you!

tuulos commented 4 years ago

@talebzeghmi yep, I have been writing a doc that I should be able to share this month. I will ping you when it is available. Thanks for your patience :)

PertuyF commented 1 year ago

Hello @tuulos, any news on the doc you were writing in 2020 regarding Metaflow composable flows?

DonIvanCorleone commented 9 months ago

Hi @tuulos,

Is there any progress on this topic? It would be extremely helpful for the business use case we have right now :) Any feedback is appreciated.

Cheers