Netflix / metaflow

Open Source Platform for developing, scaling and deploying serious ML, AI, and data science systems
https://metaflow.org
Apache License 2.0
8.32k stars, 774 forks

Load Flow/Step decorators values dynamically from config #431

Open abdulsalama opened 3 years ago

abdulsalama commented 3 years ago

One of the examples in the tutorials shows how you can override the values of the batch decorator via the CLI:

$ python BigSum.py run --with batch:cpu=4,memory=10000,queue=default,image=ubuntu:latest

However, this is not supported for other decorators such as @schedule, @resources, etc.

The ask here is to support it for all decorators such that values of these decorators can be loaded dynamically from config when running the flow.

savingoyal commented 3 years ago

@abdulsalama You can override all the step-level decorators today via CLI: python myflow.py run --with [batch/resources/timeout/retry/catch/conda]. Do you have a use case that won't be covered by this?

abdulsalama commented 3 years ago

Ah, this is nice. I didn't know it is supported for all step-level decorators. How about flow-level decorators like @schedule? Do they work the same way?

abdulsalama commented 3 years ago

Also, with the step-level decorators, is there a way to specify different values for different steps in the CLI? The other aspect of this is feeding these values from a config file. For example, if I have a config file like the following:

resources:
   step_foo:
     cpu: 1
     memory: 500
   step_bar:
     cpu: 2
     memory: 1000

Flow_params:
 learning_rate: 0.5
 loss: mae

Then running this command: python flow.py --config-file config.yaml should be equivalent to running: python flow.py --learning_rate=0.5 --loss=mae with resources:step_foo={cpu=1,memory=500}, step_bar={cpu=2, memory=1000}

See this related ticket

It would be nice to support this natively in metaflow
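A minimal sketch of the parameter half of this request, assuming the config has already been parsed into a dictionary (for example by a YAML library). The helper name `build_run_args` is hypothetical and not part of Metaflow. The `resources` section is deliberately left untouched, since this thread does not establish a CLI syntax for per-step values.

```python
from typing import Any, Dict, List


def build_run_args(config: Dict[str, Any], flow_file: str = "flow.py") -> List[str]:
    """Translate the Flow_params section of a parsed config into `run` arguments.

    Hypothetical helper: only flow parameters are handled here.
    """
    args = ["python", flow_file, "run"]
    for name, value in config.get("Flow_params", {}).items():
        args.append(f"--{name}={value}")
    return args


# Stand-in for the parsed config file shown in the comment above.
config = {
    "resources": {
        "step_foo": {"cpu": 1, "memory": 500},
        "step_bar": {"cpu": 2, "memory": 1000},
    },
    "Flow_params": {"learning_rate": 0.5, "loss": "mae"},
}

print(" ".join(build_run_args(config)))
```

The resulting list can be handed to `subprocess.run` as is, which avoids shell quoting problems for values that contain spaces.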

savingoyal commented 3 years ago

@abdulsalama Following up on this ticket after our conversation.

  1. You can override the parameters via environment variables as well - METAFLOW_ALPHA=0.1 python3 myflow.py run and python3 myflow.py run --alpha=0.1 are equivalent. With a shim that exports the contents of your YAML file, you should be able to set the params - $(export-yaml config.yaml) python3 myflow.py run

  2. Metaflow also supports functions as defaults for parameters which can help you manage defaults in a more programmatic manner.

  3. You can also set environment variables for configuring decorator args

    
    from metaflow import batch, FlowSpec, step
    import os

    class MyFlow(FlowSpec):

        # NUM_CPUS is read from the environment when the flow file is loaded.
        @batch(cpu=os.environ.get('NUM_CPUS'))
        @step
        def start(self):
            print('Hello from start')
            self.next(self.end)

        @step
        def end(self):
            print('Goodbye')

    if __name__ == '__main__':
        MyFlow()

    NUM_CPUS=2 python3 myflow.py run



A combination of 1, 2, and 3 should allow you to easily override decorator args as well as parameters and plumb into an existing configuration manager.  
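The export shim from point 1 could look roughly like the sketch below. The function name `params_to_env` is hypothetical, and the `METAFLOW_` prefix simply follows the example in this comment; it is kept configurable because the exact environment variable naming should be checked against the Metaflow version in use.

```python
import os
from typing import Any, Dict


def params_to_env(params: Dict[str, Any], prefix: str = "METAFLOW_") -> Dict[str, str]:
    """Map parameter names to upper-cased, prefixed environment variables.

    The prefix is an assumption taken from the thread, not a verified API.
    """
    return {f"{prefix}{name.upper()}": str(value) for name, value in params.items()}


# Overlay the generated variables on top of the current environment.
env = {**os.environ, **params_to_env({"alpha": 0.1})}

# The flow would then be launched with this environment, for example:
# subprocess.run(["python3", "myflow.py", "run"], env=env, check=True)
print(env["METAFLOW_ALPHA"])
```

Building a separate `env` dictionary, rather than mutating `os.environ`, keeps the overrides scoped to the launched process.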

abdulsalama commented 3 years ago

I see. This is very helpful. Thanks a lot @savingoyal

abdulsalama commented 3 years ago

@savingoyal Just to follow up on this: to make it even more straightforward for the user, we are thinking of building a layer on top of that, so that once users have their configs in YAML, they can run the flow as follows:

python my-flow.py run --config-file config.yaml

The config file will look something like this:

# Optional: Specifies parent yaml file to inherit from.
# The config then overrides values in parent in a sparse manner
base-yaml:

# Optional: Override metaflow decorators.
# Note: Currently metaflow kfp fork does NOT support this
decorators:
  # Optional: Default would be to run once.
  # Note: Currently metaflow kfp fork does NOT support this
  schedule:
  # - 0 0 * * *
  # - once
  # - now

  # Optional: If not specified, we will use a default image
  # This is currently supported by metaflow kfp but may
  # need to be implemented as a decorator as well.
  base_image: docker1

  # Optional: At most one of base_image or requirements
  # can be specified.
  # There is another big discussion around dependency management
  # and is out of the scope of this document so won't focus
  # too much on it here. But ideally, the vision is that the user
  # can just specify the requirements and we take care of
  # the rest in the background.
  # This will have to be implemented as a decorator
  requirements:
  #  - pandas=0.22.0

# Optional: These are user-defined parameters that map
# to metaflow params.
# They should be understood and consumed by the flow.
params:
 learning_rate: 0.5
 loss: mae
 export_metrics_keys:
   - r_squared
   - precision
   - recall

We will experiment with building this and see how it goes, and potentially we can discuss contributing it back to metaflow.
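As a rough sketch of the sparse override described for base-yaml, assuming both files have already been parsed into dictionaries: nested mappings are merged key by key, while any other value in the child replaces the parent's value. The function name `merge_sparse` is hypothetical, and replacing lists wholesale (rather than concatenating them) is an assumption, not something the proposal specifies.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_sparse(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of base with override applied on top, recursing into dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_sparse(merged[key], value)
        else:
            # Scalars and lists from the child replace the parent's value.
            merged[key] = deepcopy(value)
    return merged


# Stand-ins for a parsed parent config and a child config that inherits from it.
parent = {
    "decorators": {"base_image": "docker1"},
    "params": {"learning_rate": 0.5, "loss": "mae"},
}
child = {"params": {"loss": "mse", "export_metrics_keys": ["r_squared"]}}

effective = merge_sparse(parent, child)
print(effective)
```

Neither input is modified, so the same parent config can be reused for several child configs.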

savingoyal commented 3 years ago

That sounds great! Please keep me posted.

kumarprabhu1988 commented 2 years ago

@savingoyal Is there any update on this? Or is there another way of doing this now?