abdulsalama opened this issue 3 years ago
@abdulsalama You can override all the step-level decorators today via the CLI: python myflow.py run --with <decorator>, where <decorator> is one of batch, resources, timeout, retry, catch, or conda. Do you have a use case that this won't cover?
Ah, this is nice. I didn't know it was supported for all step-level decorators. What about flow-level decorators like @schedule? Do they work the same way?
Also, with step-level decorators, is there a way to specify different values for different steps from the CLI? The other aspect of this is feeding these values from a config file. For example, if I have a config file like the following:
resources:
  step_foo:
    cpu: 1
    memory: 500
  step_bar:
    cpu: 2
    memory: 1000
Flow_params:
  learning_rate: 0.5
  loss: mae
Then running this command:
python flow.py run --config-file config.yaml
should be equivalent to running:
python flow.py run --learning_rate=0.5 --loss=mae --with resources:step_foo={cpu=1,memory=500},step_bar={cpu=2,memory=1000}
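Since --with attaches a decorator to every step, per-step values are easier to supply from inside the flow file itself. Below is a minimal sketch, assuming the config above has already been parsed into a dict (for example with PyYAML's yaml.safe_load, left out so the example stays stdlib-only). The helpers step_resources and param_args are hypothetical; the first would be unpacked into a decorator, as in @resources(**step_resources(CONFIG, "step_foo")), and the second produces the parameter half of the command line above.

```python
from typing import Any, Dict, List

# The config from the example above, as a YAML loader would return it.
CONFIG: Dict[str, Any] = {
    "resources": {
        "step_foo": {"cpu": 1, "memory": 500},
        "step_bar": {"cpu": 2, "memory": 1000},
    },
    "Flow_params": {"learning_rate": 0.5, "loss": "mae"},
}


def step_resources(config: Dict[str, Any], step_name: str) -> Dict[str, Any]:
    """Return the resource overrides for one step (hypothetical helper).

    Steps without an entry get an empty dict, so the decorator keeps
    its own defaults.
    """
    return dict(config.get("resources", {}).get(step_name, {}))


def param_args(config: Dict[str, Any]) -> List[str]:
    """Turn Flow_params into '--name=value' options for `python flow.py run`."""
    return [f"--{name}={value}" for name, value in config.get("Flow_params", {}).items()]
```

If steps run remotely, the flow file is imported again in that environment, so the config has to be readable there as well.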
See this related ticket
It would be nice to support this natively in Metaflow.
@abdulsalama Following up on this ticket after our conversation:
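You can override parameters via environment variables as well. For a parameter named alpha,
METAFLOW_RUN_ALPHA=0.1 python3 myflow.py run
and
python3 myflow.py run --alpha=0.1
are equivalent. With a shim that prints the contents of your YAML file as NAME=value pairs, you should be able to set the parameters: env $(export-yaml config.yaml) python3 myflow.py run (the env prefix is needed because the shell does not treat assignments produced by command substitution as variable assignments).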
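The export-yaml shim is not spelled out above; here is a minimal Python sketch of the same idea that skips the shell entirely. It assumes the parameters have already been loaded into a dict, and that parameter values are read from variables named METAFLOW_RUN_<NAME> (Click's automatic naming for options of the run command); the prefix is an argument in case your Metaflow version differs. params_to_env, build_env, and run_flow are hypothetical names.

```python
import os
import subprocess
import sys
from typing import Any, Dict, Mapping


def params_to_env(params: Mapping[str, Any], prefix: str = "METAFLOW_RUN_") -> Dict[str, str]:
    """Map {'alpha': 0.1} to {'METAFLOW_RUN_ALPHA': '0.1'} (assumed naming)."""
    return {f"{prefix}{name.upper()}": str(value) for name, value in params.items()}


def build_env(params: Mapping[str, Any]) -> Dict[str, str]:
    """Copy the current environment and layer the parameter variables on top."""
    env = dict(os.environ)
    env.update(params_to_env(params))
    return env


def run_flow(flow_file: str, params: Mapping[str, Any]) -> int:
    """Launch `python <flow_file> run` with the parameters set via the environment."""
    cmd = [sys.executable, flow_file, "run"]
    return subprocess.run(cmd, env=build_env(params), check=False).returncode
```

For example, run_flow("myflow.py", {"alpha": 0.1}) should behave like the METAFLOW_RUN_ALPHA=0.1 invocation, provided the naming assumption holds.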
Metaflow also supports functions as parameter defaults, which can help you manage defaults programmatically.
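A sketch of the function-default approach, assuming the single-argument signature Metaflow uses for callable defaults (it passes a context object when resolving the parameter). The function looks the value up in a config file and falls back to a fixed value. JSON is used instead of YAML only to stay within the standard library; the FLOW_CONFIG variable, the flow_config.json file name, and the helper names are all hypothetical. In a flow it would be passed as Parameter("learning_rate", default=learning_rate_default).

```python
import json
import os
from typing import Any, Callable, Optional


def params_from_json(text: str) -> dict:
    """Return the "params" section of a JSON config document."""
    return json.loads(text).get("params", {})


def config_default(name: str, fallback: Any, path: Optional[str] = None) -> Callable[[Any], Any]:
    """Build a callable parameter default that reads `name` from a config file.

    The path comes from `path`, else the hypothetical FLOW_CONFIG
    environment variable, else flow_config.json. If the file is missing
    or lacks the key, `fallback` is returned.
    """
    def _default(ctx: Any) -> Any:
        # ctx is the context object passed to callable defaults; unused here.
        config_path = path or os.environ.get("FLOW_CONFIG", "flow_config.json")
        try:
            with open(config_path) as f:
                params = params_from_json(f.read())
        except FileNotFoundError:
            return fallback
        return params.get(name, fallback)

    return _default


learning_rate_default = config_default("learning_rate", 0.1)
```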
You can also use environment variables to configure decorator arguments:
from metaflow import batch, FlowSpec, step
import os

class MyFlow(FlowSpec):

    # NUM_CPUS is read when the flow file is imported; falls back to 1 CPU if unset
    @batch(cpu=int(os.environ.get('NUM_CPUS', '1')))
    @step
    def start(self):
        print('Hello from start')
        self.next(self.end)

    @step
    def end(self):
        print('Goodbye')

if __name__ == '__main__':
    MyFlow()
NUM_CPUS=2 python3 myflow.py run
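Because decorator arguments are ordinary Python expressions evaluated when the flow file is imported, the same trick extends to per-step values, which also covers the earlier question about giving different steps different settings. A minimal sketch, assuming a hypothetical <STEP>_<ARG> naming convention for the variables (e.g. START_CPU); env_int is a hypothetical helper that converts to int and reports malformed values clearly instead of passing a raw string to the decorator.

```python
import os
from typing import Mapping, Optional


def env_int(step: str, arg: str, default: int,
            environ: Optional[Mapping[str, str]] = None) -> int:
    """Read an integer decorator argument for one step from the environment.

    Looks up e.g. START_CPU for step="start", arg="cpu" (hypothetical
    convention) and returns `default` when the variable is unset or empty.
    """
    env = os.environ if environ is None else environ
    key = f"{step}_{arg}".upper()
    raw = env.get(key)
    if raw is None or raw == "":
        return default
    try:
        return int(raw)
    except ValueError:
        raise ValueError(f"{key} must be an integer, got {raw!r}") from None
```

With @batch(cpu=env_int("start", "cpu", 1)) on start and @batch(cpu=env_int("end", "cpu", 1)) on end, running START_CPU=2 END_CPU=1 python3 myflow.py run would give the two steps different CPU counts.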
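A combination of these three approaches (environment variables for parameters, functions as parameter defaults, and environment variables in decorator arguments) should let you easily override decorator arguments as well as parameters, and plug into an existing configuration manager.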
I see, this is very helpful. Thanks a lot, @savingoyal!
@savingoyal Just to follow up on this: to make it even more straightforward for users, we are thinking of building a layer on top of this so that once users have their configs in YAML, they can run the flow as follows:
python my-flow.py run --config-file config.yaml
The config file will look something like this:
# Optional: Specifies a parent YAML file to inherit from.
# The config then sparsely overrides values in the parent.
base-yaml:
# Optional: Override Metaflow decorators.
# Note: Currently the Metaflow KFP fork does NOT support this.
decorators:
  # Optional: Default would be to run once.
  # Note: Currently the Metaflow KFP fork does NOT support this.
  schedule:
    # - 0 0 * * *
    # - once
    # - now
# Optional: If not specified, we will use a default image.
# This is currently supported by Metaflow KFP but may
# need to be implemented as a decorator as well.
base_image: docker1
# Optional: At most one of base_image or requirements
# can be specified.
# There is another big discussion around dependency management,
# which is out of the scope of this document, so we won't focus
# on it too much here. Ideally, though, the user can just
# specify the requirements and we take care of the rest
# in the background.
# This will have to be implemented as a decorator.
requirements:
  # - pandas=0.22.0
# Optional: These are user-defined parameters that map
# to Metaflow params.
# They should be understood and consumed by the flow.
params:
  learning_rate: 0.5
  loss: mae
export_metrics_keys:
  - r_squared
  - precision
  - recall
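The base-yaml key implies a sparse override: values in the child config replace the parent's, and nested mappings are merged key by key. A minimal sketch of that merge, assuming each file has already been parsed into a dict (e.g. with PyYAML) and is reachable through a caller-supplied load function. That lists replace rather than concatenate is an assumption the proposal does not settle, and deep_merge and resolve_config are hypothetical names.

```python
import copy
from typing import Any, Callable, Dict, List


def deep_merge(parent: Dict[str, Any], child: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict in which `child` sparsely overrides `parent`.

    Nested dicts are merged recursively; any other child value
    (including lists) replaces the parent's value outright.
    """
    merged = copy.deepcopy(parent)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_config(name: str, load: Callable[[str], Dict[str, Any]]) -> Dict[str, Any]:
    """Follow the base-yaml chain starting at `name`, merging parents first.

    Raises ValueError if the chain loops back on itself.
    """
    chain: List[Dict[str, Any]] = []
    seen = set()
    current = name
    while current:  # an empty or missing base-yaml ends the chain
        if current in seen:
            raise ValueError(f"base-yaml cycle at {current!r}")
        seen.add(current)
        config = load(current)
        chain.append(config)
        current = config.get("base-yaml")
    result: Dict[str, Any] = {}
    for config in reversed(chain):  # root ancestor first, requested file last
        result = deep_merge(result, config)
    result.pop("base-yaml", None)
    return result
```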
We will experiment with building this and see how it goes, and we can potentially discuss contributing it back to Metaflow.
That sounds great! Please keep me posted.
@savingoyal Is there any update on this? Or is there another way of doing this now?
One of the examples in the tutorials shows how you can override the values of the @batch decorator via the CLI:
$ python BigSum.py run --with batch:cpu=4,memory=10000,queue=default,image=ubuntu:latest
However, this does not work for flow-level decorators such as @schedule.
The ask here is to support this for all decorators, so that their values can be loaded dynamically from a config file when running the flow.
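For step-level decorators, part of this can be driven from a config file today by generating --with options in the name:key=value,... form used in the BigSum example. A minimal sketch, assuming the decorator section of the config has been parsed into a dict; with_args is a hypothetical helper. Because --with attaches the decorator to every step, it cannot express per-step values or flow-level decorators like @schedule, and values containing commas would need extra care.

```python
from typing import Any, Dict, List, Optional


def with_args(decorators: Dict[str, Optional[Dict[str, Any]]]) -> List[str]:
    """Turn {'batch': {'cpu': 4}} into ['--with', 'batch:cpu=4'].

    Each decorator becomes one --with option; a decorator with no
    attributes is passed by name alone.
    """
    args: List[str] = []
    for name, attrs in decorators.items():
        spec = name
        if attrs:
            spec += ":" + ",".join(f"{key}={value}" for key, value in attrs.items())
        args += ["--with", spec]
    return args
```

The result can be appended to a command list such as [sys.executable, "BigSum.py", "run"] and launched with subprocess.run.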