botify-labs / simpleflow

Python library for dataflow programming.
https://botify-labs.github.com/simpleflow/
MIT License
68 stars 24 forks source link

Add meta option to download a binary before running an activity task #318

Closed jbbarth closed 6 years ago

jbbarth commented 6 years ago

The purpose of this issue is to add the ability to override binaries version via configuration passed in workflows input, propagated to each activity task that uses the binary.

1/ Passing the binaries version

This is an opaque topic for simpleflow. YMMV. At Botify we'd do it via the workflow input or via access to a datastore.

2/ Scheduling the activity task with some meta keys

This is already possible since Kubernetes support was merged. The concern here is to be able to pass some runtime configuration without changing the internal function signature, which can be a pain.

The activity task would receive:

{
    # args and kwargs are passed directly to the task
    "args": [ 1, 2, 3, 4 ],
    "kwargs": { "foo": "bar" },
    # meta is *not* passed to the task function, and not taken into account when calculating the task name
    "meta": {
        "binaries": {
            "mybin": "s3://a_bucket/path/to/mybin"
        }
    }
}

Only the binaries used by the given task should be specified.

On the decider side, the code would look like:

from simpleflow import activity, Workflow

def a_task_using_mybin():
    pass

class MyWorkflow(Workflow):
    def run(self, binaries_dict):
        # schedule a task with a custom version of "mybin"
        mybin_location = binaries_dict["mybin"] # "s3://a_bucket/path/to/mybin" for instance
        decorated = activity.with_attributes(a_task_using_mybin, meta={"binaries": {"mybin": mybin_location}})
        self.submit(decorated, ...)

As it's not convenient to decorate tasks lately like this, we'll find a solution to make the resolution of this lazy. We already did this elsewhere but I don't remember how.

3/ Activity worker: Lazily download the binary

Simpleflow will download those binaries if not present in a /usr/local/bin/<binary name>-<location hash>/ and prepend the directory to PATH before forking to process the activity task.

More complex strategies can be added later, but I like that the default one is just a { binary: location } dict (if you don't like it, tell me!)

jbbarth commented 6 years ago

As discussed with others, "/usr/local/bin" should be configurable. Some might prefer "/tmp", others "/opt", ... It only has to be configured/resolved on activity workers.

jbbarth commented 6 years ago

Done in #321