PrefectHQ / prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
https://prefect.io
Apache License 2.0
17.5k stars 1.64k forks source link

Allow for volume mounts on Docker storage and / or Docker agent #2013

Closed cicdw closed 4 years ago

cicdw commented 4 years ago

We should allow for volume mounts to flows running with Docker storage. I believe this needs to be an option on the Docker agent (not the storage class), as it won't make sense in other environments (e.g., K8s).

This will allow for users to use e.g., LocalResultHandler within Docker storage and still write to local disk.

wagoodman commented 4 years ago

I'll move forward under the assumption that the agent owns volume attachment, signaled by labels. Specifically, the agent would mount volumes supplied at start up to all Flows the agent acts on, it would be up to the user to add labels to the agent at startup or flow at registration to select which flows to run on the agent:

# option a
prefect agent start docker --volume /some/path/to/x:/ctr/path --label provides-x-volume
... and ...
flow.register(labels=['provides-x-volume'], ...)

There is room for improvement by adding a label automatically based on volumes that the agent can provide. However, if the path is used for the label, then the Flow would need to know the exact host path that would be mounted, which would couple information unnecessarily (the Flow should not know about details of the mount, only that it is provided or not):

# option b
prefect agent start docker --volume /some/path/to/x:/ctr/path
... and ...
flow.register(labels=['/some/path/to/x'], ...)

This could be streamlined by having the user specify the label on the agent with the volume option, though the syntax could be conflated with docker's --volume option, which would not behave similarly (as this is not a named volume being created):

# option c

# add a label for the volume
prefect agent start docker --volume provides-x-volume:/some/path/to/x:/ctr/path
... and ...
flow.register(labels=['provides-x-volume'], ...)

# just add the volume with no label
prefect agent start docker --volume /some/path/to/x:/ctr/path
... and ...
flow.register(...) # no label necessary

I'll work on implementing option a, as it is the simplest and most explicit that achieves the goal (and we would not be prevented from adding the shorthand syntax later).

(Side note outside of the scope of this issue: it could be nice to also allow for mounting of Docker named volumes for folks wanting Docker volumes without explicit host mounts... If this is the case, then the above syntax for option C would not be possible as currently expressed)

wagoodman commented 4 years ago

One proposed amendment to my previous comment: I'll match that the --volume option on an agent to match the same options as the Docker daemon. In this way users can specify rw/ro modes, named volumes, leave off container mount paths, and it will behave in the same fashion as the Docker daemon would. So these would all be valid options:

# mount host path /some/path/to/x --> container path /some/path/to/x
prefect agent start docker --volume /some/path/to/x

# mount host path /some/path/to/x --> container path /custom/location
prefect agent start docker --volume /some/path/to/x:/custom/location

# read only mount
prefect agent start docker --volume /some/path/to/x:/custom/location:ro

# named volumes
prefect agent start docker --volume a-name:/some/path/to/x

This implies not implementing option C would no longer be possible the above format, however, this is OK given that exposing extra options such as read-only mounts is probably more desirable (and this is behavior is probably less surprising).