ga4gh / task-execution-schemas

Proposal: Remove stacked executors #59

Closed: geoffjentry closed this issue 5 years ago

geoffjentry commented 7 years ago

There was discussion on the mailing list about removing (or rebranding) the stacked executor functionality. Creating an issue to allow for further discussion here.

buchanae commented 7 years ago

Related mailing list discussions: https://groups.google.com/a/genomicsandhealth.org/forum/#!topic/ga4gh-dwg-containers-workflows/qFq_jgoRCvs https://groups.google.com/a/genomicsandhealth.org/forum/#!topic/ga4gh-dwg-containers-workflows/ccaqjysBvZY

buchanae commented 7 years ago

To summarize a couple points from the mailing list:

  1. There are concerns that stacked executors don't map well to concepts in Google and Amazon APIs, and implementing stacked executors in those systems could be difficult (or impossible?).
  2. The original motivation for stacked executors was to provide initialization and finalization steps, rather than to encourage optimization of workflow steps.

tetron commented 7 years ago

+1 on removing stacked executors. There are workarounds for initialization/finalization that don't involve multiple containers.

buchanae commented 7 years ago

@tetron If you could give a short description of those workarounds, it would help the discussion. Thanks!

delagoya commented 6 years ago

@buchanae @tetron There were several options discussed on the call. Basically it boils down to an implementation detail of the backend for a TES endpoint. The current setup is tuned for a backend that looks and acts like a traditional HPC cluster with a shared filesystem. It was also communicated on the calls that the intended usage was "stage data -> work -> checkpoint data", but the structure allows for "stage data -> work -> work -> work -> work -> work -> checkpoint data -> work -> checkpoint data". Not a great design.

Here are some perfectly good alternatives:

1. For an HPC backend you have a shared FS, so data staging is already taken care of by volume mounts.
2. For individual serial workflows, just package the entire pipeline in a single container and make the script run each tool serially.
3. For a TES backend that works the way AWS Batch works, launch a bootstrap container that stages data as necessary and launches a sibling container to do the work (a sketch of this pattern follows below).
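
For illustration, here is a minimal sketch of alternative 3 (the bootstrap/sibling-container pattern). It assumes the bootstrap container is started with the host Docker socket mounted (-v /var/run/docker.sock:/var/run/docker.sock) and has the docker and aws CLIs installed; the bucket, host scratch path, and tool image are hypothetical placeholders.

```python
import subprocess

WORKDIR = "/scratch/task-123"  # hypothetical host directory shared with the sibling container

def stage_inputs(workdir: str) -> None:
    # Stage an input object from S3 into the shared work directory (illustrative only).
    subprocess.run(
        ["aws", "s3", "cp", "s3://example-bucket/input.bam", f"{workdir}/input.bam"],
        check=True,
    )

def run_sibling(workdir: str) -> None:
    # Launch the actual tool as a *sibling* container on the same host,
    # bind-mounting the staged data into it.
    subprocess.run(
        [
            "docker", "run", "--rm",
            "-v", f"{workdir}:/data",
            "example/samtools:1.9",  # placeholder tool image
            "samtools", "index", "/data/input.bam",
        ],
        check=True,
    )

if __name__ == "__main__":
    stage_inputs(WORKDIR)
    run_sibling(WORKDIR)
```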

buchanae commented 6 years ago

Thanks @delagoya

"stage data -> work -> work -> work -> work -> work -> checkpoint data -> work -> checkpoint data"

As I understand it, "checkpoint data" means "upload files to storage", but in Funnel at least, uploading only happens once at the end of the task. What am I misunderstanding?

buchanae commented 6 years ago

2 - for individual serial workflows, just package the entire pipeline in a single container and make the script run each tool serially

One use case is that init/finalization executors might be generated at runtime by the workflow engine (or other TES client). Also, in my experience, merging multiple containers into one has caused issues with maintainability.

mbookman commented 6 years ago

I really like this feature. I think it allows users to have a set of well-defined Docker images, where today (based on the current Google Pipelines API model) they end up putting a whole bunch of tools into a single image.

We often see pipelines that, for example, act on BAM files. Copying a BAM file from Cloud Storage to Persistent Disk on the VM can take a significant amount of time depending on the size of the BAMs. Since it takes so long, people want to download once and then run multiple tools. For various reasons we see things like re-headering, re-indexing, re-creating BAM statistics. Sometimes these all use samtools and sometimes they end up using a combination of samtools, picard, and bamUtil.

Their current best option here is to create a single Docker image containing all 3 tools. With multiple executors, they could use stock published Docker images of each separate tool.

Another example would be implementing custom localization/delocalization. In dsub today, we implement "recursive" input/output handling because it is not available in the Pipelines API. To inject that functionality, either the user's Docker image needs to have gsutil installed or we need to install it into the Docker container at runtime. With the array of executors, we would be able to inject this custom localization step with a Docker image completely separate from the user's Docker image.
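
For concreteness, here is a hedged sketch of the kind of task described above: a localization executor injected ahead of the user's tool executor. Field names approximate the TES task message (executors[].image / command, volumes), though older revisions of the spec used slightly different names; the endpoint URL, bucket, and images are placeholders.

```python
import json
import urllib.request

task = {
    "name": "index-bam-with-injected-localization",
    "volumes": ["/data"],  # shared between the executors below
    "executors": [
        {
            # Injected step: copy the BAM from Cloud Storage onto the shared volume.
            "image": "google/cloud-sdk:slim",
            "command": ["gsutil", "cp", "gs://example-bucket/sample.bam", "/data/sample.bam"],
        },
        {
            # User's step: run their own tool image against the localized file.
            "image": "example/samtools:1.9",  # placeholder image
            "command": ["samtools", "index", "/data/sample.bam"],
        },
    ],
}

# Submit to a Funnel-style TES endpoint (URL is a placeholder).
req = urllib.request.Request(
    "http://localhost:8000/v1/tasks",
    data=json.dumps(task).encode(),
    headers={"Content-Type": "application/json"},
)
print(urllib.request.urlopen(req).read().decode())  # e.g. {"id": "..."}
```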

I would prefer to see that this feature is retained.

delagoya commented 6 years ago

We often see pipelines that, for example, act on BAM files. Copying a BAM file from Cloud Storage to Persistent Disk on the VM can take a significant amount of time depending on the size of the BAMs. Since it takes so long, people want to download once and then run multiple tools. For various reasons we see things like re-headering, re-indexing, re-creating BAM statistics. Sometimes these all use samtools and sometimes they end up using a combination of samtools, picard, and bamUtil.

This is my point exactly. What was meant to be a feature to allow for [stage data -> process data -> save data/report] for a single Task is being utilized in a non-standard way to encode a serial workflow.

mbookman commented 6 years ago

I'm not sure I understand the fundamental opposition. What defines a "workflow"? If I have a shell script that runs one command after another, did I just "encode a serial workflow"?

What I have described is that people are going to write such serial workflows anyway, for practical reasons, and they will do so whether multiple executors are provided or not. They are just going to do it in a less well structured way.

I think that multiple executors gives them a better way of doing what they are already doing, but is there an alternative proposal that I missed?

Thanks!

delagoya commented 6 years ago

So here are my thoughts:

Defining a workflow: a series of discrete steps that have chained inputs and outputs.

There is no way we can control what happens in the application space. People encode all sorts of suboptimal analyses all the time, the best we can do is to limit the choices they have to formally encode suboptimal solutions.

Let's take a look at your example, namely running a set of tools on input data in serial fashion. Certain tools will allow for multiple threads to be used, and some will not. The memory resources required by each tool will also differ. Java processes on SGE clusters with memory set as a consumable resource are notorious for needing to allocate 2X the memory that the process typically needs. This means that the instantiation of this Task will either waste resources if CPU / RAM is allocated for the greediest tool, or the runtime will increase due to lack of resources.

There is a trade-off either way. If the spirit of a Task Execution Service is scoped to a single unit of work, then the standard should reflect that. Allowing for the standard itself to encode a chained series of operations is not in the spirit of what is being proposed.

I totally get that the utility of it is nice to have when discussing a full analysis (workflow). I'm still not in favor of it because of the drawbacks it presents, in addition to the reason stated above.

At this point you may ask why I am for other additions like Task Dependency and Task Array. The reason is that both of these requested features are in the scope of an individual unit of work. Dependency specification is an easy one: it basically states that a Task has a prior condition that needs to be met before it can move from the pending to the running state. Arrays are collections of individual units of work, and are ubiquitously used in task schedulers since they provide enough benefit in high-throughput scenarios to warrant inclusion in the API design.

buchanae commented 6 years ago

I fall on both sides of the fence here :) Maybe that's useless, but here are my latest thoughts anyway.

I like the simplicity that comes from removing executors. It would remove an extra level of logs, and the volumes field which has confused multiple people. There's one less concept to explain. It might make it easier to support Kubernetes and Google Pipelines in Funnel.

On the other hand, executors exist and I think they're useful; not great, but not terrible either. As @mbookman said, people are going to encode multiple steps into a single task no matter what, but in this case they can't call docker directly, so we give them a replacement.

I wonder if task dependency could be the right compromise here. An implementation could use that information to reason about affinity and assign tasks to nodes efficiently. Or, an implementation could decide that this is too complicated and instead opt for the simpler yet less (network-)efficient approach of fully copying input/output files. Yet another implementation could decide it only supports shared, networked filesystems. Dependencies also add lots of value in other areas.

I can imagine an implementation for Funnel that uses dependencies to provide good data locality, but I couldn't promise that it won't end up being a lot of work in the long run. That unknown amount of work worries me, but maybe it's worth experimenting?

@mbookman would task dependency provide a good solution for dsub and users at Verily in general?

@delagoya In a world without multiple executors, what would you recommend as the best solution for AWS Batch? Elastic filesystem? Full copy of intermediate data? Probably the answer is "it depends", but wanted to get your thoughts anyway.

delagoya commented 6 years ago

Currently EFS scales with the amount of data placed on an EFS filesystem. I would not recommend it for deployments with less than 800GB-1TB. Best practice is to just stage to/from S3 or use a single-AZ FS (Lustre, BeeGFS, GlusterFS, etc.). I've also heard extremely good things about weka.io but do not have personal experience with it.

Task dependency + input/output file ids could be leveraged within an implementation, but there are all sorts of edge cases. I would not recommend that it be an official stance of this standards body to make that recommendation to implementors, but would love to highlight the different aspects or strengths of various implementations of the standard.

geoffjentry commented 6 years ago

@buchanae @delagoya Agree w/ both of you here.

I don't think that "a user might try to cram multiple things into a task" is an argument for stacked executors. For instance, as Cromwell, I'm never going to try to tease apart what someone's task Really Means to take advantage of stacked executors. OTOH I might do so in a case like what @buchanae describes, where I can auto-detect multiple individual tasks as being stackable.

That said, I wouldn't expect that to be part of TES, nor do I think that it should be. I don't see it as the place for these APIs to add every potentially nice-to-have feature, because all that means is either fewer implementations or more partial implementations, and if that occurs what have we really won? Perhaps in a world where we have an official scheme for partial compliance (i.e. client and server can negotiate what features are in effect), but not now.

I think it's totally acceptable to live in a world where if I want access to some whizbang feature on an underlying execution platform that I don't use TES and talk to the platform directly (or perhaps a hybrid where I'm sometimes using TES and sometimes not depending on the situation).

Lastly, I'd like to point out that we've gone from "This feature is really only for initialization & finalization, trust us, no one is going to use it for That Other Stuff" to assuming That Other Stuff is the main reason for this feature to exist. That right there seems like a strong counterargument: if we can't manage to avoid feature creep in six months, users aren't going to be able to do so.

kellrott commented 6 years ago

I think the statement "a user might try to cram multiple things into a task" needs to define the 'user'. In my thinking the 'user' is actually the workflow engine. For those systems, there is often a need to introspect tool outputs before integrating them back into the data system the workflow engine interfaces with. For example, after running tools, Galaxy tends to run samtools index on BAM files, grab the top 100 lines of text files, count the number of lines in a file, and so on, all so it can inform its web interface with details about the results of job runs. There is no way they can assume that every tool's docker container will have enough of the dependencies to run https://github.com/galaxyproject/galaxy/blob/master/lib/galaxy_ext/metadata/set_metadata.py. Being able to tag another command line onto the machine run, with all of the dependencies it needs, would be incredibly valuable.

In terms of requirements for implementors, it's a very low bar. Just put a for loop around synchronous docker calls. We already have this functionality working on Funnel as deployed on AWS Batch.
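
As a rough illustration of that "for loop", here is a minimal sketch of a worker running a task's executors sequentially via synchronous docker calls; volume handling, logging, and error reporting are simplified, and the host scratch path is hypothetical.

```python
import subprocess

def run_executors(task: dict) -> int:
    """Run each executor in order; stop at the first failure (a minimal sketch)."""
    for i, ex in enumerate(task.get("executors", [])):
        cmd = ["docker", "run", "--rm"]
        for vol in task.get("volumes", []):
            # Map each task volume onto a hypothetical host scratch directory.
            cmd += ["-v", f"/tmp/task-scratch{vol}:{vol}"]
        cmd += [ex["image"], *ex["command"]]
        print(f"executor {i}: {' '.join(cmd)}")
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return result.returncode
    return 0
```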

tetron commented 6 years ago

One compromise might be at most one container for each of a prologue / primary / epilogue.

The main challenge seems to be that if the underlying container orchestration systems don't support it, no amount of wishing is going to make it so.

delagoya commented 6 years ago

@kellrott You are right about my ambiguous use of "user". I meant "someone encoding a Task that will be submitted to a TES endpoint"

delagoya commented 6 years ago

Also, regarding @kellrott's comment "In terms of requirements for implementors, it's a very low bar. Just put a for loop around synchronous docker calls. We already have this functionality working on Funnel as deployed on AWS Batch", see my comment about non-optimal resource utilization across the individual Executors. It's a trade-off, and not an acceptable one in my opinion.

buchanae commented 6 years ago

FWIW, we've mocked up what the changes would look like to remove multiple executors from Funnel. https://github.com/ohsu-comp-bio/funnel/pull/466

After more experience with tasks and workflow engines, I personally agree we should remove this concept from TES for the sake of spec and implementation simplicity. Note that having only a single executor might make supporting some Funnel backends easier, such as Kubernetes, Google Pipelines API, a less hacky AWS Batch backend, etc. Also, the task message itself is much simpler, and therefore easier to learn/teach.

One of the main use cases (a tool for workflow engine authors to use) isn't being utilized; workflow engine authors are just talking to the filesystem directly.

There are still some arguments in favor of keeping executors, which I understand (I think), but at this point I don't think it's worth the tradeoff in complexity.

mbookman commented 6 years ago

Note that the latest version of the Google Pipelines API (v2alpha1) goes in the opposite direction, adding multiple "Actions" to a single Pipeline:

https://cloud.google.com/genomics/reference/rest/v2alpha1/pipelines/run -> https://cloud.google.com/genomics/reference/rest/Shared.Types/Metadata#Pipeline

actions[]: object(Action) - "The list of actions to execute, in the order they are specified." (https://cloud.google.com/genomics/reference/rest/Shared.Types/Action)

"Action specifies a single action that runs a docker container."

buchanae commented 6 years ago

Enough time and debate has passed, and I think we all understand the tradeoffs. We need to make a decision and move forward.

I've created PR #108 to show what the spec looks like if executors are removed.

I'm in favor of removing this because:

It's been argued that this is simple to implement as a for-loop, but that limits implementations to wrapping all commands in some sort of implementation-specific worker (such as requiring the Funnel worker binary or a Bash script wrapper).

I really think a TaskSequence API is the right place for this. Such an API would guarantee that a list of tasks are executed sequentially (hopefully on the same machine). The presence of that API signals that an implementation supports this advanced feature. I can imagine a clear and relatively straightforward way to implement that in Funnel.
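
Purely for illustration, a hypothetical TaskSequence request might look something like the sketch below. Nothing here is part of the TES spec; the message shape and endpoint are invented to show the idea of a list of single-executor tasks guaranteed to run in order, ideally on the same machine.

```python
# Hypothetical, not part of TES: a sequence of single-executor tasks to be run in order.
task_sequence = {
    "name": "stage-then-index",
    "tasks": [
        {
            "volumes": ["/data"],
            "executors": [{
                "image": "google/cloud-sdk:slim",
                "command": ["gsutil", "cp", "gs://example-bucket/sample.bam", "/data/sample.bam"],
            }],
        },
        {
            "volumes": ["/data"],
            "executors": [{
                "image": "example/samtools:1.9",  # placeholder image
                "command": ["samtools", "index", "/data/sample.bam"],
            }],
        },
    ],
}
# A client would POST this to a hypothetical endpoint such as /v1/task-sequences.
```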

geoffjentry commented 6 years ago

@buchanae Considering the cloud workstream is only supposed to be doing things in service to the driver projects, which driver projects have opinions on this functionality, and what are their opinions?

erikvdbergh commented 6 years ago

We are currently working on CWL compatibility with our TESK implementation. What we are struggling with is the fact that cwltool by default makes separate steps into separate tasks. However, the underlying pattern and assumption is that the separate steps run under the same underlying storage, so having them together is a caching strategy. This could be reflected by modifying cwltool to create multiple executors instead.

IIRC, the conversation around multiple executors last time ended with exactly this, namely that it is a form of caching by linking together steps that are bound by the same data. Introducing a TaskSequence API for this would be overkill IMO, especially considering not everyone is convinced that there should even be a WES/TES split.

For us, we would not be in favor of dropping this, because the practicalities of current implementations seriously favor a mechanism to link steps together in this way. Also, yes @geoffjentry, I would love to hear what the driver projects think.

geoffjentry commented 6 years ago

@erikvdbergh I believe you all are the driver project interested in TES, since everyone else is pinging at the WES level. I think at the moment the key stakeholders would be EBI and whoever is utilizing TES to power the 4x4 testbed in Basel, but I could be wrong.

delagoya commented 6 years ago

Yeah, I am still of the opinion that it is not a good design choice, but I will defer to the current implementors' opinions.

geoffjentry commented 6 years ago

Personally I've come around on the idea (as a setup/teardown mechanism only) after seeing it in our PAPI v2 implementation.

From a GA4GH perspective, as one of the WES implementers for the Basel 4x4, the only TES implementation we'd conceivably be supporting in that timeframe is the EBI folks', so I'll defer to them.

delagoya commented 6 years ago

@geoffjentry Key words being "as a setup/teardown mechanism only". This is not inconsistent with my original argument: if the mechanism was created for this case, and there is now a secondary use being applied, then the original structure was not correctly named or scoped.

But (again) I am being outvoted here, so if the implementations are already there that assume a multi-step process that may be a serial set of steps beyond setup -> run -> teardown, then so be it.

delagoya commented 5 years ago

Closing issue.

The general consensus was to keep the array of executors in the schema.