ga4gh / task-execution-schemas

Define custom resource request schema #84

Open buchanae opened 7 years ago

buchanae commented 7 years ago

There's no place to request some types of resources:

Also, some existing concepts in the Resources schema are not clear

A proposed solution is to remove some concepts from the schema, such as preemptible and zones, and add a new field of type map[string]string to allow more flexible, implementation-specific resource requests.
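As a rough sketch of how that might look in a task document (the field name and values here are purely illustrative, not a settled proposal):

{
  "resources": {
    "cpu_cores": 4,
    "ram_gb": 8,
    "custom": {
      "instance-type": "n1-standard-4",
      "spot": "true"
    }
  }
}

The point is that the map values are plain strings, so each implementation decides which keys it recognizes and how to interpret them.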

pditommaso commented 7 years ago

My feeling is that this kind of request (e.g. CPU architecture, spot instance, bidding strategy, etc.) should be delegated to a queue concept, which could be (optionally) specified by a task.

geoffjentry commented 7 years ago

Something we've run into over time is that there's a tension between a desire for generic key/values that an execution backend can feel free to interpret and people's desire for something to work the same no matter where they go. This is less of a problem for concepts which don't map cleanly at all, but once you start getting to things which are semi-shared it becomes problematic.

If I had a good solution to this problem I'd have solved it already, but worth pointing out.

buchanae commented 7 years ago

My feeling is that this kind of request (e.g. CPU architecture, spot instance, bidding strategy, etc.) should be delegated to a queue concept, which could be (optionally) specified by a task.

If using queue names with a large workflow, would each step need to define its queue? If you moved that workflow to another cluster, would you have to edit all those queue names?

I guess, alternatively, one might argue that it's the workflow engine's responsibility to map workflow task resource requests into TES queue names. That has some tradeoffs:

Something we've run into over time is that there's a tension between a desire for generic key/values that an execution backend can feel free to interpret and people's desire for something to work the same no matter where they go. This is less of a problem for concepts which don't map cleanly at all, but once you start getting to things which are semi-shared it becomes problematic.

This all reminds me a bit of XML namespaces, where you can reference the vocabulary you'll be using for a set of terms. I wonder if we can learn something from that.

pditommaso commented 7 years ago

If using queue names with a large workflow, would each step need to define its queue? If you moved that workflow to another cluster, would you have to edit all those queue names?

Wouldn't it be the same problem having to define the CPU architecture, spot/on-demand, bidding strategy, etc. for each task, only much uglier?

Also, the queue is a concept available in any batch scheduler, including the new generation of services, for example AWS Batch.

To avoid specifying the same queue for each task, a default queue could be defined at the workflow or executor level. Then only a subset of tasks would need to specify a different one.

It's realistic to expect some classes of resources to only be available in certain cluster architectures; for example, spot instances are only available in a cloud cluster and do not apply to legacy batch schedulers. For this reason, I think having such a level of indirection would make the task definition, and therefore also the workflow, more portable.

buchanae commented 7 years ago

@pditommaso good points. Would the queue field be a single string? Would the Resources type still exist?

pditommaso commented 7 years ago

One possibility is to add queue as a string field in the Resources type and remove preemptible and zones.
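For illustration (this is only a sketch of the idea, not an agreed schema), a task's resource request would then look something like:

{
  "resources": {
    "cpu_cores": 4,
    "ram_gb": 8,
    "disk_gb": 40,
    "queue": "highmem"
  }
}

with the mapping from the queue name to concrete hardware left entirely to the backend.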

kellrott commented 6 years ago

Introducing the idea of queues seems a little problematic. Assume a user is on an AWS Batch-backed system but is not the sys-admin: if there is no GPU-based queue set up by their admin, there is no way to create a GPU machine queue (at least not via the common API) and issue jobs on it.

By providing a vendor tag field, there is at least a possible mechanism to issue directives that the system then uses when it tries to identify resources.

Basically, I feel like the idea of queues essentially takes the problem of custom resource requests and shoves it into a non-standardized API. Having a custom tag field in the TES schema may be non-standard, but it's a matter of figuring out which vendor strings you want to append vs. figuring out which additional API endpoint you would have to call out to in order to create a new queue.

pditommaso commented 6 years ago

By providing a vendor tag field, there is at least a possible mechanism to issue directives that the system then uses when it tries to identify resources.

I'm not against a tag field but I'm trying to see it in practice. For example, let's say there's a tag instanceType that allows specifying a cloud VM type. How would that work in practice? Each cloud provider defines its own VM type identifier, hence there would need to be an instance type tag for each provider, wouldn't there?

buchanae commented 6 years ago

Tags have values, so it might look like:

{
  "tags": {
    "gce-instance-type": "n1-standard-1"
  }
}

buchanae commented 6 years ago

Oh, sorry, I misunderstood your question. You're asking whether we should have gce-instance-type and aws-instance-type vs just instance-type, right?

It's more difficult to find common ground between all the providers, so I'd lean towards having provider-specific terms.

pditommaso commented 6 years ago

Yes, exactly, there should be gce-instance-type, aws-instance-type, azure-instance-type, etc.

Not only for each cloud platform but also for different services. To stay with the AWS Batch example, a Batch job would not be able to use an aws-instance-type tag; it would require a specific tag, e.g. aws-batch-queue, etc.

Also, would the client create a request including all the tags, independently of the target platform/service?

geoffjentry commented 6 years ago

I might be misunderstanding, but if we're baking e.g. AWS-aware fields into TES, doesn't that kinda limit who can implement it?

buchanae commented 6 years ago

Here's what I propose:

  1. Remove the zones field: it's not clear what it means, and is platform-specific.
  2. Remove the preemptible field: it's not a universal concept.
  3. Add Resources.tags field for resources which are implementation-specific.

There was some discussion about whether to reuse the Task.tags field for generic resource requests. The thought is that it's better not to mix concerns. Task.tags are for user defined tags, Resources.tags are for implementation-defined resources.
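To illustrate the separation (the values are made up, and Resources.tags is shown here as plain string values even though its exact type is discussed below):

{
  "tags": {
    "project": "rna-seq-2018"
  },
  "resources": {
    "cpu_cores": 2,
    "ram_gb": 4,
    "tags": {
      "aws-instance-type": "m4.large"
    }
  }
}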

There's an open question about the type of this field.

  1. map<string, string> is the easy choice. Implementations would be required to interpret and validate the string value.

  2. protobuf's Any type is interesting. It allows the value to include a URL naming its type, allowing this field to use well-defined schemas which are defined externally. However, given our experience with other complex protobuf types, we (the Funnel team) worry that we might run into issues with limited or unfriendly support in protobuf libraries.

We'd also like to collect some use cases about how people are actually using resources. @geoffjentry and @pditommaso can you share any details about the types of resources your users are requesting?

pditommaso commented 6 years ago

The most common ones with legacy batch schedulers are:

The problem is that each scheduler implements all kinds of perversions with the parameters, so it's very difficult to have an exhaustive list. For example, the request for each of the above resources can generally be soft (better to fulfil but not mandatory) or hard (the scheduler enforces it). Also, for memory you can specify whether to check the RSS memory, the free memory, or the virtual memory. With CPUs you can bind the execution to specific cores (the CPU affinity concept), etc.

What we do with NF is to provide an abstraction for the basic ones (cpus, memory, storage, time and queue) and let the user provide cluster-specific settings via a free-form option that is passed as-is to the underlying scheduler.

emi80 commented 6 years ago

Hi all,

to introduce myself, I am in the Nextflow team and also taking part in the GA4GH Large Scale Genomics work stream.

I tend to believe that implementation-specific resources should be managed by the workflow engine itself, mapping TES queue names to resources. @buchanae I see the tradeoffs you mentioned but, as @pditommaso commented, we can expect some classes of resources to only be available in certain architectures, which makes abstracting the implementation details away from TES more reasonable.

I completely agree with removing preemptible since it is not a universal concept. I would, though, replace zones with queue (or any synonym that makes sense) to specify some kind of 'computation profile' containing the configuration of implementation-specific resources.

buchanae commented 6 years ago

I've hacked up a proof-of-concept of a potential solution in the Funnel code here: https://github.com/ohsu-comp-bio/funnel/pull/484

This solution uses the "Any" type in protobuf. It allows anyone to define a message type describing the resource request that works for them. That keeps all the benefits of schemas (documentation, validation, sharing, reuse, etc.) while also being flexible.

Here's an example JSON task document using a custom request type: https://github.com/ohsu-comp-bio/funnel/pull/484/files#diff-5b5299c8a8e64d23d450b17ef9b8fcec

The user sets the "@type" field to describe which schema is being used. Generated protobuf code provides built-in resolution of that type name (based on the protobuf package + message name). Types don't need to be defined in the TES schemas; anyone can define and publish them.
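As a rough illustration (the containing field name, message type, and fields below are hypothetical, not copied from the PR), a custom request might look like:

{
  "resources": {
    "custom": {
      "@type": "type.googleapis.com/example.GPURequest",
      "gpu_count": 1,
      "gpu_model": "nvidia-tesla-k80"
    }
  }
}

The server can then resolve the @type name and unpack the rest of the object into the concrete message type.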

Most things in Funnel just worked, and I was happy with the code I needed to add. Here's an example of how to get from the generic Any type to the concrete type: https://github.com/ohsu-comp-bio/funnel/pull/484/files#diff-f1634cf4b5d7de6021802103be25042eR142

I think this also frees up Funnel to work on improving the supported resource requests and experiment without needing to block on the TES standards process.

Question: given these message types, does it even make sense to keep cpu_cores, ram_gb, etc.? As mentioned earlier, sometimes even cpu_cores can be defined differently by different infrastructures and admins (cores vs. shares, etc.).

buchanae commented 6 years ago

I forgot to link to the changes to the schema in that PR: https://github.com/ohsu-comp-bio/funnel/pull/484/files#diff-f6f9f7d5e2c992682842325d68f5e7bf

pditommaso commented 6 years ago

Question: given these message types, does it even make sense to keep cpu_cores, ram_gb, etc.? As mentioned earlier, sometimes even cpu_cores can be defined differently by different infrastructures and admins (cores vs. shares, etc.).

I tend to disagree, and I'm a bit worried about the direction this proposal is taking. My understanding is that the TES API is meant to provide a uniform abstraction irrespective of the target execution backend, in order to allow a workflow engine, and ultimately workflow applications, to be deployed and executed in a portable manner across heterogeneous execution platforms.

Now, having a different resource format for each execution backend, i.e. local, SLURM, Grid Engine, AWS, Azure, etc., would mean that the application submitting tasks via TES would have to know the actual execution platform on which the workflow is running (also taking into consideration that there could be completely different execution services within each cloud provider).

IMO this breaks the API contract and makes the resulting applications non-portable (unless they specify the required resources for all possible platforms, which is clearly not feasible).

Also, my understanding is that each TES server implementation would be responsible for defining the semantics of the resource request tags; this means that at some point there could be different implementations for the same platform using different request tags.

I think this would quickly result in many different implementations unable to interoperate with each other.

buchanae commented 6 years ago

We can still recommend a resource request schema for common types (cpu, ram, disk, etc) while also allowing an escape hatch for the more implementation/deployment-specific resources.

Similarly, an implementation could provide a custom resource schema that emphasizes the use of queues: e.g. { queue: <string> }. I think queues could be a good way for deployments to organize groups of resources, and it's a common pattern, but I don't think TES should try to decide that it's the one true way.

Deployment-specific resource requests are inevitable, and common. If we don't give users a place to put those requests, they will hack them into tags. We should embrace the complexity and do what we can: recommend a specific field for custom resource requests, and strongly recommend that people document the schema of those custom requests.

Let's not get stuck on trying to find the perfect resource request schema that we can get the whole world on board with.

david4096 commented 6 years ago

I'll agree with @geoffjentry and @pditommaso. Instead of trying to specify all of the details different engines would expect, we opted to provide an endpoint that details the possible parameters and defaults for a given engine. Then, when clients go to make an execution request, there are optional named workflow_engine_parameters that can be provided.

https://github.com/ga4gh/workflow-execution-service-schemas/pull/14/commits/5c8ab3af85d9f78ea61fca91bcc2b2c025da953b

The idea is that each engine will offer different flags, and the best we can do is hope to pass down the arguments and their defaults when clients ask for it. That way, one can gradually build up more complicated queries by inspecting the default_workflow_engine_parameters (I know, it's long) and building up their workflow_engine_parameters query. This is in exactly the same spirit as --help for running things from the command line, and should hopefully be a straightforward passing of arguments!
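Very roughly (the values here are invented and the exact field shapes may differ from the linked PR), the flow is: the client first fetches the engine's advertised defaults, e.g.

{
  "default_workflow_engine_parameters": [
    {
      "name": "memory",
      "default_value": "4G"
    }
  ]
}

and then overrides selected ones in its run request:

{
  "workflow_engine_parameters": {
    "memory": "8G"
  }
}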

You'll note that this approach creates some complexity for the client. workflow_engine_params might not be reusable against different engines or service instances. This is where sane defaults and helpful error messages take over; you'll have to remove your params from your request if you want to rerun using a different service.

However, I hope that after some time of writing client code against it, there will be natural fields that rise up that demonstrate a shared data model. I don't think it makes sense to ever include aws instance flavor in these API schematizations, but in the end, mapping to them is a large part of the practical task.

There are a few ontologies for describing computer resources that might be used in the process, and that is at least an improvement over data modeling from nothing. A well-curated computer hardware ontology would be a natural way to enforce a controlled vocabulary without letting the details pollute your API schematization. That way the request just says, "use a list of terms from the computer hardware components ontology", and then it is up to implementors how well they parse that list and provide feedback to clients about whether their keys were used in instantiating the resource.

http://ga4gh.github.io/workflow-execution-service-schemas/

aniewielska commented 3 years ago

Quite a lot of time has passed and the discussion failed to conclude. We have #127 open with a request to make a final (and quick) decision on custom resources. There seem to be two contradicting options under consideration: either pass custom resources through to the backend, with implementations deciding which to use and how, or introduce the concept of queues, hiding computational profiles behind queues for those backends that do not implement queues directly.

My problem with queues is that I do not see how they make the workflow more portable rather than even more backend-specific (Does a user have to know the names of the queues to use? Are those harmonised between different backends to make the pipeline portable, or do pipelines have to be rewritten for different backends? Can the computational profiles change, making the pipeline non-reproducible?). Custom, implementation-specific resources can at least convey the direct intention of what kind of resources are needed, even if understood only by some backends. To conclude, I am supportive of the initial idea of having an additional field, likely of type map[string]string, as proposed in #127. I would like to hear what others think about it now and whether any views have changed.