flux-framework / rfc

Flux RFC project
https://flux-framework.readthedocs.io/projects/flux-rfc/
RFC14: add property support into V1 jobspec. #243

Open dongahn opened 4 years ago

dongahn commented 4 years ago

I just created https://github.com/flux-framework/flux-sched/issues/652 on the need for our graph scheduler to support resource selection by property. The other side of this problem is how to specify one or more properties on certain resources in a jobspec. Is there already a way to do this with the flux mini interface?

grondo commented 4 years ago

No.

Is this already specified for jobspec? (I don't think it is in the V1 specification at least) Maybe this needs to be an issue opened against RFC14? Once we have a way to specify properties in jobspec, we could figure if/how the flux mini commands could support it?

dongahn commented 4 years ago

No.

Thought so.

Is this already specified for jobspec? (I don't think it is in the V1 specification at least) Maybe this needs to be an issue opened against RFC14? Once we have a way to specify properties in jobspec, we could figure if/how the flux mini commands could support it?

OK. Moving this to RFC.

grondo commented 4 years ago

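Some questions:

  • Will properties have values, e.g. will we want to match some numeric property with greater than, less than semantics?
  • Will we need to logically match "any" of a list (OR), "all" (AND) or perform more complex property matching?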

dongahn commented 4 years ago

Will properties have values, e.g. will we want to match some numeric property with greater than, less than semantics?

Although RFC 4 doesn't define the property value type, the current implementation keeps the properties as a map of std::string.

My preference would be to keep this simple and only support string key-value pairs. If we need more sophistication, that would be a good time to think about whether it makes sense to hoist a property into a new "fixed-form" key in the formal resource pool schema (in RFC 4) as well as jobspec?

Will we need to logically match "any" of a list (OR), "all" (AND) or perform more complex property matching?

Yeah, I think there is value in this. However, I don't have specific near-term use cases that require "any" semantics just yet. Maybe in the short term we implement "all" semantics first, and if "any" semantics are required later, we can augment the spec.

BTW, now that I think about this, we need to add this to both RFC14 and the canonical spec.

grondo commented 4 years ago

My preference would be to keep this simple and only support string key-value pairs. If we need more sophistication, that would be a good time to think about whether it makes sense to hoist a property into a new "fixed-form" key in the formal resource pool schema (in RFC 4) as well as jobspec?

I apologize, I'm not sure what you mean by a "fixed-form" key.

It may be useful to keep our options open for numeric values for properties. One use case that might be common would be a HW or firmware version number of a resource. In that case, if code has a minimum version requirement, it could request a resource of type widget with property FirmwareVersion >= 1.0. I realize this may be tricky to support, though.

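Maybe we could define a property spec to be used in jobspec. In its simplest form, a property spec is a string "name", which resolves to "has property". Otherwise, a property spec is a dictionary with the following keys:

  • name: - property name
  • value: - required property value
  • op: - optional operator: (default "==" or "eq"), could be <, <=, etc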

Then, a new properties key can be specified as a list of property specs. The simplest form is a list of strings, which requires that the listed properties be present. Otherwise, a list of dictionaries with only name and value set requires properties of a given value, e.g.

 properties:
  - name: "FirmwareVersion"
    value: 1.0

Later this could be extended by supporting "ops" other than ==.
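A minimal Python sketch of how a property spec in this proposed form might be normalized, using the `name`, `value`, and optional `op` keys discussed in this thread. The helper name `normalize_property_spec` and the convention of representing "has property" as a value of `None` are assumptions for illustration; neither is part of any RFC or of libjobspec.

```python
def normalize_property_spec(spec):
    """Return a (name, op, value) tuple for one entry of a `properties` list.

    Hypothetical helper: a bare string means "has property" (value None);
    a dictionary must carry `name` and may carry `value` and `op`.
    """
    if isinstance(spec, str):
        return (spec, "==", None)
    if isinstance(spec, dict) and isinstance(spec.get("name"), str):
        unknown = set(spec) - {"name", "value", "op"}
        if unknown:
            raise ValueError(f"unknown keys in property spec: {sorted(unknown)}")
        # "==" is the default operator suggested in the discussion.
        return (spec["name"], spec.get("op", "=="), spec.get("value"))
    raise ValueError(f"malformed property spec: {spec!r}")


# Mixed list: one "has property" string and one name/value dictionary.
properties = ["gpu", {"name": "FirmwareVersion", "value": 1.0}]
normalized = [normalize_property_spec(p) for p in properties]
```

Normalizing both forms to one tuple shape would let later matching code ignore whether the user wrote the short or the long form.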

Not sure of the best way to extend this to support AND/OR. Maybe an AND is implied between list elements, and later an OR could be supported. Or, the properties list could be split into any: and all: lists?
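To make the any:/all: idea concrete, here is a rough Python sketch that checks "has property" specs only. The function name `match_properties`, its parameters, and the sample vertex are hypothetical, and the rule that an empty `any_of` list imposes no constraint is an assumption.

```python
def match_properties(vertex_props, all_of=(), any_of=()):
    """Check a vertex's properties against hypothetical all:/any: lists.

    Every name in `all_of` must be present; if `any_of` is non-empty,
    at least one of its names must be present as well.
    """
    has_all = all(name in vertex_props for name in all_of)
    has_any = not any_of or any(name in vertex_props for name in any_of)
    return has_all and has_any


# Example vertex with string keys and string values.
vertex = {"gpu": "", "FirmwareVersion": "1.2"}
ok = match_properties(vertex, all_of=["gpu"], any_of=["ssd", "FirmwareVersion"])
```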

Sorry if this is not well considered, just a rough first idea.

dongahn commented 4 years ago

Simplest form is a list of strings, which requires that the list of properties be present.

I like this proposal. I don't think it requires many changes in terms of how resource represents each resource vertex. It probably requires some changes to libjobspec, but this shouldn't be too bad.

dongahn commented 4 years ago

  • name: - property name
  • value: - required property value
  • op: - optional operator: (default "==" or "eq"), could be <, <=, etc

Probably want to limit the value types to those for which 'op' is well defined?

To begin with: string (lexicographic order), integer, and float? Boolean seems like a good candidate too, but op won't be well defined...
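One way to express "limit the value types to those for which op is well defined" is a small check like the Python sketch below. The names `OPS` and `check_op` are hypothetical, and allowing booleans with equality only is an assumption, not something settled in this thread.

```python
import operator

# Operators a property spec might name; "==" is the default discussed above.
OPS = {
    "==": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def check_op(op, value):
    """Raise ValueError unless `op` is well defined for the type of `value`."""
    if op not in OPS:
        raise ValueError(f"unknown operator: {op!r}")
    # bool is a subclass of int in Python, so it must be tested first.
    if isinstance(value, bool):
        if op != "==":
            raise ValueError("ordering is not defined for boolean values")
        return
    # Strings compare lexicographically; integers and floats numerically.
    if not isinstance(value, (str, int, float)):
        raise ValueError(f"unsupported value type: {type(value).__name__}")
```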

grondo commented 4 years ago

Probably want to limit the value types to those for which 'op' is well defined?

This is why I admit the implementation may be tricky. A validator would need knowledge of all possible properties and their value types for each resource type in order to validate properties spec. Otherwise, it might be a mystery why the jobspec can't match anything. (As an aside, a useful service or library or utility for Flux might be a way for users crafting a jobspec to get a set of all resources in the current instance that could satisfy a jobspec)

To begin with: string (lexicographic order), integer, and float? Boolean seems like a good candidate too, but op won't be well defined...

Good point, it may be useful to have some way to implement logical NOT as well.

dongahn commented 4 years ago

This is why I admit the implementation may be tricky. A validator would need knowledge of all possible properties and their value types for each resource type in order to validate properties spec.

Yeah, this goes back to "fixed-form" vs. "free-form". Sorry I wasn't clear above. By "fixed-form", I meant the "fixed schema" portion with formal value type definitions; "size" and "operator" would be examples. The "properties" field strikes me more as "free-form", where you can add just about anything.

dongahn commented 4 years ago

@grondo: btw, we probably want to break this down into two: one for RFC14 and the other for the canonical job spec RFC. We may put more complex specs into the canonical job spec RFC. But we need to be careful and implement only what's necessary in the near term for RFC14.

dongahn commented 4 years ago

(As an aside, a useful service or library or utility for Flux might be a way for users crafting a jobspec to get a set of all resources in the current instance that could satisfy a jobspec)

This would be computationally infeasible because it means the service has to create every permutation of all possible matches.

Would an overall satisfiability check with matching resources still be useful? (https://github.com/flux-framework/flux-sched/blob/master/resource/utilities/command.cpp#L47.)

If the jobspec requires a sufficiently large number of resources, the result from match allocate_with_satisfiability will either give you many matching resources or tell you it is not satisfiable.

grondo commented 4 years ago

If the jobspec requires a sufficiently large number of resources, the result from match allocate_with_satisfiability will either give you many matching resources or tell you it is not satisfiable.

Yes that is what I was thinking.

dongahn commented 4 years ago

A validator would need knowledge of all possible properties and their value types for each resource type in order to validate properties spec.

I didn't plan to change the way the properties field of a resource graph vertex is represented. I use a C++ map container with string keys and string values, and thought I could still use it with this proposal. When a jobspec with a property spec comes in, I would know the value type of a property from the spec, so I can easily cast the string value of the resource property into that type and apply op.
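The cast-then-compare approach described above could look roughly like the Python sketch below (the actual implementation is in C++). The function name `property_matches`, the `OPS` table, and the sample vertex are hypothetical. Treating a failed cast as "no match" is an assumption, and booleans are deliberately left out because `bool("false")` is `True` in Python.

```python
import operator

# Hypothetical operator table; "==" is the default from the proposal.
OPS = {
    "==": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def property_matches(vertex_props, name, value, op="=="):
    """Compare a string-valued vertex property against a typed spec value.

    `vertex_props` maps string keys to string values. The stored string is
    cast to the type of `value` (str, int, or float) before `op` is applied.
    """
    if name not in vertex_props:
        return False
    try:
        actual = type(value)(vertex_props[name])
    except ValueError:
        # The stored string cannot be read as the requested type.
        return False
    return OPS[op](actual, value)


vertex = {"FirmwareVersion": "1.2", "vendor": "acme"}
new_enough = property_matches(vertex, "FirmwareVersion", 1.0, ">=")
```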

Still if this doesn't match, it won't match. But I didn't think that the role of validation would be whether the jobspec can be scheduled or not, but just checking if a jobspec is well formed.

grondo commented 4 years ago

But I didn't think that the role of validation would be whether the jobspec can be scheduled or not, but just checking if a jobspec is well formed.

Good point, I was (probably incorrectly) thinking that eventual validator could be extended by scheduler implementation to reject known unsatisfiable requests before they are actually presented to scheduler. I.e., in this case if a property typo was in the job request, then validator could reject with "no resources of type X have property Y".

This could be a slight usability improvement over a job waiting in the queue until it is scheduled, then gets an unsatisfiability exception.
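A sketch of that kind of early rejection, in Python. It assumes the scheduler can supply a table of known property names per resource type; `known_properties` and `check_known_properties` are hypothetical names and no such service exists in the thread.

```python
def check_known_properties(resource_type, requested, known_properties):
    """Return error strings for properties that no resource of this type has.

    `known_properties` maps a resource type to the set of property names
    found on resources of that type (assumed to come from the scheduler).
    """
    known = known_properties.get(resource_type, set())
    return [
        f"no resources of type {resource_type} have property {name}"
        for name in requested
        if name not in known
    ]


known = {"widget": {"FirmwareVersion"}}
# A typo in the requested property name is reported before scheduling.
errors = check_known_properties("widget", ["FirmwareVerison"], known)
```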

grondo commented 4 years ago

btw, we probably want to break this down into two: one for RFC14 and the other for the canonical job spec RFC.

@dongahn, RFC14 is the canonical jobspec RFC. RFC25 is the Jobspec V1 RFC.

Do you want to open an issue against RFC25 to amend with properties spec, with bare minimum requirements needed in near term?

dongahn commented 4 years ago

Good point, I was (probably incorrectly) thinking that eventual validator could be extended by scheduler implementation to reject known unsatisfiable requests before they are actually presented to scheduler. I.e., in this case if a property typo was in the job request, then validator could reject with "no resources of type X have property Y".

This could be a slight usability improvement over a job waiting in the queue until it is scheduled, then gets an unsatisfiability exception.

Ah. Yes, this makes perfect sense. We chatted about a related topic at one of our coffee hours. We do have that "satisfiability" check at the graph scheduler so we can do this easily. I think the result of our discussion was that we don't want to do this at the job ingest level, because this means an RPC to resource and this will significantly reduce the job throughput.

Maybe we can extend job-manager - scheduler interface instead so job-manager can actually issue "satisfiability" RPC before alloc request to achieve this... Not all cases but for cases like system level scheduler.

dongahn commented 4 years ago

Do you want to open an issue against RFC25 to amend with properties spec, with bare minimum requirements needed in near term?

Oops. I got this backwards. Yes, I will open up a new ticket against RFC25. This will be needed to support Trent's corona near-term use case.

grondo commented 4 years ago

Maybe we can extend job-manager - scheduler interface instead so job-manager can actually issue "satisfiability" RPC before alloc request to achieve this... Not all cases but for cases like system level scheduler.

An original idea we had was to extend the validator script to allow "plugins" which could do anything, including calling out to a custom scheduler/resource "satisfiability" service. This is probably a trivial extension to the validator if such a service existed.
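The plugin idea might be sketched as a chain of callables, each returning a list of error strings. Everything here is hypothetical: the real validator script's interface is not shown in this thread, and the satisfiability check is passed in as a plain function rather than an RPC to any real service.

```python
def well_formed(jobspec):
    """Built-in check: the jobspec must carry a `resources` list."""
    if not isinstance(jobspec.get("resources"), list):
        return ["jobspec has no resources list"]
    return []


def make_satisfiability_plugin(is_satisfiable):
    """Wrap a caller-supplied satisfiability check as a validator plugin."""
    def plugin(jobspec):
        return [] if is_satisfiable(jobspec) else ["jobspec is unsatisfiable"]
    return plugin


def validate(jobspec, plugins):
    """Run every plugin in order and collect all reported errors."""
    errors = []
    for plugin in plugins:
        errors.extend(plugin(jobspec))
    return errors


# Stand-in for a scheduler query: pretend only single-resource requests fit.
plugins = [well_formed, make_satisfiability_plugin(lambda js: len(js["resources"]) <= 1)]
result = validate({"resources": [{"type": "node", "count": 1}]}, plugins)
```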

dongahn commented 4 years ago

An original idea we had was to extend the validator script to allow "plugins" which could do anything, including calling out to a custom scheduler/resource "satisfiability" service.

Yes probably a matter of adding a new request callback to resource using some of the existing functionality and hooking this to the plugin then.