flux-framework / flux-sched

Fluxion Graph-based Scheduler
GNU Lesser General Public License v3.0

feature: abstract class definition and aliases #1117

Open vsoch opened 5 months ago

vsoch commented 5 months ago

I didn't remember this at our meeting today, but I'd like to be able to support two different identifiers that mean the same thing. Let's go back to the ice cream shop! We can get cups or cones, and each can hold some number of scoops. Right now I have to represent them separately:

 <node id="shop">
     <data key="root">1</data>
     <data key="type">shop</data>
     <data key="basename">ice-cream-shop</data>
 </node>
 <node id="scoop">
     <data key="type">scoop</data>
     <data key="basename">scoop</data>
     <data key="size">4</data>
     <data key="unit">oz</data>
 </node>
 <node id="cone">
     <data key="type">cone</data>
     <data key="basename">cone</data>
     <data key="size">4</data>
     <data key="unit">oz</data>
 </node>
 <node id="cup">
     <data key="type">cup</data>
     <data key="basename">cup</data>
 </node>

That becomes a problem for edges: because "cup" and "cone" are technically two different things, I have to define each relationship (for example, a cup or cone holds 1-3 scoops) twice. I'd like to be able to define aliases, something like:

 <node id="holder" abstract="true">
     <data key="type">holder</data>
     <data key="size">4</data>
     <data key="unit">oz</data>
 </node>
 <node id="cone" alias_for="holder"></node>
 <node id="cup" alias_for="holder"></node>

That format is terrible (so feel free to blow it up) but hopefully you get the gist! Internally in the library, "cone" and "cup" might be treated exactly the same, assuming we don't care about the few ways in which they actually differ.
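To make the idea concrete, here is a minimal Python sketch of how alias resolution could work. It is not how Fluxion does anything today: the `abstract` and `alias_for` attributes are just the hypothetical format proposed above, the `<graph>` wrapper and the `edges` table are made up for illustration, and `load_nodes`, `canonical`, and `scoop_range` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Hypothetical format from this issue, wrapped in a made-up <graph> root
# so that it parses as a single XML document.
PROPOSED = """
<graph>
  <node id="holder" abstract="true">
    <data key="type">holder</data>
    <data key="size">4</data>
    <data key="unit">oz</data>
  </node>
  <node id="cone" alias_for="holder"></node>
  <node id="cup" alias_for="holder"></node>
  <node id="scoop">
    <data key="type">scoop</data>
  </node>
</graph>
"""


def load_nodes(xml_text):
    """Return (nodes, aliases): node id -> data dict, alias id -> target id."""
    nodes, aliases = {}, {}
    for node in ET.fromstring(xml_text).iter("node"):
        node_id = node.get("id")
        target = node.get("alias_for")
        if target is not None:
            aliases[node_id] = target
        else:
            nodes[node_id] = {d.get("key"): d.text for d in node.iter("data")}
    return nodes, aliases


def canonical(name, aliases):
    """Follow alias links until reaching a non-alias id; reject cycles."""
    seen = set()
    while name in aliases:
        if name in seen:
            raise ValueError(f"alias cycle at {name!r}")
        seen.add(name)
        name = aliases[name]
    return name


nodes, aliases = load_nodes(PROPOSED)

# The "holds 1-3 scoops" relationship is declared once, on the abstract type.
edges = {("holder", "scoop"): (1, 3)}


def scoop_range(container):
    """Look up the scoop range for a container by its canonical type."""
    return edges[(canonical(container, aliases), "scoop")]
```

With this, `scoop_range("cone")` and `scoop_range("cup")` both hit the single `("holder", "scoop")` entry, so the relationship only has to be written once.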

@zekemorton and @milroy - there might be a way to already do this (and I forgot to ask) and if so, let's chat about that and we can close the issue. In a real world use case we would want to be able to say we accept two different names / aliases but treat them equivalently in the graph.

vsoch commented 5 months ago

Also, maybe "something in the computer world that is consumed and not given back" would be a virus? At a high level a parent hands a filesystem (or a targeted region from some scan) to a worker, and if the worker is successful it gives back a more "null" state (the same filesystem minus the virus).

In the real world, companies probably just wipe VMs or build fresh containers that are scanned and have minimal issues (e.g., Wolfi), and I suspect they don't regularly run virus scanners like you would on a consumer system, because they want to vulnerability scan before going into production, and anything in production is ephemeral. I'm not read up on virus scanners, but I seem to remember they detect either by exact match (e.g., based on content hash) or by heuristic (close to a match), and I think more sophisticated ones are based on system behavior.

In a graph context, maybe you would start with some unit of scanning (a device? directory? network?) and then match it against some known virus pattern. If there is a match, then the (allocate?) would give it to the worker, and the worker would be tasked with removing or quarantining the virus and handing back the (hopefully) virus-free unit. If it's a lot of work, maybe it would make sense to distribute across workers in a graph? Now I just don't know. Cool/fun to think about though.
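As a toy illustration of the exact-match case only, here is a rough Python sketch that flags files whose SHA-256 content hash appears in a set of known-bad hashes and hands back the unit without them. Everything here is an assumption for illustration: `scan` and `clean` are hypothetical names, the "unit" is just a dict of path to bytes, and real scanners do far more than this.

```python
import hashlib


def sha256_hex(content: bytes) -> str:
    """Hex SHA-256 digest of a file's content."""
    return hashlib.sha256(content).hexdigest()


def scan(files, known_bad):
    """Return the sorted paths whose content hash is in known_bad.

    files: dict mapping path -> bytes (the "unit of scanning").
    known_bad: set of hex digests for known viruses.
    """
    return sorted(p for p, c in files.items() if sha256_hex(c) in known_bad)


def clean(files, known_bad):
    """Return a copy of the unit with flagged files removed."""
    flagged = set(scan(files, known_bad))
    return {p: c for p, c in files.items() if p not in flagged}
```

Because each file is checked independently, the unit could be split across workers and the flagged paths merged afterwards, which is roughly the "distribute across workers" idea.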