flux-framework / flux-sched

Fluxion Graph-based Scheduler
GNU Lesser General Public License v3.0

Condensing JGF #1255

Open jameshcorbett opened 4 months ago

jameshcorbett commented 4 months ago

JGF is verbose, and Rabbit-y JGF on elcap systems can become very large. We discussed offline several ways to shrink JGF: some maintain the current format and compatibility with the standard (which?), while others break the standard to achieve greater reductions in size.

Performance tests are necessary to see just how bad the problems are, so we can decide how radical the changes need to be.

trws commented 3 months ago

To get this going, here's the JGF specification website: https://jsongraphformat.info/

We're currently using JGF v1, and I don't think we necessarily want to switch to v2 unless there's a material win for doing so. The main things that I can think of (@milroy I'm pretty sure I'm missing at least one of your suggestions, please correct me) that we've discussed are:

  1. we send a lot of default data over the wire. Here's a vertex in flux JGF:

    {
     "id": "2",
     "metadata": {
       "type": "node",
       "basename": "node",
       "name": "node0",
       "id": 0,
       "uniq_id": 2,
       "rank": -1,
       "exclusive": false,
       "unit": "",
       "size": 1,
       "paths": {
         "containment": "/tiny0/rack0/node0"
       }
     }
    },

    Just off the top of my head:

    1. no need to send default values
      1. basename defaults to type
      2. name defaults to {type}{id}
      3. rank defaults to -1 when a resource doesn't belong to a rank; we could also use -2 to mean "same as ancestor" to avoid sending it in almost all cases
      4. exclusive is almost never used except in allocated JGF, and then defaults to false
      5. unit is almost never used, and defaults to "each" or "empty"
      6. size is almost always 1, except for ram
      7. paths can be calculated entirely from edges, which we're sending separately anyway, as long as we guarantee that JGF always includes all ancestor vertices up to the root
      8. uniq_id always matches the outer "id" field
    2. the inner "id" is the logical id under the parent vertex; if we have the uniq_id, I'm not sure we need it

    With just that, we could send this as essentially equivalent:

    {
     "id": "2",
     "metadata": {
       "type": "node",
     }
    },
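To make the default-stripping idea concrete, here is a minimal sketch of a filter that drops each field when it equals its default. The helper name `compact_vertex` and the exact set of rules are illustrative only, taken from the list above, not from flux-sched code:

```python
# Sketch: drop JGF vertex metadata fields that match their defaults.
# compact_vertex is a hypothetical helper, not part of flux-sched.

def compact_vertex(vertex):
    """Return a copy of a JGF vertex with default-valued metadata removed."""
    md = dict(vertex["metadata"])
    t = md.get("type")
    if md.get("basename") == t:                      # basename defaults to type
        md.pop("basename", None)
    if md.get("name") == f"{t}{md.get('id')}":       # name defaults to {type}{id}
        md.pop("name", None)
    if md.get("rank") == -1:                         # -1 means "no rank"
        md.pop("rank", None)
    if md.get("exclusive") is False:
        md.pop("exclusive", None)
    if md.get("unit") == "":
        md.pop("unit", None)
    if md.get("size") == 1:
        md.pop("size", None)
    if str(md.get("uniq_id")) == vertex["id"]:       # matches the outer "id"
        md.pop("uniq_id", None)
    md.pop("paths", None)  # recoverable from containment edges
    md.pop("id", None)     # logical id, recoverable in this sketch
    return {"id": vertex["id"], "metadata": md}

vertex = {
    "id": "2",
    "metadata": {
        "type": "node", "basename": "node", "name": "node0", "id": 0,
        "uniq_id": 2, "rank": -1, "exclusive": False, "unit": "",
        "size": 1, "paths": {"containment": "/tiny0/rack0/node0"},
    },
}
print(compact_vertex(vertex))  # → {'id': '2', 'metadata': {'type': 'node'}}
```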

    Edges are a bit harder, but since we're finally using a bidirectional graph, we should be able to remove the value of the name object, and containment would be the default, so we could take these:

    {
     "source": "2",
     "target": "4",
     "metadata": {
       "name": {
         "containment": "contains"
       }
     }
    },
    {
     "source": "1",
     "target": "5",
     "metadata": {
       "name": {
         "power": "supplies_to"
       }
     }
    },

    and make them

    {
     "source": "2",
     "target": "4",
    },
    {
     "source": "1",
     "target": "5",
     "metadata": {
       "type": "power"
     }
    },

    All of that is still JGF-compatible.
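The edge transformation above could be sketched the same way: drop metadata entirely for default containment edges, and collapse other subsystems to a single "type" key. `compact_edge` is a hypothetical name for illustration:

```python
# Sketch: compact a JGF edge per the proposal above.
# compact_edge is a hypothetical helper, not part of flux-sched.

def compact_edge(edge):
    """Containment edges need no metadata; other subsystems keep only
    the subsystem name, stored as a 'type' field."""
    name = edge.get("metadata", {}).get("name", {})
    out = {"source": edge["source"], "target": edge["target"]}
    subsystems = [k for k in name if k != "containment"]
    if subsystems:
        out["metadata"] = {"type": subsystems[0]}
    return out

print(compact_edge({"source": "2", "target": "4",
                    "metadata": {"name": {"containment": "contains"}}}))
# → {'source': '2', 'target': '4'}
print(compact_edge({"source": "1", "target": "5",
                    "metadata": {"name": {"power": "supplies_to"}}}))
# → {'source': '1', 'target': '5', 'metadata': {'type': 'power'}}
```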

  2. Structural sharing, or expansion like GRUG does, as @milroy noted, could help a lot. JGF doesn't support this natively, but we could use the metadata to include the information, or use subgraphs with references. The upside is that the vast majority of our vertices are identical below some point, so we could take, say, a node, make an exemplar graph as a separate graph, then have a node that says "template these based on this hostlist" or something, and we'd be sending relatively little more than with RV1.
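As a rough illustration of the exemplar idea, the receiver could stamp out one copy of a template subtree per hostname. The template format here is entirely invented for the sketch; real hostlists in Flux use range syntax like "node[0-3]":

```python
# Sketch: expand an exemplar vertex list over a list of hostnames.
# The template structure and expand_template are illustrative only.

def expand_template(template, hostnames):
    """Stamp out one copy of the exemplar vertices per hostname,
    namespacing ids by host so copies stay distinct."""
    vertices = []
    for host in hostnames:
        for v in template["vertices"]:
            vertices.append({
                "id": f"{host}.{v['id']}",
                "metadata": dict(v["metadata"], host=host),
            })
    return vertices

template = {"vertices": [{"id": "core0", "metadata": {"type": "core"}},
                         {"id": "core1", "metadata": {"type": "core"}}]}
out = expand_template(template, ["node0", "node1"])
print(len(out))  # → 4
```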
  3. If we're willing to throw the JGF spec out the window, the cheapest, easiest win would be to switch the edge storage from an array of objects with keys to an array of tuple-like arrays like this:
    [
     // [source, target, (optional) type]
     [2,4],
     [1,5,"power"]
    ]

    Or we could even switch to a stream of JSON objects, like some other formats use, so we can do the same to vertices and process them without having to build the whole object in memory; but that would break any and all JGF parsers.
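A sketch of what that combination might look like, using the tuple-like edges above streamed one JSON value per line (JSON Lines style), so the reader never materializes the whole edge array:

```python
# Sketch: tuple-like edges streamed as one JSON value per line.
# Illustrative only; this deliberately breaks JGF compatibility.
import io
import json

edges = [[2, 4], [1, 5, "power"]]  # [source, target, (optional) type]

# Writer side: emit one edge per line.
buf = io.StringIO()
for e in edges:
    buf.write(json.dumps(e) + "\n")

# Reader side: process each edge as it arrives, defaulting the
# omitted third element to the containment subsystem.
buf.seek(0)
parsed = []
for line in buf:
    src, dst, *rest = json.loads(line)
    parsed.append((src, dst, rest[0] if rest else "containment"))

print(parsed)  # → [(2, 4, 'containment'), (1, 5, 'power')]
```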

trws commented 3 months ago

CC: @garlick, @grondo

jameshcorbett commented 3 months ago

As I understand it, I'm on the hook to try out elcap-sized JGF and see what the performance is like, which I intend to do as soon as I get some pressing rabbit issues out of the way.

jameshcorbett commented 2 months ago

OK so on a representative system:

Before the changes introduced by #1293 and #1297, JGF was 8458557 bytes. After the changes in those two PRs, it dropped to 2122574 bytes. If the path field of vertices could be dropped in all cases (currently it has to be supplied in all cases), that would shrink it to 1298046 bytes. That makes it around 1/7 the original size. Not bad.

jameshcorbett commented 1 month ago

If #1299 goes in, the last obvious work I'm aware of will be to drop each vertex's .metadata.paths.containment. In theory that should be simple, but I'm not quite sure how to actually implement it. Also, there is some code that fetches all the paths to SSD vertices in JGF in order to mark the vertices as UP or DOWN via set-status RPCs, so that would need some sort of replacement.
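If containment paths were dropped from the wire format, a consumer could in principle rebuild them from the edges, as suggested earlier in the thread, by walking parent links up to the root (assuming JGF always carries all ancestor vertices). A sketch, with illustrative names rather than the actual flux-sched API:

```python
# Sketch: recompute .metadata.paths.containment from containment edges.
# containment_paths is a hypothetical helper, not flux-sched code.

def containment_paths(vertices, edges):
    """Map each vertex id to its containment path, derived by walking
    child -> parent links up to the root."""
    names = {v["id"]: v["metadata"]["name"] for v in vertices}
    parent = {e["target"]: e["source"] for e in edges}  # child -> parent

    def path(vid):
        parts = []
        while vid is not None:
            parts.append(names[vid])
            vid = parent.get(vid)
        return "/" + "/".join(reversed(parts))

    return {vid: path(vid) for vid in names}

vertices = [
    {"id": "0", "metadata": {"name": "tiny0"}},
    {"id": "1", "metadata": {"name": "rack0"}},
    {"id": "2", "metadata": {"name": "node0"}},
]
edges = [{"source": "0", "target": "1"}, {"source": "1", "target": "2"}]
print(containment_paths(vertices, edges)["2"])  # → /tiny0/rack0/node0
```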

jameshcorbett commented 1 month ago

On Hetchy, JGF for jobs appears to be around 15K per node.