resource: track `parent` metadata at each vertex

SteVwonder commented 4 years ago

From a phone call discussion with @milroy:

In the propagate method introduced in #665, the only way to follow the ancestry of a resource vertex to the root of a hierarchy is by manipulating the path string of the vertex. It would be convenient code-wise if each vertex stored a pointer to its parent in the dominant hierarchy, so that walking to the root of the dominant hierarchy is straight-forward. This could be extended to support other hierarchies if necessary (at the cost of additional memory).

dongahn commented 4 years ago

In the propagate method introduced in #665, the only way to follow the ancestry of a resource vertex to the root of a hierarchy is by manipulating the path string of the vertex. It would be convenient code-wise if each vertex stored a pointer to its parent in the dominant hierarchy, so that walking to the root of the dominant hierarchy is straight-forward. This could be extended to support other hierarchies if necessary (at the cost of additional memory).

Thanks. I left a comment in @milroy's PR as well.

This can achieved if we always add bidirectional edges in the graph. This would be equivalent to your adding a pointer but would be a better graph-like code.

Our DFU traverser should always work even if you have bidirectional edges (thus cycles) because it uses graph coloring to break any cycle.

In fact, hwloc-reader-populated graph already adds both edges. The parent to child is called the contains edge. The child to parent is called the in edge.

Now, two things to pay attention w/ respect to this approach:

If you add a "filter" in your filtered graph to subscribe to only the contains edge, the other directional edge will be lost.
Always maintaining both edges will increase memory overheads (and a little bit of performance overhead) of dealing w/ the graph. But a bigger concern would be the resulting JGF would also become bigger.

Overall, I generally like the idea of maintaining both edges (and this is typical of other tree implementations including Linux rbtree) but we need to think about how to do this in a more memory/performance efficient fashion.

BTW, we did confirm you can't get to the parent when you only have the parent-child edge right?

SteVwonder commented 4 years ago

This can achieved if we always add bidirectional edges in the graph. This would be equivalent to your adding a pointer but would be a better graph-like code.

Sounds like a clean, generic way to handle this!

But a bigger concern would be the resulting JGF would also become bigger. Overall, I generally like the idea of maintaining both edges (and this is typical of other tree implementations including Linux rbtree) but we need to think about how to do this in a more memory/performance efficient fashion.

tl;dr: I agree that we can probably come up with some custom compression schemes to reduce the size of JGF.

I'm not sure how to reduce the in-memory footprint, but I wonder if we can cut down on the JGF size by not including the parent->child edge in the file format, and then generating those extra bi-directional edges during graph construction time. As a random example, maybe we have a configurable option for whenever you encounter an edge of name X ("contains"), insert an edge of name Y ("in") in the opposite direction.

EDIT - I just realized that if we are going to rely on these edges in propagate, then adding them back in can't be optional. Maybe JGF could include a new metadata field like "reciprocity" that does the same thing mentioned above (whenever you see an edge of X, insert an edge of Y). Sorry, I'm getting into the weeds here.

dongahn commented 4 years ago

I just realized that if we are going to rely on these edges in propagate, then adding them back in can't be optional.

We can use @milroy's current technique if the opposite edge doesn't exist...

I'm not sure how to reduce the in-memory footprint, but I wonder if we can cut down on the JGF size by not including the parent->child edge in the file format,

Yeah this will be an easy case to condense.

flux-framework / flux-sched

resource: track `parent` metadata at each vertex #682