jameshcorbett opened 1 month ago
The system instance's resource graph has `cluster -> rack -> node`. The JGF it writes out for child instances does not include the rack vertices; however, it still writes out the edges from cluster to rack and from rack to node. My current hypothesis is that the writer is coded to include the root of the graph but then skips any intermediate vertices on its way down to the node vertices. Hopefully this will be a simple fix?
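A quick way to confirm the symptom on any JGF (the bare system one or the one embedded in a child's R) is to list every edge whose source or target ID never appears in the node list. The sketch below assumes the JGF layout `{"graph": {"nodes": [{"id": ...}], "edges": [{"source": ..., "target": ...}]}}`, and that an R object (RFC 20) carries that JGF under its `"scheduling"` key; the function names are hypothetical helpers, not part of flux-sched.

```python
import json


def graph_section(doc):
    """Return the {"nodes": [...], "edges": [...]} dict from a JGF document.

    Assumption: a bare JGF has a top-level "graph" key, while an R object
    carries the JGF under its "scheduling" key.
    """
    return doc.get("scheduling", doc)["graph"]


def find_dangling_edges(doc):
    """List (source, target) pairs for edges that reference a vertex ID
    not present in the node list. IDs are compared as strings so that
    "1654" and 1654 are treated as the same vertex."""
    graph = graph_section(doc)
    ids = {str(n["id"]) for n in graph.get("nodes", [])}
    dangling = []
    for e in graph.get("edges", []):
        src, dst = str(e["source"]), str(e["target"])
        if src not in ids or dst not in ids:
            dangling.append((src, dst))
    return dangling


def dangling_edges_in_file(path):
    """Load a JGF or R JSON file from disk and check it."""
    with open(path) as f:
        return find_dangling_edges(json.load(f))
```

Running `dangling_edges_in_file("bad_jgf_cluster.json")` should return an empty list for a well-formed graph; if the writer really drops rack vertices, each cluster-to-rack and rack-to-node edge would show up in the result.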
Strangely, `hetchy` does not have this problem; it writes out the `rack` vertex. Something is off, and since this is the same cluster as #1305, I wonder if the JGF is wrong somehow.
Could you pull an example JSON object from each of these? I'm looking at the RV1 code, and it doesn't have anything that would trim vertices. It's possible something in the match code is doing it, but something is clearly fishy here.
Some nodes on the cluster hit the issue and some don't. Here is the JGF for the overall system, plus the JGF for one node that hit the error and one that didn't: bad_jgf_cluster.json, cluster_R.json, good_jgf_cluster.json
I didn't see any obvious errors in the system JGF but I may well have missed something.
It occurred to me while looking at this yesterday that there's something here we don't usually see in our graphs: in the cluster-level graph, every node but one sits under a rack, and that one node is directly under the cluster vertex. There's no reason that should cause a problem, but I'm not sure it's tested.
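To check how common that mixed layout is in a given graph, one option is to map every `node` vertex to the type of its parent. This is a sketch under several assumptions: each vertex stores its type under `metadata["type"]`, containment edges point parent to child, and an R object keeps its JGF under `"scheduling"`. The helper name `node_parent_types` is hypothetical.

```python
def node_parent_types(doc):
    """Map each vertex of type "node" to the type of its parent vertex.

    Assumes containment edges point parent -> child and that vertex types
    live in metadata["type"]. A parent ID that is absent from the node
    list is reported as "<missing>".
    """
    graph = doc.get("scheduling", doc)["graph"]
    types = {str(n["id"]): n.get("metadata", {}).get("type")
             for n in graph["nodes"]}
    result = {}
    for e in graph["edges"]:
        src, dst = str(e["source"]), str(e["target"])
        if types.get(dst) == "node":
            result[dst] = types.get(src, "<missing>")
    return result
```

Tallying the values with `collections.Counter(node_parent_types(doc).values())` on the system graph described above would be expected to show mostly `"rack"` plus a single `"cluster"`; on a child JGF with the bug, the racked nodes would instead show `"<missing>"`.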
On rzadams, which was just today configured to use the `rv1` match format, I confirmed that vertex 1654 is not in the JGF produced for the scheduler, although 2196 is. 1654 is a rack vertex; 2196 is a node vertex.
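The same kind of check can be automated by cross-referencing the child JGF against the system JGF: collect every vertex ID that a child edge mentions but the child's node list omits, then look up its type in the system graph. This sketch assumes vertex IDs are shared between the two graphs and that types live in `metadata["type"]`; `missing_vertex_types` is a hypothetical name.

```python
def missing_vertex_types(system_doc, child_doc):
    """For each vertex ID referenced by a child-JGF edge but absent from the
    child's node list, return its type as recorded in the system JGF.

    Assumes vertex IDs match between the system and child graphs. IDs not
    found in the system graph are reported as "<unknown>".
    """
    def graph_of(doc):
        # R objects are assumed to keep the JGF under "scheduling".
        return doc.get("scheduling", doc)["graph"]

    sys_types = {str(n["id"]): n.get("metadata", {}).get("type")
                 for n in graph_of(system_doc)["nodes"]}
    child = graph_of(child_doc)
    present = {str(n["id"]) for n in child["nodes"]}
    missing = {}
    for e in child["edges"]:
        for vid in (str(e["source"]), str(e["target"])):
            if vid not in present:
                missing[vid] = sys_types.get(vid, "<unknown>")
    return missing
```

If only rack vertices are being dropped, every value in the returned dict should be `"rack"`, which would match what was seen for vertex 1654.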