Open sampsyo opened 4 months ago
I like the idea of primary and secondary paths. fud
also has a similar notion with the Path.also_do_path
method which specifies how a particular step requires a sequence of actions to generate a particular file. This is used when the symbolic stage provides a spec that needs to be compiled (in addition to the program needing to be compiled).
Is there still top-level discussion needed for this issue? If we have a resolution, can we mark this as "available" instead?
Going to take a go at implementing that with basically option 3:
The plan ignoring the --through
:
Represent the hypergraph as a weighted (with 0 and 1) bipartite digraph where nodes represent transitions or states (like as suggested in option 2).
Let the weights initially all be 1.
Order output states by some metric for which should go first (or simply try all permutations or randomize if number of states is small).
I probably will choose some arbitrary metric but it could even be user specified (but I probably won't do that unless it feels needed).
(In option 3, this metric is shortest path to a source node)
Chose the smallest state, then search backwards to find the shortest tree with all source leaves.
Now do repeat the process with the rest of the outputs.
--through
puts a bit of a wrench into the above so do this in addition:
Associate ops passed by --through
argument with a pair of sets, a subset of sources and a subset of targets.
We then basically perform the above algorithm twice, once to find the stuff passed by --through
and once to find the initial input.
Slightly more precisely, when searching backwards from one output, look for both ops nodes passed in by --through
instead of inputs.
Make sure to only look at the ops passed in by --through
if the output node is also contained in the associated output set.
Then once an op is found, search for sources from that. This is basically what is already done, but now there are multiple targets and sources. This means we get trees instead of normal paths.
This is certainly not optimal a lot of the time, but I think should work decently well when the multiple inputs have pretty separate paths. (source, I made it up)
Either way, it should be a good enough implementation to test out what the changes will look like. It shouldn't be too hard to swap out the path finding function later if this algorithm is too poorly behaved.
Just wanted to record here that we had a very brief discussion synchronously and this sounds like a great plan!!
Kinda seems like fud2 should become a hypergraph. It's currently a graph: vertices are states, and edges are operations (transformations between states). But the need for ops with two inputs has come up twice now, and the need for ops with two outputs (which may be harder, TBH) has come up once. That would mean that the edges in the graph are no longer edges but directed hyperedges.
The tracker (#1878) captures one use case, for multiple inputs:
In other words: fud2 currently has s simulate op, whose source is
simulator
(a pre-compiled Verilog simulator executable) and whose destination isdat
. We would like this op to have a second source: a new state calledsimdata
or something, which represents a directory full of hex data files. Then we'd have an op transforms fromdat
tosimdata
. So a simulation run would look like this:…or something roughly like that. You can see the hyperedge in there:
simulate
has two sources and one destination.(The other use case that came up recently, in conversation with @nathanielnrn and @eys29, is for YXI stuff. In that case, the end-user behavior is a little different: we only want to provide one input, rather than two as above. But we want that input to be transformed in two different ways and then recombined. Namely, we need to convert Calyx to YXI into an AXI wrapper, and also transform the original Calyx, and then combine those two Calyx files into one. That combination is a hyperedge.)
Anyhow, the big problem here is not so much with just representing the hypergraph (that seems simple enough) but with searching for the shortest path. It's important that users don't have to specify the complete path between two states; they can just give the input and output states (and possibly some
--through
ops to include) and fud2 finds a path for you. The trouble, of course, is that this wouldn't just be a path anymore—I think it's called a hyperpath, and it's not clear how to find these with a simple hypergraph search.Here are a few alternatives.
Option 1: Shortest Hyperpath Heuristic
We could read the papers out there about searching for shortest hyperpaths. I'm not entirely sure about the problem statement, and the heuristics available don't look simple. See this page on hyperpaths, for example.
Option 2: Greedy Hyperpath Search
How about this greedy algorithm?
First, we construct a normal graph by flattening the hypergraph. In this new directed graph, there are two kinds of vertices: states (which were vertices in the original hypergraph) and ops (which were hyperedges). Pre-compute the all-pairs shortest paths on this graph.
Given a single destination vertex $t$ and a set of source vertices $s_1, s_2, \dots$ (which are all state vertices), do the following:
It's probably far from optimal, but it might work?
Option 3: Distinguish Secondary Inputs
This one's due to @bcarlet. Maybe we don't want an actual hypergraph; instead, we want ops to keep having a single primary input and primary output state. But ops can also have secondary inputs that they also need. The search algorithm would look a little like the greedy algorithm above, but with less "select an arbitrary destination from the available destinations".
Given a single destination vertex $t$ and a set of source vertices $s_1, s_2, \dots$, do the following:
It is a little weird for ops to need to distinguish between "primary" and "secondary" inputs in this way, but it nonetheless might be a good idea. Maybe it would keep the algorithm more predictable too, if the way people want to think about this is already divided into primary and secondary inputs? In our example above, it seems natural enough that the program itself is a primary input and the data is a secondary input.
In all of this, it's not clear how
--through
would work for secondary paths. Maybe we don't need it.