Closed acl-cqc closed 9 months ago
- For the single graph variable, we could
Alternatively, if there really is only one variable, if we add to the lower_bound
the total of all Constraint::Plus delta's on the path from each meta to that variable, then we have a partial solution: the "complete" "solution" is just obtained by taking union of partial-solutions for each meta with the value of the variable, which could be left to the caller.
(And/or: add an ExtensionId <the_root_node_variable>
that stands for any extra extension-ids on the root inputs)
I'm not really sure that there should be only one "variable" in a hugr. What if the hugr is a Module containing 3 dataflow graphs - then the root of each of those graphs should be a variable?
I'm not really sure that there should be only one "variable" in a hugr. What if the hugr is a Module containing 3 dataflow graphs - then the root of each of those graphs should be a variable?
Yes, that's a fair question - seems like a reasonable thing to do - I note merely that it's not what we do at the moment: https://github.com/CQCL/hugr/blob/b3062e9b8d0f081c1ce65295e30066aa16c39ee4/src/extension/infer.rs#L270-L273
So I think we can still handle items 1+2 first, without any consideration of variable(s), and then handle even multiple variables in stage 5 (e.g. passing an extra Map<Location,ExtensionSet>
into the topsort), without changing the overall approach. Moreover, I think we can do even that in a few stages / incremental changes.
I'm still thinking I'd like to change handling of DFGs such that the Input
node always has empty ExtensionSet; this guarantees computation of correct "delta", whereas current approach means the delta could underapproximate (by anything in input-extensions). The top-level FunctionDefn's in the Module would then have unused input-extensions, which TBH seems pretty reasonable. However there is a big downside in that the Constraint::Plus
then has to take two Metas (rather than a Meta and an ExtensionSet) - but I'm nervous as to whether the algorithm above would still work if we changed Constraint in that way.
So I think we can break this up into two themes: firstly, reordering steps in the algorithm; secondly, changing the approach to variables. The second is less certain and needs more thought, so on the first...
Following #591 I think all the parts are pretty much there, but the current flow is
main_loop
, which performs multiple iterations of merging equalities and solving constraints (generating solutions);instantiate_variables
which deals with variables and cycles of plus constraints, generating more solutionsmain_loop
I think what we can do is
The last step raises questions (allows to address issues) like, what if my meta m
is plus-dependent on both a variable v
and a solvable meta m2
(e.g. which has a fixed set of (input_)extensions specified in the Hugr) - if we don't have a value for v
, do we leave m
unsolved? or do we solve m
, and optionally report the constraint that places on the variable v
? (E.g. if m
has Constraint::Plus({E1,E2}, V)
then the value of v
must be a subset of m
and a superset of m \ {E1,E2}
.) This might want be part of a bigger rethink of variables, however.
Wrt. variables, we should probably do #657 first. Then we can explicitly declare variables of kind TypeParam::Extensions, rather than the somewhat implicit "open with no constraints" that we use in inference now.
This would allow returning an ExtensionSolution (that is complete, i.e. defined for all nodes) containing those variables, where necessary; and giving the highly-desirable property that instantiating the variables* and then performing inference, would produce the same answers as performing inference and then* instantiating.
** via the normal mechanism for any other type variable, i.e. substitute
- we would need to define this on Hugr values too, which we don't have at present, but we're gonna need that for Tierkreis eventually, so this isn't new but just made more urgent.
Note that if there are variables of kind Extensions
and annotations fixed/provided already, the latter will probably have to be in terms of the former, or else we will fail; the solution must be valid for all possible instantations of the variable. (Alternative would be to give TypeParam::Extensions
an upper and/or lower bound, allowing to deduce things like Constraint::Plus(Lower, Variable)
== Variable
and Constraint::Plus(Upper - Lower, Variable)
== Upper
.)
We might allow more variables than in #692 here too, e.g. allowing a Dfg-rooted Hugr to be "parametric" in some variable (perhaps the input extensions to the root node). However, assuming the variable has no upper/lower bounds specified, again for the solution to be valid for all values of that variable, we'll just have to put that variable into extension sets everywhere - exactly as if the "extension variable" were just another extension
(The neat trick might be to infer what upper/lower bounds for the variable would make the Hugr valid e.g. to fit in with existing annotations that don't mention it, but let's skip over this idea for now.)
So we may not need to give any special treatment to "variables" (as extension inference does now), just treat them as either (1) their own identifiers, or (2) if we want to be able to substitute
in concrete values for the variable, then a DeBruijn-indexed variable (much harder for Hugr-writers to use, note).
This means variables are their own thing (rather than phase in the algorithm). We could nonetheless have three or four phases: 1.. merge metas along Equal constraints,
Constraint::Plus(Extensions, Meta-with-solution)
gains a solution. For these Meta's, we have figured out at most one solution that might possibly be valid, so we could stop there; validation will complain if the lower_bounds
computed in step 2 are violated (for example - also if not all meta's have solutions, which is quite likely).Any remaining unsolved nodes require solving Plus constraints backwards, and so may have multiple possible solutions; there may be some element of policy as to which to pick. However the policy of "smallest-possible ExtensionSet" seems reasonable (!). Hence, we can (optionally?) continue:
lower_bounds
from step 2 - both backwards along Plus constraints (i.e. if A has Constraint::Plus(Exts, B)
and we have a solution or lower-bound for A, then the lower-bound for B must be at least A \ Exts
), as well as forwards (using lower-bound for B to compute lower-bound for A). Note that lower-bounds sets only increase; and that adding an extention(set) E to a set, can at most add E to its neighbours; so this must terminate. (However, bidirectional propagation-until-fixpoint seems the only option, we cannot process in a topsort: consider a bipartite graph with all Plus constraints from one half to the other, in the form of a zigzag, with one solution; this eventually affects all nodes, but must propagate forwards then backwards then forwards then back repeatedly zig-zag-zig-zag.)This might be enacted in stages as follows:
substitute
function on Hugrs paralleling that on Type(Arg/Param)s. Each call
node instantiating a FuncDe(fn/cl) may do so with ExtensionSet args that substitute into the FunctionType of the function (to give a different concrete delta, ok) and potentially into the body of the function, so (to avoid confusion if not worse) we should probably insist FuncDefn's and their Input-node children have empty input-extensions, but this might be sane anyway. We may also need to do inference on FuncDefn's in some kind of tree-traversal order rather than all together. There are a couple of further sub-options..
a. with FuncDefn/cl's able to refer to variables declared outside (as per current #692). This means that when inference infers a variable should be placed into an ExtensionSet on a particular node, we need to know what DeBruijn-index the variable has there, i.e. how many intervening binders they have, which makes this quite hard to use.
b. with "flat" FuncDefn/cl's that don't refer to enclosing variables. Thus, the call
s to these defn/cl's must pass any type-variables from enclosing scope. substitute
gets simpler, and the same variable has the same name everywhere. Of course we must therefore prevent inference from using an outer Extension-variable inside a function!substitute
that takes a Map<ExtensionId,ExtensionSet> - and we'll probably need a DeBruijn version later (for Tierkreis) as well :-(. However, this might make some things much simpler.I'm in agreement. We just need to plan the order in which we do things. I think:
Splitting EdgeKind::Static into Const/Func is #695 or another? I'm not sure that's necessarily the right solution to the problem
We probably also need to handle type-variables's in ExtensionSets even without using these for extension-inference's own notion of variables; I think this means (at the least?!) that inference mustn't propagate such a TV outside its binder.
We probably also need to handle type-variables's in ExtensionSets even without using these for extension-inference's own notion of variables
Agreed!
I think this means (at the least?!) that inference mustn't propagate such a TV outside its binder.
This is more of a head-scratcher. I think if we get to this point, it says there's something scarily non-parametric about the function!
I think if we get to this point, it says there's something scarily non-parametric about the function!
Yes!! ;-). So at the least, we should verify that any variables in any input_extensions
sets are "in scope" (in validate_subtree
, where we check TypeArgs for certain leaf ops are too). That'd catch anything apart from some weird awkward case where the variable had "leaked" out into another place that happened to have an identical variable (!).
However, if we restrict the input-extensions to empty for FuncDefn/cl, then the output-extensions are necessarily empty too (as delta is zero), and I think that should prevent any leakage from functions; the Static edge has PolyFuncType that would bind the variable, and any call
would instantiate that with some ExtensionSet from outside, so we're all good.
Polymorphic lambdas/constants is another issue (#709), which we don't even have yet ;), but if it's a Const containing a lambda then that's empty delta, so we say the input-extensions and output-extensions of the const are empty, and again anything bound by / inside the PolyFuncType on the static edge should be fine.
First a couple of notes:
main_loop
should not need to iterate more than once. It claimssolve_metas
can add new edges to the equality graph, but only if it findsConstraint::Equals
; a single iteration of merge_equal_metas should remove all Constraint::Equals and thus all edges from the equality graphSo, I propose removing/splitting main_loop, and the two calls to main_loop, as follows:
shunted
, new entries inconstraints
(i.e. keyed by the merged vars, but containing only Plus's), and a smaller set of variables (i.e.remaining
inmain_loop
- the new vars, and everything that wasn't shunted.)shunted
is a member ofUnificationContext
I suspect that remaining should be too. It'd be nice to useconstraints.keys()
but not sure whether that works(?)remaining
). Probably build that graph now. Variables on cycles can be shunted/merged; note they had no Equals constraints, but may still have Plus constraints pointing out of the cycle.tarjan_sccs
allows to return SCCs in an order (where cycle A depends upon cycle B, then B before A, say), we could solve the remaining constraints as we go here, but let's not.shunted
, make some more metas and move remaining non-cyclicconstraints
around, and build a new maplower_bound
(from merged meta to all Extensions on the cycle it was merged from).lower_bound
for existing meta, or "merge" the single meta into a new meta without the cycle, doesn't really matter).lower_bound
, or empty ExtensionSet otherwise.lower_bound
lower_bound
.Hugr-without-Lift-nodes ==> update validation to require extensions at source of edge are a subset of extensions of target of edge. Beyond change in step 5, I think that's all that's required.