feat: Allow extension operations to have child graphs

ss2165 commented 1 month ago

We originally did not allow this for simplicity, but there a number of applications where this would be very convenient.

A motivating example: a "dagger" box that contains a unitary subgraph and means "evaluate the dagger of this graph". A rewrite may dagger the inner graph at compile time to eliminate the dagger node and inline.

Guppy can make heavy use of such extension ops to provide abstractions like compute/uncompute, quantum controlled subroutines, etc.

While making the changes from #1433 seems to be a good time to consider this.

Relatedly, it would be beneficial to unify the encoding of all statically available hugrs (i.e. inside dfg, control flow nodes, function definitions, FunctionValue constants, and now also inside custom operations).

Open questions include whether the child graph of a custom op is always a dataflow graph, or it can include other region types like cfg, module or the one insideConditional.

croyzor commented 1 month ago

Could your example be done equally well with a const hugr wired into an opaque "Dagger" op?

ss2165 commented 1 month ago

yes I think there are multiple possible encodings with the current expressivity but for static subroutines I would say allowing children is the most natural. For example with your case the rewrites would fail if something earlier had manager to insert something between the constant and the op, assuming it was a runtime value.

zrho commented 1 month ago

I think this is an essential feature, as it opens up a lot of use cases (see https://github.com/CQCL/hugr/discussions/1155#discussioncomment-9678673).

After our conversation at confinuum, I've made the model data structure https://github.com/CQCL/hugr/pull/1542 allow for this. A node can have a sequence of regions, each of which is currently either a dataflow region or a control flow region. This works well with the builtins, and is converted appropriately between core and model:

The cfg operation expects a control flow region.
The block operation expects a data flow region.
The dfg operation expects a data flow region.
The define-func operation expects a data flow region.
The tail-loop operation expects a data flow region.
The conditional operation expects as many data flow regions as there are cases in the sum.

This design also addresses the issue we had with order: on the one hand, the nodes in a dataflow graph do not have any intrinsic order between them. On the other hand, conditional operations need ordered cases. Since the cases here are regions they are ordered, while nodes remain order-agnostic within their region.

Each region comes with source and target ports, which are the ports internal to the region. That removes the need for input, output and exit nodes (and potential entry nodes), therefore also removing reliance on node order within a region.

(This is inspired by a mixture of MLIR and functor boxes for string diagrams; in particular the multi part ones that some applied category theory have been using for quantum supermaps)

doug-q commented 1 month ago

Clearly conceptually ops with child graphs make sense, dagger op being a great example. When pretty-printed it should look like they are children.

Currently we would model this by storing a Hugr inside the Dagger ExtensionOp. In effect, traversals of the graph which aren't looking for specifically Dagger will not see the internal Hugr at all. The children of Dagger have no access to FuncDefns, Consts, non-local edges in/from the parent Hugr.

This kind of opacity is exactly what you want when dealing with unknown parent ops. There is very little you can do with children of an unknown op, because you have no idea what the semantics of "being a child" is, so excluding them from the Hugr is great. The children might be running on a different target, where the parent's FuncDefns are unavailable, so their being unavailable is great.

I am strongly against this proposal. It's adding a lot of complexity in a lot of places for no gain. Instead we should make Ops embedding Hugrs more ergonomic, perhaps by allowing extension ops to take Const edges. Then the interior ops would not be opaque, so you can get to them for constant folding etc if you need to, but they are not in the graph proper, where they would cause nothing but trouble.

ss2165 commented 1 month ago

Currently we would model this by storing a Hugr inside the Dagger ExtensionOp

It's not clear to me how I would do this?

doug-q commented 1 month ago

Currently we would model this by storing a Hugr inside the Dagger ExtensionOp

It's not clear to me how I would do this?

A type arg with a string which contains a serialised Hugr.

ss2165 commented 1 month ago

That would be quite an ugly hack for what should be a well supported usage pattern. I'm happy to consider allowing const edges to extension ops to be the core implementation and keep the general child pattern to the model as Lukas outlines above.

I like the arguments about some degree of invisibility to the host graph being potentially a good thing.

we should make Ops embedding Hugrs more ergonomic

this is ultimately the only goal of the proposal, happy to tweak how we get there before beginning work

zrho commented 1 month ago

it's adding complexity for no gain

I'm surprised by that, since to me it appears the exact opposite, lowering complexity while simultaneously giving massive gain.

Sure, some thought might have to be put into scoping rules. But by passing in some entirely separate Hugr, any custom operation taking a sub hugr that shouldn't be isolated (just like tail-loop or cond isn't) would have to be linked somehow. That's more complicated. Moreover availability of operations can also be controlled via the extension sets.

Similarly rewrites: A lot of incremental lowering can be performed by local rewrites. There's a bunch of lowering procedures that can be expressed with a box that is incrementally made smaller from the boundary by local rewrites. For instance this can be done for autodiff or incrementalisation. It's just like the process of computing a differential of an expression by hand: you add a d/dx(...) around your expression and then push it inwards step by step until you end up with a derivative-free expression again. But this requires rewrites that can operate across nesting boundaries. That is certainly possible with isolated hugr instances passed in as props, but that would be extremely complicated.

This is exactly the feature that makes Ops embedding Hugrs more ergonomic. And if you want to treat a subgraph as opaque, you're free to never look at its subgraph. To prevent non-local edges we could have an isolation system similar to https://mlir.llvm.org/docs/LangRef/#value-scoping.

A type arg with a string which contains a serialised Hugr.

The only situation where this is not much more complex to deal with is when you never actually intend to look at the contained hugr. There is a reason why we do not do this for loops or cfgs.

doug-q commented 1 month ago

Sure, some thought might have to be put into scoping rules. But by passing in some entirely separate Hugr, any custom operation taking a sub hugr that shouldn't be isolated (just like tail-loop or cond isn't) would have to be linked somehow. That's more complicated. Moreover availability of operations can also be controlled via the extension sets.

The alternative is to have extension ops with children, some of which are allowed to call external functions and some of them aren't. In other words, some are implicitly linked to the rest of the hugr (like DFGs are) and some aren't. The complexity is in this variation of behaviour. Similarly for other edges between exterior and child nodes of an extension op: how can these be understood? Only by some declaration on the extension op.

That is certainly possible with isolated hugr instances passed in as props, but that would be extremely complicated.

I don't think this claim is justified. I think it would be much the same.

As the HUGR is now, we can understand entirely the dataflow, control flow, and static call graph. This is important for writing "interpretations" of the HUGR, in which I include analyses, rewrites and other transforms, and destructuring. Allowing children of extension ops, and in particular edges between those children and nodes outside the op, means every interpretation has to deal with this case:

Should I apply this rewrite under this extension op?
Which functions are called from my function? Should I include children of this extension op?
How many uses does this port have? It is used non-locally under an extension op.
What extensions are allowed under this extension op? They are not necessarily related to the extensions available to the parent.
When I'm lowering one extension to another, should that happen under this extension op?

The answer to each question is "it depends". It's different for every extension op and every interpretation. We get an expression problem, where new interpretations have to ask new questions of children of extension ops. Of course interpretations do already have to deal with extension ops, but this is much simpler than dealing with their children, because the issues above cannot arise.

By constraining the hierarchy of the HUGR to a closed set of "parent" nodes, we make the constructed HUGR more useful, because we can know more about it. Of course this then constrains how you can define your ops and map the semantics you want into HUGR, but if you CAN get your semantics in a reasonable way (by which I mean a number of ops linear in your program size) I think this is good enough. It doesn't have to be pretty, it is written by computers for computers.

zrho commented 1 month ago

By constraining the hierarchy of the HUGR to a closed set of "parent" nodes, we make the constructed HUGR more useful, because we can know more about it.

It is more useful to you since you appear to go for a form of LLVM, but it becomes unwieldy to be point of being useless for applications that go beyond that. To me the idea of HUGR has been to be able to explore various kinds of exotic modes of computation under one roof. This is not just academic curiosity, but is concretely relevant for a research compiler in quantum computing. Striking the H out of HUGR also strikes out the U, since then the Oxford office will build its own IR that can't talk to ours since ours will be useless to them.

There is an awesome opportunity here: Quantinuum is the only place I know of that has an entire house filled with people that draw (hierarchical!) string diagrams for all kinds of types of computation; quantum, ML, quantum photonics, etc. So far that has been somewhat isolated from what we do in Cambridge, and to some extent isolated from practicality. HUGR is so close to being exactly the tool they would need to translate a lot of existing and novel research into actually running programs.

To me that's THE value proposition of HUGR and I struggle to understand its purpose without that.

zrho commented 1 month ago

Here’s a proposal that might bring some peace of mind: we could start with the requirement that all regions to custom operations are isolated wrt port connectivity. After declarative extensions, we can try to figure out a way to express ways in which that requirement can be loosened in a manageable way, with relatively low priority.

ss2165 commented 1 month ago

Not really following the thread, just pointing to this other discussion as my currently preferred way of doing this in the core: https://github.com/CQCL/hugr/discussions/1342

acl-cqc commented 1 month ago

Not really following the thread, just pointing to this other discussion as my currently preferred way of doing this in the core: #1342

Yes, if we do that (now an issue #1594), can we close this??

ss2165 commented 3 weeks ago

Don't think so - I think this discussion is still unresolved as to whether refactoring that to work via one single graph is a good thing to do (but a lower priority discussion)

CQCL / hugr

feat: Allow extension operations to have child graphs #1546