vograno opened this issue 2 months ago
@vograno haven't tried. The internal `FunctionGraph` does have the bi-directional linking, so the building blocks are there. To me it sounds like you'd want to change a bit of how the graph is walked and what state is stored where for this, e.g. a new Driver.
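For concreteness, here is a rough sketch of what "walking the graph backwards" from the driver side might look like. The attribute names (`dr.graph`, `FunctionGraph.nodes`, `Node.dependencies`) reflect my reading of Hamilton's internals and should be verified against the source, and `my_forward_module` is a placeholder for whatever module defines the forward-flow functions:

```python
from hamilton import driver

import my_forward_module  # placeholder: a module defining the forward-flow functions

dr = driver.Builder().with_modules(my_forward_module).build()


def walk_backwards(function_graph, output_names):
    """Yield nodes in reverse dependency order, starting from the requested outputs.

    Assumes FunctionGraph.nodes is a name -> Node dict and that each Node exposes
    its upstream nodes via .dependencies (check the Hamilton source for the
    exact attribute names in your version).
    """
    seen = set()
    stack = [function_graph.nodes[name] for name in output_names]
    while stack:
        node = stack.pop()
        if node.name in seen:
            continue
        seen.add(node.name)
        yield node
        # .dependencies points upstream, so following it moves against the forward flow.
        stack.extend(node.dependencies)


for n in walk_backwards(dr.graph, ["loss"]):
    print(n.name)
```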
Adding to what @skrawcz said:
Heh, fascinating idea. I've had a similar idea but never really pursued it. Some thoughts (feel free to disagree on these points!):

- As @skrawcz said -- it would also require rewalking the graph, at least backwards.
- What I'd do if you're interested is first build a simple 2-node graph out of pytorch objects, which already carry the forward/backward relationship. Then you can POC it out: compute gradients individually, and expand to more ways of specifying nodes/gradients (a sketch follows this list).
- I'd also consider other optimization routines if the goal is flexibility!
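A minimal sketch of that 2-node POC, assuming a stock Hamilton Builder (which returns a dict of outputs by default) and `hamilton.ad_hoc_utils.create_temporary_module` to package the functions; the node names (`prediction`, `loss`) and data are made up for illustration:

```python
import torch
from hamilton import driver
from hamilton.ad_hoc_utils import create_temporary_module


def prediction(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    # Node 1: a one-parameter linear model; w has requires_grad so autograd tracks it.
    return x * w


def loss(prediction: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Node 2: mean squared error of the prediction.
    return ((prediction - y) ** 2).mean()


# Package the functions as a module so the Builder can load them.
forward_module = create_temporary_module(prediction, loss, module_name="forward_flow")
dr = driver.Builder().with_modules(forward_module).build()

w = torch.tensor(0.5, requires_grad=True)
result = dr.execute(
    ["loss"],
    inputs={"x": torch.tensor([1.0, 2.0]), "y": torch.tensor([2.0, 4.0]), "w": w},
)

# The "back flow" here is just torch autograd walking its own tape backwards;
# Hamilton only orchestrated the forward pass.
result["loss"].backward()
print(w.grad)
```

From there the question becomes whether the backward pass itself can be expressed as a second Hamilton graph rather than delegated to autograd.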
> ... The internal `FunctionGraph` does have the bi-directional linking, so the building blocks are there. To me it sounds like you'd want to change a bit of how the graph is walked and what state is stored where for this, e.g. a new Driver.
I can walk the graph backward all right, but I also need to create new nodes in the back-flow graph along the way, and this is where I'm not sure. I can see two options, but first note that the back-flow node functions get created inside the `backprop` function, not at the module level where we typically write node functions.

Option 1 - use a temp module (see the sketch after these options).
Option 2 - start the back-flow graph empty and add nodes to it as I traverse the forward graph. Here I need to create nodes outside the Builder, and I'm not sure what API to use.
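If Option 1 just means generating the back-flow functions at runtime and packaging them so the Builder can see them, a minimal sketch could look like the following. It assumes `hamilton.ad_hoc_utils.create_temporary_module` (the helper Hamilton documents for notebook-defined functions) is an acceptable stand-in for the "temp module"; the `grad_*` naming and the tiny x/w/prediction/loss graph are invented for illustration:

```python
import torch
from hamilton import driver
from hamilton.ad_hoc_utils import create_temporary_module


def build_backflow_driver() -> driver.Driver:
    """Build a back-flow driver for a tiny forward graph (x, w -> prediction -> loss).

    The node functions are created here, inside a function, and then packaged
    with create_temporary_module -- i.e. the "temp module" of Option 1.
    """

    def grad_loss(loss: torch.Tensor) -> torch.Tensor:
        # Seed gradient at the sink of the forward graph.
        return torch.ones_like(loss)

    def grad_w(
        grad_loss: torch.Tensor, prediction: torch.Tensor, x: torch.Tensor, y: torch.Tensor
    ) -> torch.Tensor:
        # Reversed edges loss -> prediction -> w, written by hand for this sketch;
        # a real implementation would generate one such function per forward node.
        return grad_loss * (2.0 * (prediction - y) @ x) / x.numel()

    backflow_module = create_temporary_module(grad_loss, grad_w, module_name="back_flow")
    return driver.Builder().with_modules(backflow_module).build()


back_dr = build_backflow_driver()
# Forward-pass values (loss, prediction, x, y) would be fed in as inputs, e.g.:
# grads = back_dr.execute(["grad_w"], inputs={...})
```

Option 2 would presumably mean going through Hamilton's internal node/graph classes directly, which is less well documented, so the temp-module route seems easier to prototype.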
Let me propose a less fascinating but simpler example to work with.
The goal is to compute the back-flow driver given the forward-flow one.
@vograno sounds cool. Unfortunately we're a little bandwidth constrained at the current time to be as helpful as we'd like. So just wanted to mention that. Do continue to add context / questions here - we'll try to help when we can.
A DAG defines a forward flow of information. At the same time it implies a back-flow of information: reverse each edge of the DAG and associate a back-flow node function with each node. Additionally, we should provide a merging function that merges back-flow inputs that connect to a common forward output, but this is a technicality.
The gradient descent of a feed-forward neural network is an example of such a back flow. Namely, the forward pass computes node outputs and gradients given the model parameters, while the backward pass updates the model parameters according to the computed gradients. I think the merging function is `sum` in this case.

The question is then whether Hamilton is an appropriate framework for inferring the back-flow DAG from the forward one. Here inferring means computing the back-flow driver given the forward-flow one.
Use gradient descent as a case study.
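To make the reverse-and-merge idea concrete, here is a tiny standalone sketch (plain Python, names made up) of what reversing every edge and sum-merging fan-in means for a gradient-descent-like case; it is not Hamilton code, just the bookkeeping a back-flow driver would have to automate:

```python
# Reverse every edge of the forward DAG and accumulate ("merge") contributions
# that meet at a common forward output with sum -- as reverse-mode autodiff does.
from collections import defaultdict

# Forward DAG: edge u -> v means "v consumes the output of u".
forward_edges = {
    "w": ["prediction"],
    "x": ["prediction"],
    "prediction": ["loss_a", "loss_b"],  # one output feeding two consumers
    "loss_a": ["total_loss"],
    "loss_b": ["total_loss"],
    "total_loss": [],
}

# Back-flow DAG: simply reverse each edge.
backward_edges = defaultdict(list)
for src, dsts in forward_edges.items():
    for dst in dsts:
        backward_edges[dst].append(src)

# Fake "local gradients" per reversed edge, standing in for the per-node
# back-flow functions; real values would come from each node's derivative.
local_grad = {
    ("total_loss", "loss_a"): 1.0, ("total_loss", "loss_b"): 1.0,
    ("loss_a", "prediction"): 0.3, ("loss_b", "prediction"): 0.7,
    ("prediction", "w"): 2.0, ("prediction", "x"): 0.5,
}

# Propagate from the sink; where two back-flow inputs reach the same forward
# node ("prediction"), the merging function is sum.
grad = defaultdict(float)
grad["total_loss"] = 1.0
for node in ["total_loss", "loss_a", "loss_b", "prediction"]:  # reverse topological order
    for upstream in backward_edges[node]:
        grad[upstream] += grad[node] * local_grad[(node, upstream)]

print(grad["prediction"])  # 0.3 + 0.7 = 1.0 -- the sum merge
print(grad["w"])           # 1.0 * 2.0 = 2.0
```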