qbilius opened 8 years ago
OK, so I'll start with a short description of the current situation with the code, and then suggest a plan for what we need to do next.
The piece of code that we're mostly thinking about right now is the get_model function in the file bypass/models.py. Currently this function takes a base model descriptor (a convnet configuration spec) and a list of bypass routings desired, and then outputs a tensorflow graph that represents a time-unrolled version of the RNN for the network with the bypass structure added.
We want to generalize this to handle feedback. Our design goal will be to allow the user to specify a "base graph" describing the data-flow interactions (e.g. which layers feed into which) and then produce a derived tensorflow graph representing the unrolled network consistent with that structure. The base-graph input should be a networkx object with the relevant node attribute info describing the operations of the graph (e.g. conv nodes, pool nodes) and the links describing the data flow (including the feedforward, feedback, and bypass links). The output will of course be a tensorflow graph object that can be evaluated on input data.
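For example, such a base graph might look something like this (a minimal sketch; the node and edge attribute names here are just illustrative, not a fixed schema):

```python
import networkx as nx

G = nx.DiGraph()
# nodes carry the op spec
G.add_node('conv1', op='conv', out_channels=64, ksize=3)
G.add_node('pool1', op='pool', ksize=2, stride=2)
G.add_node('conv2', op='conv', out_channels=128, ksize=3)
# edges carry the data flow, tagged by link type
G.add_edge('conv1', 'pool1', link='feedforward')
G.add_edge('pool1', 'conv2', link='feedforward')
G.add_edge('conv1', 'conv2', link='bypass')    # skips pool1
G.add_edge('conv2', 'conv1', link='feedback')  # makes the graph cyclic
```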
To achieve this goal, there are two problems in the code right now that we need to overcome:
(1) Part of what really should be in the construction of the "base graph" is baked into the middle of the get_model function, where the unrolling happens. Specifically, look at lines 331-357 of models.py in the master branch. This code is basically implementing the fact that when bypass links are added, "adapters" need to be imposed to allow layers of different sizes and shapes to be connected. This code really needs to be moved to the _construct_graph function. Ideally, we should have a separate function, part of the tconvnet code, that takes a base convnet specification, together with a set of desired bypasses/feedbacks, and outputs a networkx graph with the proper "connectors". That is, roughly something like the following pseudocode:
```python
def _complete_graph(base_graph, links):
    """
    inputs:
        base_graph = either a JSON specification of a convnet or a networkx object
            specifying the convnet
        links = list of pairs of layers to be connected, possibly with a specification
            of what type of connection should be made (e.g. (un)pooling + concatenation,
            etc)
    outputs:
        networkx object whose nodes correspond to convnet layers and relevant connecting
        operations and whose edges contain the original convnet information flow as
        well as the ones described by the "links" input

    Question: should we put the rnn cell creation code here, so that each of the nodes
    of the output graph has a "cell" attribute, containing the rnn cell? Or should
    it just contain the information so that the graph_rnn function below can create
    the actual rnn cell operation? Don't know.
    """
    # do creation of poolings/unpoolings/concatenation/etc as currently exemplified
    # in lines 331-357 of models.py. Presumably we might want to generalize what is
    # currently there to handle other types of connectors (we can talk about this realtime).
```
The output graph of this function will then be passed to the unrolling function described below.
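For concreteness, here's roughly what inserting such a connector into the networkx graph might look like (a hypothetical sketch; the helper name and attributes are illustrative, not the existing code):

```python
import networkx as nx

def _add_connector(G, src, dst, kind='unpool_concat'):
    # Hypothetical helper: insert an adapter node between src and dst so that
    # layers of different sizes/shapes can be connected, in the spirit of
    # lines 331-357 of models.py.
    adapter = '{}_to_{}_{}'.format(src, dst, kind)
    G.add_node(adapter, op=kind)  # e.g. (un)pooling followed by concatenation
    G.add_edge(src, adapter)
    G.add_edge(adapter, dst)
    return adapter

G = nx.DiGraph([('conv1', 'pool1'), ('pool1', 'conv2')])
_add_connector(G, 'conv2', 'conv1', kind='unpool_concat')  # a feedback link
```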
(2) The current model relies on the tf.nn.rnn function to perform unrolling. However, this is no longer sufficient in the case of feedback. We'll have to write our own unroller. I believe that we should seek to write this unroller not in the tconvnet repo, but instead in the tfutils repo, or perhaps even in a fork of tensorflow, with the intent of submitting a PR to tensorflow.
How will this unrolling work? There are two basic versions, the "static unroller" (corresponding to the tf.nn.rnn function in the current tensorflow code) and the "dynamic unroller" (corresponding to the tf.nn.dynamic_rnn function). I'm going to confine myself for the moment to describing what we should do in the static case -- let's get to the dynamic case later.
Basically the unroller is like what is already in get_model, but it should be modified as follows:

- The order of the temporal and layer loops needs to be switched. That is, the temporal loop needs to be on the outside.
- The inner loop should now be a loop over graph elements of the output of _complete_graph. It doesn't really matter what order they're visited in; a topological sort (or the like) will not in general work, since this graph will not be a DAG (see the snippet after this list).
- If trim_top and/or trim_bottom are desired, you'll want to be careful not to create nodes for the relevant layers during the trimmed time points. In the current code this is easy, since the inner loop over time can run from layer['first'] to layer['last']+1, but now that the temporal loop is on the outside this will have to be thought about slightly more carefully.
- The inner loop should still end up with a call to tf.nn.rnn, but now separately for each time point.
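To illustrate the DAG point with a minimal, self-contained example:

```python
import networkx as nx

G = nx.DiGraph([('conv1', 'conv2'), ('conv2', 'conv1')])  # feedback creates a cycle
try:
    order = list(nx.topological_sort(G))
except nx.NetworkXUnfeasible:
    # no topological order exists, so the unroller should just visit nodes in
    # any fixed order and read predecessor outputs from time t-1
    order = list(G.nodes())
```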
In other words, I'm suggesting the following (very rough) pseudocode
In tconvnet/models.py:
```python
def get_model(input_seq,           # input data provider
              model_base_func,     # function creating the base graph
              model_kwargs=None,   # arguments for that function
              links_func=None,     # function to decide which links to make
              links_kwargs=None,   # arguments to that function
              trim_top=True,       # trim unneeded top nodes?
              trim_bottom=True,    # trim unneeded bottom nodes?
              feature_layer=None,  # need this? maybe not
              batch_size=None):
    # create base convnet
    base_graph = model_base_func(batch_size, **model_kwargs)
    # create links
    links = links_func(base_graph, **links_kwargs)
    # create completed networkx graph
    full_graph = _complete_graph(base_graph, links)
    # determine first/last times for trimming as a function of trim_top and trim_bottom
    ...  # as in existing code
    # create unrolled tensorflow graph
    graph_rnn(input_seq, full_graph)
    # QUESTION: do we want to still do something like the "feature_layer" argument of
    # the existing get_model function? don't know -- let's figure this out as the code
    # gets written
    return full_graph
```
And then, in tfutils somewhere or a fork of tensorflow:
```python
def graph_rnn(input_sequence, nx_graph):
    """
    inputs:
        input_sequence -- input data sequence
        nx_graph -- a networkx graph with nodes labeled with a "cell" object
    outputs: None
    side effects:
        creates additional "outputs", "inputs", and "states" attributes on each
        node of nx_graph. these correspond to tensorflow nodes as in the existing
        get_model code.

    Question: should this function operate via side effects on the networkx graph, as
    in the current get_model function? Or should graph_rnn *return* a list (or dict) of
    tensorflow nodes? I tend to like the current setup, but let's discuss.
    """
    # calculate ntimes from length of input_sequence and nx_graph structure
    ntimes = ...
    # loop over time
    for t in range(ntimes):
        # loop over nodes
        for node in nx_graph.nodes():
            if not (node['first'] <= t <= node['last']):
                continue
            # create initial state and output list for each node when it first runs
            if t == node['first']:
                node['state'] = node['cell'].zero_state(None, None)
                node['outputs'] = []
            # gather inputs: predecessors' outputs from the previous time step
            parents = nx_graph.predecessors(node)
            inputs = [p['outputs'][t - 1] for p in parents]
            # compute output and state
            out, state = tf.nn.rnn(cell=node['cell'],
                                   inputs=inputs,
                                   initial_state=node['state'])
            node['outputs'].append(out)
            node['state'] = state
```
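As a sanity check on the control flow, here is a toy, tensorflow-free version of the same time-major loop over a cyclic networkx graph (everything here is illustrative: the "cell" is just a running sum, and node names are made up):

```python
import networkx as nx

def toy_graph_rnn(input_sequence, G):
    # outer loop over time, inner loop over nodes; cycles are fine because
    # each node reads its predecessors' outputs from time t-1
    for t in range(len(input_sequence)):
        for n, data in G.nodes(data=True):
            if t == 0:
                data['state'] = 0.0
                data['outputs'] = []
            if G.in_degree(n) == 0:       # source nodes get the external input
                inputs = [input_sequence[t]]
            else:
                inputs = [G.nodes[p]['outputs'][t - 1] if t > 0 else 0.0
                          for p in G.predecessors(n)]
            data['state'] += sum(inputs)  # stand-in for the rnn cell
            data['outputs'].append(data['state'])

G = nx.DiGraph([('inp', 'conv1'), ('conv1', 'conv2'), ('conv2', 'conv1')])
toy_graph_rnn([1.0, 2.0, 3.0], G)
print(G.nodes['conv2']['outputs'])
```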
(3) As soon as we can get the above working, we'll want to create a "dynamic_graph_rnn" version of the above, in analogy to tf.nn.dynamic_rnn.
The above is rough pseudocode, obviously. If you're interested, I also have an essentially functional but far from thoroughly tested prototype of item 2, but this is not hard code...
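On point (3), the dynamic version would presumably move the time loop into the graph with tf.while_loop rather than building ntimes copies of every op. A minimal sketch of that pattern (assuming TF 1.x-style APIs, and a single cell rather than a whole graph):

```python
import tensorflow as tf

ntimes, nfeat = 10, 64
inputs = tf.placeholder(tf.float32, [ntimes, None, nfeat])  # [time, batch, features]
cell = tf.nn.rnn_cell.BasicRNNCell(nfeat)
state0 = cell.zero_state(tf.shape(inputs)[1], tf.float32)
outputs0 = tf.TensorArray(tf.float32, size=ntimes)

def step(t, state, outputs):
    # one cell application per step; outputs accumulate in the TensorArray
    out, new_state = cell(inputs[t], state)
    return t + 1, new_state, outputs.write(t, out)

_, final_state, outputs = tf.while_loop(
    lambda t, *rest: t < ntimes, step, [0, state0, outputs0])
outputs = outputs.stack()  # [time, batch, features]
```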
Things to do by next iteration of Unicycle:
1) ~Merge `resize_shape_to` function into `Harbor` class as a method, do all the input shape calculations inside of `Harbor`~ ...done!
2) ~Fix Unrolling time dependence on previous state (t-1 fix)~ ...done!
3) ~Use `tf-utils` for training~ ...done!
4) ~Support for many Placeholders and Nodes with no predecessors - run down the graph from all of these nodes and merge together.~ ...done!
5) ~Class-ify `Unicycle`, add `__call__` method~ ...done!
6) ~Longest path Harbor Policy as default policy~ ...done!
7) ~Think about Harbor Master Policy as a function that takes a list of paths and/or an arbitrary set of inputs depending on what the function is.~ ...done!
8) Figure out proper FC to CONV conversion algorithms (see the sketch after this list)
9) Fix BIAS node class
10) ~Make everything purdy~ ...done!
11) ~Add unrolling loop through time, memoize outputs~ ...done!
12) ~Modularize! Unicycle has a modular output of a list of callables to be passed to unroller. 5 separate method calls!~ ...done!
13) Modify GenFuncCell to have methods to memoize outputs and states. Look into pruning.
14) Look into Harbor generalization, merge Harbor_Dummy into Harbor
15) flake8 things
16) Pass around NetworkX instead of lists
17) Add tests!
18) Do training ooooh yeah - timing test and regression test. Benchmarks
19) Tests for small scale architectures like MNIST
20) Train on AlexNet and compare
21) Train on DenseNet and compare
22) Train on VGG and compare
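On item 8, the standard trick is to reshape an FC layer's weight matrix into a conv kernel that covers the whole incoming feature map. A sketch under an assumed weight layout (not the repo's actual algorithm; function name and shapes are illustrative):

```python
import numpy as np

def fc_to_conv_weights(W_fc, h, w, c_in):
    # W_fc: (h * w * c_in, n_out) weights of an FC layer applied to a
    # flattened (h, w, c_in) feature map. Reshaped to (h, w, c_in, n_out),
    # the FC layer becomes a VALID conv whose kernel spans the whole map.
    # NB: assumes the flattening order was row-major over (h, w, c_in).
    n_out = W_fc.shape[1]
    return W_fc.reshape(h, w, c_in, n_out)

W_fc = np.random.randn(6 * 6 * 256, 4096)     # e.g. an AlexNet-style fc6
W_conv = fc_to_conv_weights(W_fc, 6, 6, 256)  # (6, 6, 256, 4096) kernel
```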
1. Dan is hoping to get that ready in a week and will post an outline in the meantime, so that Harry can see what will be done, not worry about that part, and can start implementing it (with feedback from the others).
2. Harry should familiarize himself with the existing code base and do what he can, keeping in mind the changes that Dan's unroller will bring.
3. Once Dan has something working, we'll all chat to see how we like it. Harry should have his opinions ready on how he'd like it to work and what is needed.