qbilius opened 8 years ago
OK, so I'll start with a short description of the current situation with the code, and then suggest a plan for what we need to do next.
The piece of code that we're mostly thinking about right now is the get_model function in the file bypass/models.py. Currently this function takes a base model descriptor (a convnet configuration spec) and a list of bypass routings desired, and then outputs a tensorflow graph that represents a time-unrolled version of the RNN for the network with the bypass structure added.
We want to generalize this to handle feedback. Our design goal will be to allow the user to specify a "base graph" describing the data-flow interactions (e.g. which layers feed into which) and then produce a derived tensorflow graph representing the unrolled network consistent with that structure. The base-graph input should be a networkx object with the relevant node attribute info describing the operations of the graph (e.g. conv nodes, pool nodes) and the links describing the data flow (including the feedforward, feedback, and bypass links). The output will of course be a tensorflow graph object that can be evaluated on input data.
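For example, such a base graph might look something like this (a minimal sketch; the node and edge attribute names here are just illustrative, not a fixed schema):

```python
import networkx as nx

G = nx.DiGraph()
# nodes carry the op spec
G.add_node('conv1', op='conv', out_channels=64, ksize=3)
G.add_node('pool1', op='pool', ksize=2, stride=2)
G.add_node('conv2', op='conv', out_channels=128, ksize=3)
# edges carry the data flow, tagged by link type
G.add_edge('conv1', 'pool1', link='feedforward')
G.add_edge('pool1', 'conv2', link='feedforward')
G.add_edge('conv1', 'conv2', link='bypass')    # skips pool1
G.add_edge('conv2', 'conv1', link='feedback')  # makes the graph cyclic
```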
To achieve this goal, there are two problems in the code right now that we need to overcome:
(1) Part of what really should be in the construction of the "base graph" is baked into the middle of the get_model function, where the unrolling happens. Specifically, look at lines 331-357 of models.py in the master branch. This code is basically implementing the fact that when bypass links are added, "adapters" need to be imposed to allow layers of different sizes and shapes to be connected. This code really needs to be moved to the _construct_graph function. Ideally, we should have a separate function, part of the tconvnet code, that takes a base convnet specification, together with a set of desired bypasses/feedbacks, and outputs a networkx graph with the proper "connectors". That is, roughly something like the following pseudocode:
```python
def _complete_graph(base_graph, links):
    """
    inputs:
        base_graph = either a JSON specification of a convnet or a networkx object
            specifying the convnet
        links = list of pairs of layers to be connected, possibly with a specification
            of what type of connection should be made (e.g. (un)pooling + concatenation,
            etc)
    outputs:
        networkx object whose nodes correspond to convnet layers and relevant connecting
        operations and whose edges contain the original convnet information flow as
        well as the ones described by the "links" input

    Question: should we put the rnn cell creation code here, so that each of the nodes
    of the output graph has a "cell" attribute, containing the rnn cell? Or should
    it just contain the information so that the graph_rnn function below can create
    the actual rnn cell operation? Don't know.
    """
    # do creation of poolings/unpoolings/concatenation/etc as currently exemplified
    # in lines 331-357 of models.py. Presumably we might want to generalize what is
    # currently there to handle other types of connectors (we can talk about this realtime).
```
The output graph of this function will then be passed to the unrolling function described below.
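For concreteness, here's roughly what inserting such a connector into the networkx graph might look like (a hypothetical sketch; the helper name and attributes are illustrative, not the existing code):

```python
import networkx as nx

def _add_connector(G, src, dst, kind='unpool_concat'):
    # Hypothetical helper: insert an adapter node between src and dst so that
    # layers of different sizes/shapes can be connected, in the spirit of
    # lines 331-357 of models.py.
    adapter = '{}_to_{}_{}'.format(src, dst, kind)
    G.add_node(adapter, op=kind)  # e.g. (un)pooling followed by concatenation
    G.add_edge(src, adapter)
    G.add_edge(adapter, dst)
    return adapter

G = nx.DiGraph([('conv1', 'pool1'), ('pool1', 'conv2')])
_add_connector(G, 'conv2', 'conv1', kind='unpool_concat')  # a feedback link
```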
(2) The current model relies on the tf.nn.rnn function to perform unrolling. However, this is no longer sufficient in the case of feedback. We'll have to write our own unroller. I believe that we should seek to write this unroller not in the tconvnet repo, but instead in the tfutils repo, or perhaps even in a fork of tensorflow, with the intent of submitting a PR to tensorflow.
How will this unrolling work? There are two basic versions, the "static unroller" (corresponding to the tf.nn.rnn function in the current tensorflow code) and the "dynamic unroller" (corresponding to the tf.nn.dynamic_rnn function). I'm going to confine myself for the moment to describing what we should do in the static case -- let's get to the dynamic case later.
Basically the unroller is like what is already in get_model, but it should be modified as follows:

- The order of the temporal and layer loops needs to be switched. That is, the temporal loop needs to be on the outside.
- The inner loop should now be a loop over graph elements of the output of _complete_graph. It doesn't really matter what order they're visited in; a topological sort (or the like) will not in general work, since this graph will not be a DAG (see the snippet after this list).
- If trim_top and/or trim_bottom are desired, you'll want to be careful not to create nodes for the relevant layers during the trimmed time points. In the current code this is easy, since the inner loop over time can run from layer['first'] to layer['last']+1, but now that the temporal loop is on the outside this will have to be thought about slightly more carefully.
- The inner loop should still end up with a call to tf.nn.rnn, but now separately for each time point.
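To illustrate the DAG point with a minimal, self-contained example:

```python
import networkx as nx

G = nx.DiGraph([('conv1', 'conv2'), ('conv2', 'conv1')])  # feedback creates a cycle
try:
    order = list(nx.topological_sort(G))
except nx.NetworkXUnfeasible:
    # no topological order exists, so the unroller should just visit nodes in
    # any fixed order and read predecessor outputs from time t-1
    order = list(G.nodes())
```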
In other words, I'm suggesting the following (very rough) pseudocode
In tconvnet/models.py:
```python
def get_model(input_seq,           # input data provider
              model_base_func,     # function creating the base graph
              model_kwargs=None,   # arguments for that function
              links_func=None,     # function to decide which links to make
              links_kwargs=None,   # arguments to that function
              trim_top=True,       # trim unneeded top nodes?
              trim_bottom=True,    # trim unneeded bottom nodes?
              feature_layer=None,  # need this? maybe not
              batch_size=None):
    # create base convnet
    base_graph = model_base_func(batch_size, **model_kwargs)
    # create links
    links = links_func(base_graph, **links_kwargs)
    # create completed networkx graph
    full_graph = _complete_graph(base_graph, links)
    # determine first/last times for trimming as a function of trim_top and trim_bottom
    ...  # as in existing code
    # create unrolled tensorflow graph
    graph_rnn(input_seq, full_graph)
    # QUESTION: do we want to still do something like the "feature_layer" argument of
    # the existing get_model function? don't know -- let's figure this out as the code
    # gets written
    return full_graph
```
And then, in tfutils somewhere or a fork of tensorflow:
```python
def graph_rnn(input_sequence, nx_graph):
    """
    inputs:
        input_sequence -- input data sequence
        nx_graph -- a networkx graph with nodes labeled with a "cell" object
    outputs: None
    side effects:
        creates additional "outputs", "inputs", and "states" attributes on each
        node of nx_graph. these correspond to tensorflow nodes as in the existing
        get_model code.

    Question: should this function operate via side effects on the networkx graph, as
    in the current get_model function? Or should graph_rnn *return* a list (or dict) of
    tensorflow nodes? I tend to like the current setup, but let's discuss.
    """
    # calculate ntimes from length of input_sequence and nx_graph structure
    ntimes = ...
    # loop over time
    for t in range(ntimes):
        # loop over nodes
        for node in nx_graph.nodes():
            if not (node['first'] <= t <= node['last']):
                continue
            # create initial state and output list for each node when it first runs
            if t == node['first']:
                node['state'] = node['cell'].zero_state(None, None)
                node['outputs'] = []
            # gather inputs: predecessors' outputs from the previous time step
            parents = nx_graph.predecessors(node)
            inputs = [p['outputs'][t - 1] for p in parents]
            # compute output and state
            out, state = tf.nn.rnn(cell=node['cell'],
                                   inputs=inputs,
                                   initial_state=node['state'])
            node['outputs'].append(out)
            node['state'] = state
```
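As a sanity check on the control flow, here is a toy, tensorflow-free version of the same time-major loop over a cyclic networkx graph (everything here is illustrative: the "cell" is just a running sum, and node names are made up):

```python
import networkx as nx

def toy_graph_rnn(input_sequence, G):
    # outer loop over time, inner loop over nodes; cycles are fine because
    # each node reads its predecessors' outputs from time t-1
    for t in range(len(input_sequence)):
        for n, data in G.nodes(data=True):
            if t == 0:
                data['state'] = 0.0
                data['outputs'] = []
            if G.in_degree(n) == 0:       # source nodes get the external input
                inputs = [input_sequence[t]]
            else:
                inputs = [G.nodes[p]['outputs'][t - 1] if t > 0 else 0.0
                          for p in G.predecessors(n)]
            data['state'] += sum(inputs)  # stand-in for the rnn cell
            data['outputs'].append(data['state'])

G = nx.DiGraph([('inp', 'conv1'), ('conv1', 'conv2'), ('conv2', 'conv1')])
toy_graph_rnn([1.0, 2.0, 3.0], G)
print(G.nodes['conv2']['outputs'])
```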
(3) As soon as we can get the above working, we'll want to create a "dynamic_graph_rnn" version of the above, in analogy to tf.nn.dynamic_rnn.
The above is rough pseudocode, obviously. If you're interested, I also have an essentially functional but far from thoroughly tested prototype of item 2, but this is not hard code...
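On point (3), the dynamic version would presumably move the time loop into the graph with tf.while_loop rather than building ntimes copies of every op. A minimal sketch of that pattern (assuming TF 1.x-style APIs, and a single cell rather than a whole graph):

```python
import tensorflow as tf

ntimes, nfeat = 10, 64
inputs = tf.placeholder(tf.float32, [ntimes, None, nfeat])  # [time, batch, features]
cell = tf.nn.rnn_cell.BasicRNNCell(nfeat)
state0 = cell.zero_state(tf.shape(inputs)[1], tf.float32)
outputs0 = tf.TensorArray(tf.float32, size=ntimes)

def step(t, state, outputs):
    # one cell application per step; outputs accumulate in the TensorArray
    out, new_state = cell(inputs[t], state)
    return t + 1, new_state, outputs.write(t, out)

_, final_state, outputs = tf.while_loop(
    lambda t, *rest: t < ntimes, step, [0, state0, outputs0])
outputs = outputs.stack()  # [time, batch, features]
```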
Things to do by next iteration of Unicycle:
1) ~Merge `resize_shape_to` function into `Harbor` class as a method, do all the input shape calculations inside of `Harbor`~ ...done!
2) ~Fix Unrolling time dependence on previous state (t-1 fix)~ ...done!
3) ~Use `tf-utils` for training~ ...done!
4) ~Support for many Placeholders and Nodes with no predecessors - run down the graph from all of these nodes and merge together.~ ...done!
5) ~Class-ify `Unicycle`, add `__call__` method~ ...done!
6) ~Longest path Harbor Policy as default policy~ ...done!
7) ~Think about Harbor Master Policy as a function that takes a list of paths and/or an arbitrary set of inputs depending on what the function is.~ ...done!
8) Figure out proper FC to CONV conversion algorithms (see the sketch after this list)
9) Fix BIAS node class
10) ~Make everything purdy~ ...done!
11) ~Add unrolling loop through time, memoize outputs~ ...done!
12) ~Modularize! Unicycle has a modular output of a list of callables to be passed to unroller. 5 separate method calls!~ ...done!
13) Modify GenFuncCell to have methods to memoize outputs and states. Look into pruning.
14) Look into Harbor generalization, merge Harbor_Dummy into Harbor
15) flake8 things
16) Pass around NetworkX instead of lists
17) Add tests!
18) Do training ooooh yeah - timing test and regression test. Benchmarks
19) Tests for small scale architectures like MNIST
20) Train on AlexNet and compare
21) Train on DenseNet and compare
22) Train on VGG and compare
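On item 8, the standard trick is to reshape an FC layer's weight matrix into a conv kernel that covers the whole incoming feature map. A sketch under an assumed weight layout (not the repo's actual algorithm; function name and shapes are illustrative):

```python
import numpy as np

def fc_to_conv_weights(W_fc, h, w, c_in):
    # W_fc: (h * w * c_in, n_out) weights of an FC layer applied to a
    # flattened (h, w, c_in) feature map. Reshaped to (h, w, c_in, n_out),
    # the FC layer becomes a VALID conv whose kernel spans the whole map.
    # NB: assumes the flattening order was row-major over (h, w, c_in).
    n_out = W_fc.shape[1]
    return W_fc.reshape(h, w, c_in, n_out)

W_fc = np.random.randn(6 * 6 * 256, 4096)     # e.g. an AlexNet-style fc6
W_conv = fc_to_conv_weights(W_fc, 6, 6, 256)  # (6, 6, 256, 4096) kernel
```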
1. Dan is hoping to get that ready in a week and will post an outline in the meantime, so that Harry can see what will be done, not worry about that part, and can start implementing it (with feedback from the others).
2. Harry should familiarize himself with the existing code base and do what he can, keeping in mind the changes that Dan's unroller will bring.
3. Once Dan has something working, we'll all chat to see how we like it. Harry should have his opinions ready on how he'd like it to work and what is needed.