A compositional way to define networks is desirable and has been discussed, but I don't know of any concrete efforts underway right now (correct me if you know otherwise).
This could be done the way you describe, with a layer type that actually inserts a sequence of layers defined by another prototxt, or it could be done in many other ways.
Personally I'm partial to the idea that we need a better language for defining nets, which supports composition in a simple and natural way. In particular, it might not be too hard to make it possible to generate net definitions from Python code. I might work on this, but not right away.
(The language has gotten rather convoluted here. This is not recursion; it's simple composition of functions. Notwithstanding the name network-in-network, let me make it clear to those reading that we are still talking about DAGs of layers, nothing more complicated.)
Thanks for the reply @longjon.
Just one point of confusion I have in case it implies some profound error on my part. Regarding this:
This is not recursion; it's simple composition of functions. Notwithstanding the name network-in-network, let me make it clear to those reading that we are still talking about DAGs of layers, nothing more complicated.
Maybe I'm missing some subtle distinction here. If there were a layer type for a network, and that network could also include layers of type network, then it seems to me one would need to recursively parse the included networks and their layers of type NETWORK in order to instantiate the DAG that one ultimately ends up with. So I don't see how the design approach isn't both recursive and still ultimately producing a DAG of non-network layer instances; I don't see how the two are incompatible, as your comment suggests. But I'm from an engineering background, not CS, so maybe I'm missing your point.
Regarding generation of net definitions from Python code: I'm not warm to that idea, not being a Python person. I rather like the protobuf-based definitions. Guess I'd have to see how it works. Not that my opinion should count much on the matter.
@jyegerlehner, I think your understanding is correct. To be clear: the network is a function written in the prototxt language, and that function is not recursive. (In particular, we are not talking about recurrent nets here.) Caffe parses that definition using some code written in C++, which very well might be recursive.
(To me, "recursive inclusion" suggests including a network within itself, which is not what we are talking about here. And the neologism "network-in-network" suggests a broader space of possible nets, but this is purely about notation.)
Rest assured, the protobuf definitions are not going anywhere. But they are just data, and we may gain some new ways to manipulate that data.
If you're interested in generating protobufs for networks such as these in python, I have an example you can use. https://github.com/kmatzen/caffe/blob/inception_plan/python/inception_plan.py
@longjon Thanks for the follow-up reply, and for explaining it to me slowly so I can understand. I see my use of "recursive" was confusing/wrong.
@kmatzen Nice python code! Thanks for sharing that. I guess that might be the kind of thing longjon was referring to above when he said generate net definitions from python code.
Looking at your example makes me wonder if perhaps we just want to keep the prototxt for a net flat like it is now, and not add in a "network in network" kind of module composability. Whenever we need to include a module such as the inception module in our net, then we just write a script that generates the flattened prototxt. On the other hand we might want to see the original structure showing the modules. If we generate prototxt as in your example, the modules have all been flattened, and the composition of the modules is no longer visible. I'm writing a utility that generates a graphviz dot file from prototxt, and it would be good to have the option of seeing the modules in the graph instead of only just the flattened net.
@kmatzen Nice python code! Thanks for sharing that.
Agreed! Really well-written Python module for building GoogLeNet and a great example of using the Pycaffe proto library to generate nets (which I think is underutilized due to lack of examples, including by myself). Well done and thanks for sharing @kmatzen.
I haven't actually tried running it, but assuming it really does generate the GoogLeNet architecture as described in the paper, I think it would be a very useful example to have in Caffe, if you'd be willing to contribute it.
Looking at your example makes me wonder if perhaps we just want to keep the prototxt for a net flat like it is now, and not add in a "network in network" kind of module composability. Whenever we need to include a module such as the inception module in our net, then we just write a script that generates the flattened prototxt.
I think this is a pretty reasonable option. I've actually never used caffe_pb2 in Python; I've just written my own hacky Python scripts which themselves generate the model prototxt as text, and both options seem to work well enough. But I don't doubt there are better ways of doing these things, as @longjon describes.
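As a rough illustration of what I mean (the layer names, shapes, and truncated module below are made up for the example, not the real GoogLeNet definition), such a script can be as simple as stamping out a string template with a unique prefix per module instance:

from __future__ import print_function

# Truncated stand-in for an inception-style module; every name is prefixed
# with the instance name so the flattened prototxt stays unique.
MODULE_TEMPLATE = """layers {
  name: "%(prefix)s/1x1_conv"
  type: CONVOLUTION
  bottom: "%(bottom)s"
  top: "%(prefix)s/1x1_conv"
  convolution_param { num_output: %(n1x1)d kernel_size: 1 }
}
layers {
  name: "%(prefix)s/1x1_relu"
  type: RELU
  bottom: "%(prefix)s/1x1_conv"
  top: "%(prefix)s/1x1_relu"
}
"""

def inception(prefix, bottom, n1x1):
    # stamp out one module instance with uniquely prefixed layer/blob names
    return MODULE_TEMPLATE % dict(prefix=prefix, bottom=bottom, n1x1=n1x1)

net = 'name: "GoogLeNet"\n'
net += inception('inception3a', bottom='pool2', n1x1=64)
net += inception('inception3b', bottom='inception3a/1x1_relu', n1x1=128)
print(net)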
It could be possible to add something to the NetParameter to let you define named "modules", which are each essentially a mini-net (e.g. an "inception" module from googlenet) with inputs and a series of layers and some outputs. Then instead of a type: one could specify a module: (as a string, referring to one of the defined module names).
Well done and thanks for sharing @kmatzen.
Yeah, thanks @kmatzen!
It could be possible to add something to the NetParameter to let you define named "modules", which are each essentially a mini-net (e.g. an "inception" module from googlenet) with inputs and a series of layers and some outputs. Then instead of a type: one could specify a module: (as a string, referring to one of the defined module names).
I like this line of thought, but haven't worked it out fully as far as what Net::Init should do and how it should be defined in proto. For instance, multi-scale models have weight-shared modules that are identical models with different bottoms and tops within the module. Defining a module instead of type that is itself a collection of layers but keeping the bottoms and tops of the layer definition could override the inputs and outputs of the module. Defining a ModuleParameter to hold mini-nets like @jeffdonahue suggested should work in this scheme. It would be nice to reconcile weight sharing with this too, so that one could have distinct or shared modules.

Barring that, keeping a flat net definition certainly works, and the better one scripts the more expressive the models can be -- nevertheless it would be nice to include modularity in the model schema itself.
@kmatzen, indeed, this is how you get net protobufs from Python.
In fact I have in mind something a bit more generic and succinct... essentially a little library to let you write down a function in more-or-less natural Python code while actually constructing a net protobuf.
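To give a rough flavor of what I mean, something like the toy below, where NetBuilder and its methods are invented for this sketch and not an existing interface (and which, for brevity, emits prototxt text rather than building a NetParameter message):

from __future__ import print_function

class NetBuilder(object):
    """Toy sketch: hands out blob names and accumulates prototxt text."""

    def __init__(self, name):
        self.chunks = ['name: "%s"' % name]
        self.count = 0

    def _layer(self, type_, bottoms, extra=None):
        self.count += 1
        top = '%s%d' % (type_.lower(), self.count)
        lines = ['layers {', '  name: "%s"' % top, '  type: %s' % type_]
        lines += ['  bottom: "%s"' % b for b in bottoms]
        lines += ['  top: "%s"' % top]
        if extra:
            lines.append('  ' + extra)
        lines.append('}')
        self.chunks.append('\n'.join(lines))
        return top

    def conv(self, bottom, num_output, kernel_size, pad=0):
        return self._layer('CONVOLUTION', [bottom],
                           'convolution_param { num_output: %d kernel_size: %d pad: %d }'
                           % (num_output, kernel_size, pad))

    def relu(self, bottom):
        return self._layer('RELU', [bottom])

    def concat(self, *bottoms):
        return self._layer('CONCAT', list(bottoms))

    def to_prototxt(self):
        return '\n'.join(self.chunks)

# Composition is then just ordinary Python functions.
def inception_ish(net, bottom, n1x1, n3x3):
    a = net.relu(net.conv(bottom, num_output=n1x1, kernel_size=1))
    b = net.relu(net.conv(bottom, num_output=n3x3, kernel_size=3, pad=1))
    return net.concat(a, b)

net = NetBuilder('toy_net')
top = inception_ish(net, 'data', n1x1=64, n3x3=128)
top = inception_ish(net, top, n1x1=128, n3x3=192)
print(net.to_prototxt())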
@jeffdonahue @shelhamer, indeed, specifying composition within the net protobuf might be useful as well...
@jyegerlehner, indeed, one may want to see the net at different levels of abstraction (just as, when using a generic compiler, sometimes one wants to see source code, or an AST, or assembly, or some intermediate form). In any case our visualization tool python/draw_net.py could certainly use some improvements...
@longjon
our visualization tool python/draw_net.py could certainly use some improvements...
I had no idea draw_net.py existed. I was doing something redundant. Sigh. What does it need by way of improvements?
@shelhamer,
Defining a module instead of type that is itself a collection of layers but keeping the bottoms and tops of the layer definition could override the inputs and outputs of the module.
Just to make sure I'm following: you're suggesting that rather than have a layer type called MODULE, one would either specify a type for the layer, or a module, but not both.
I like this line of thought, but haven't worked it out fully as far as what Net::Init should do and how it should be defined in proto.
I have a few embryonic ideas in that direction, from just perusing the Net::Init code. Perhaps the module layers (as opposed to layers with a type) could be expanded in the Net::FilterNet method, which is the first thing Net::Init() does. All the work of in-lining the modules could be in Net::FilterNet. So by the time we get to the rest of Net::Init, the modules have already been expanded, or in-lined, and all that's seen is a flat DAG without any module layers left. So (knock on wood) no other code in Net::Init would have to change, perhaps?
Another thing I think would need to happen in Net::FilterNet, as one instantiates the module layers in a net, is to qualify the names of the layers that get produced with the name of the parent module layer. So for example, say we had a net that has an Inception module, and a couple of layers that instantiate the module:
name: "GoogLeNet"
modules {
name: "Inception"
... other layers...
layers {
name: "1x1_relu"
type: RELU
bottom: "1x1_conv"
top: "1x1_relu"
}
... other layers...
}
... other layers
layers {
name: "inception3a"
module: "Inception"
... stuff omitted, such as naming of top and bottom blobs
}
layers {
name: "inception3b"
module: "Inception"
... stuff omitted, such as naming of top and bottom blobs
}
Then after Net::FilterNet finishes inlining the module layers, the 1x1_relu layer of the module would be qualified with its parent layer's name, resulting in two corresponding instances, one named inception3a::1x1_relu and another inception3b::1x1_relu. Same thing for the blobs. So after Net::FilterNet is done, the inlined prototxt would contain:
name: "GoogLeNet"
... other layers
layers {
name: "inception3a::1x1_relu"
type: RELU
bottom: "inception3a::1x1_conv"
top: "inception3a::1x1_relu"
}
...other layers
layers {
name: "inception3b::1x1_relu"
type: RELU
bottom: "inception3b::1x1_conv"
top: "inception3b::1x1_relu"
}
Inception modules are also parameterized by the number of filters in various layers. For example, the 3x3 convolution layer's num_output is 128 in inception 3a and takes different values in other instances of the inception module. There would need to be some mechanism for the module to declare which parameters need to be specified when it is included in a net, and to check that those are all supplied at the instantiating site.
It would be nice to reconcile weight sharing with this too, so that one could have distinct or shared modules.
Regarding this, perhaps a module could specify a .caffemodel from which to copy its weights, if what is desired is shared weights every time the module is instantiated. Or in a case like GoogLeNet, where each inception module has its own weight shapes and values, that would all be handled as normal, since only the inlined, or flattened, net would ever be serialized/deserialized, and all the layers have unique names by that time.
So does this sound like a plausible approach? Fatal flaws? Better ideas?
For weight sharing all the module needs is to set the right param fields, just as its bottoms and tops need to be hooked into the flat definition. The issue is how to define in the proto which param names to assign to which module layers. A preliminary solution could be a flag to either (1) do the module + layer name mangling as @jyegerlehner suggested for layer names and connections, for no sharing, or (2) preserve the param fields defined in the module, for weight sharing among all instantiations of a given module.
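Concretely, the choice could come down to a single name-mangling rule; a hypothetical helper just to show the two cases:

def mangle_param(instance_name, param_name, share_params):
    # (2) keep the module's param name so every instantiation of the module
    #     shares the same weight blob
    if share_params:
        return param_name
    # (1) qualify the param name with the instance, as for layer/blob names,
    #     so each instantiation owns its own weights
    return instance_name + '::' + param_name

# mangle_param('inception3a', '1x1_conv_w', share_params=False)
#   -> 'inception3a::1x1_conv_w'   (distinct weights per module instance)
# mangle_param('inception3a', '1x1_conv_w', share_params=True)
#   -> '1x1_conv_w'                (one blob shared by all instances)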
@jyegerlehner yup, that's pretty much what I had in mind. I'd suggest a minor design change though: rather than adding it to FilterNet, create a separate method to do the module inlining only, and then another method that calls both FilterNet and the new module inlining method (maybe called FlattenNet).
create a separate method to do the module inlining only, and then another method that calls both FilterNet and the new module inlining method (maybe called FlattenNet).
Agreed. The module inlining should follow the filtering since modules could follow all the same rules for layers.
Thanks for the comments all.
Starting to develop this and hope it will culminate in a PR as soon as I have something working. If you think of anything else I need to consider please pass it on.
I'll just add: it may be a while. The day job, and all that. If someone else is working on this or wants to collaborate please let me know. I can try to commit more incrementally and give permissions to my feature branch.
How much memory does this model require? Its huge memory demand will probably stop it from running on a GPU. Distributed training (#1148) is a prerequisite for using such a huge net. It is doubtful whether it is worthwhile not to use the VGG model if the dataset has only a few million images.
Minibatch size of 256 -> 13.1469 GB.
Edit: Unless I defined GoogLeNet incorrectly. The dot visualization looks correct and the dimensions match up, but I haven't actually trained it to verify correctness.
For the training of memory-hungry models see https://github.com/BVLC/caffe/issues/1242#issuecomment-59580344.
@kloudkl not so. cuDNN or shared column buffers (#1291) reduce memory usage; most importantly, one can decouple the learning batch size from the computation / data batch size by accumulating gradients.
@kmatzen your calculation sounds right -- I did not check it -- but accumulating gradients can drive this down as you like. Note too that you do not necessarily want a batch size of 256 for GoogLeNet.
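Rough arithmetic, assuming memory scales roughly linearly with the batch (it does not exactly, since parameters and solver history do not scale with the batch, but those are comparatively small for GoogLeNet); every figure other than the 13.1469 GB estimate above is made up for illustration:

mem_at_256 = 13.1469                    # GB for a compute batch of 256
per_image = mem_at_256 / 256            # ~0.051 GB per image
compute_batch = 32                      # images resident on the GPU per pass
accumulation_steps = 8                  # forward/backward passes per update
effective_batch = compute_batch * accumulation_steps  # still 256 per update
print('~%.1f GB per pass, effective batch of %d'
      % (per_image * compute_batch, effective_batch))  # ~1.6 GB per pass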
I just got #1290 to pass; it lets you import a net as a layer. It was used initially for siamese nets, but the mechanism is generic.
@cypof Thanks that's interesting. It looks similar to what I was doing. I like the simplicity of the variable configuration via string replace; I had been going down the route of using protocol buffer's reflection API.
Looks like your implementation forces a module to be in another file. My preference is to modify the Net proto to allow it to contain Modules as well as Layers (Modules containing Layers). Then a layer (of type IMPORT in your scheme) would merely refer to the module by name. So it can all be in one protobuf file, and not force modules into separate files. But that would be a straightforward extension to what you've already got that I or we can add later.
Would you be averse to changing IMPORT enum to MODULE? "Import" seems to suggest that a separate file is being imported, whereas it would be good to leave the door open to Modules defined in the same Net protobuf file that they are instantiated in.
I'll suspend working on my feature branch in anticipation of your PR getting merged, at least until we get some indication from the maintainers. In the meantime I'll grab your feature branch and give it a whirl.
It's astonishingly simple to define the GoogLeNet inception model module by module in Torch7. There must be something wrong with it.
@futurely it seems they just defined an inception template -- nothing wrong with that. In fact, @kmatzen linked his Python + Caffe inception generator earlier in this thread (https://github.com/BVLC/caffe/issues/1169#issuecomment-57834743):
https://github.com/kmatzen/caffe/blob/inception_plan/python/inception_plan.py#L477-L610
Expect more Pythonified Caffe tools shortly for defining and experimenting with networks.
I used the generator to get 2000+ lines of prototxt. That's why I was shocked by the extreme conciseness of Torch7.
BTW, that torch implementation is missing the auxiliary classifiers, although it would still be more concise after they are added.
I typically hate defaults, but needing to specify all of your connections in Caffe does get a bit old after a while.
Developer time is much more expensive than machine time.
Take a look at #1518.
Does anyone see any value in a new layer type that is itself a network? This would allow a recursive inclusion of networks within networks.
The inception module used in GoogLeNet is an example of a network-in-network. I provided an implementation of GoogLeNet's Inception "module" in this. With Caffe as it is, if you want to build GoogLeNet, you would have to copy all those Inception module layers 9 times, thereby duplicating the same thing many times and changing the names of layers and top and bottom blobs to be unique within the net. By contrast, if we have a Network layer type, then each Inception module would appear as a single layer instance of type NETWORK (perhaps). It would refer to the network's prototxt somehow, and specify values for the things that are distinctive about each instance of the Inception module (such as the number of output channels of each of the convolutions and the pool), and also the bottom and top blob names.
Am I on my own if I want this implemented? Anyone already working on something like this? I wouldn't want to duplicate effort.