Hi,
Yes, MorphNet can be applied to part of the network.
My guess will be more accurate if you share the code, but I'd like to note that you should provide a list of ops for both input_boundary and output_boundary. So make sure you pass tensor.op for the input and output tensors.
Note that if the parts of the network you want to regularize are exactly the parts that have batch norm, you could probably skip the input_boundary argument, and also overshoot the output_boundary to be logits.op or something similar. The effect should be exactly the same.
If, on the other hand, you would be happy to learn the structure of the entire network, you could apply a GroupLasso-based regularizer to the non-BN part and a Gamma-based regularizer to the BN part of the network.
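For concreteness, here is a minimal sketch of what I mean (a toy model; the layer shapes, names, and the regularization strength are just placeholders):

```python
import tensorflow as tf
from morph_net.network_regularizers import flop_regularizer

slim = tf.contrib.slim

# A toy stand-in for the real model: two convs, both with batch norm.
inputs = tf.placeholder(tf.float32, [8, 32, 32, 3])
net = slim.conv2d(inputs, 16, [3, 3], normalizer_fn=slim.batch_norm,
                  scope='conv1')
logits = slim.conv2d(net, 10, [1, 1], normalizer_fn=slim.batch_norm,
                     activation_fn=None, scope='logits')

# Both boundaries are lists of tf.Operation, not tf.Tensor, hence .op.
network_regularizer = flop_regularizer.GammaFlopsRegularizer(
    output_boundary=[logits.op],
    input_boundary=[inputs.op],
    gamma_threshold=1e-3)

# Scale the structure regularizer and add it to the training loss.
regularization_strength = 1e-9
regularizer_loss = (network_regularizer.get_regularization_term()
                    * regularization_strength)
```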
Hope it helps,
Elad
The network is constructed as shown above, and the regularizer loss is added. In the function _get_cost_or_regularization_term(), "total" can be displayed with tf.Print(), but "input_tensor" cannot be displayed. MorphNet then fails with errors!
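For reference, this is roughly how the values are printed (a simplified, self-contained sketch; the real code wraps the tensors inside _get_cost_or_regularization_term()):

```python
import tensorflow as tf

# tf.Print is an identity op that logs the listed tensors to stderr as a
# side effect whenever the wrapped tensor is evaluated (TF1-style API).
total = tf.constant(3.5, name='total')
total = tf.Print(total, [total], message='total = ')

with tf.Session() as sess:
  sess.run(total)  # logs "total = [3.5]"
```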
Hi,
@ouyangzhongqing I had a similar error (InvalidArgumentError: Retval[0] does not have value) when using tf.contrib.layers.conv2d with a BN layer and GammaFlopsRegularizer. The issue is gone when I use conv2d from the slim API: tf.contrib.slim.layers.conv2d.
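For reference, a minimal sketch of the layer construction that worked for me (the shapes and names are illustrative, not my exact model):

```python
import tensorflow as tf

slim = tf.contrib.slim

inputs = tf.placeholder(tf.float32, [1, 64, 64, 3])

# Using the slim conv2d (tf.contrib.slim.layers.conv2d resolves to this)
# with batch norm attached via normalizer_fn:
net = slim.conv2d(inputs, 32, [3, 3],
                  normalizer_fn=slim.batch_norm,
                  scope='conv1')
```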
@eladeban I have two questions regarding the last paragraph of your comment. Suppose we have a model with 100 layers: half of the layers are conv2d+BN and half are conv2d only, and assume they are randomly interleaved.
1- Do we have to create 100 network_regularizers, with input and output ops defined explicitly for each?
2- In another experiment, GroupLassoFlopsRegularizer could regularize all layers (with or without BN) when tf.contrib.layers.conv2d was used (instead of slim). But I haven't seen this method recommended. Does it make the results invalid?
Thanks
@ouyangzhongqing I don't quite follow the question. Which line is causing this error? _get_cost_or_regularization_term does not have a tf.Print(); are you trying to print something in convolution?
@coderwwy80: re: slim/contrib.layers: they should be exactly the same due to a redirect. I am not sure how that is possible. Can you compare the graph.pbtxt files to see what is created differently?
Re the 100-layer question: I have never seen anything similar, but Gamma will correctly regularize all BN convs. You can use the blacklist to exclude all of those convs and pass it to GroupLasso. I feel the first focus should be to get structure learning running on a simple vanilla network, and then move on to the experimental or esoteric...
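Roughly, the blacklist approach looks like this (a sketch, assuming the regularizer_blacklist argument, which excludes ops by name substring, and illustrative layer names):

```python
import tensorflow as tf
from morph_net.network_regularizers import flop_regularizer

slim = tf.contrib.slim

inputs = tf.placeholder(tf.float32, [8, 32, 32, 3])
net = slim.conv2d(inputs, 16, [3, 3], normalizer_fn=slim.batch_norm,
                  scope='conv_bn_0')                        # conv with BN
net = slim.conv2d(net, 16, [3, 3], scope='conv_plain_0')    # conv without BN
logits = slim.conv2d(net, 10, [1, 1], activation_fn=None,
                     normalizer_fn=slim.batch_norm, scope='conv_bn_1')

# Gamma regularizes the BN convs; exclude the plain convs from it.
gamma_reg = flop_regularizer.GammaFlopsRegularizer(
    output_boundary=[logits.op],
    input_boundary=[inputs.op],
    gamma_threshold=1e-3,
    regularizer_blacklist=['conv_plain'])

# GroupLasso regularizes the plain convs; exclude the BN convs from it.
lasso_reg = flop_regularizer.GroupLassoFlopsRegularizer(
    output_boundary=[logits.op],
    input_boundary=[inputs.op],
    threshold=1e-3,
    regularizer_blacklist=['conv_bn'])

regularizer_loss = 1e-9 * (gamma_reg.get_regularization_term() +
                           lasso_reg.get_regularization_term())
```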
Do you actually have a 100-layer network, half with and half without BN?
Hi @eladeban
re " Gamma will correctly regularizer all BN convs. You can use blacklist to exclude all of this convs and pass it to GroupLasso."
I have a structure that looks like this (assume all convs have a fused BN layer):
```
      |-------------- conv1 --------------|
 -----|                                   |---- output
      |--- conv2 --- conv3 --- fc1 --- fc2|
```
If I run Gamma*, only conv2 can be regularized. Since fc1 & fc2 are not conv-BN layers, they can't get regularized, and therefore the number of channels shouldn't change at the output, which results in no regularization for conv1 either. conv3 cannot be regularized because the fc input channels depend on it and need to stay fixed. Therefore only conv2 is regularized.
It looks like the conv and fc layers have to be regularized at the same time to gain a large reduction in the number of FLOPs (the ideal case). Is it possible to do so?
Thanks
Hi, now I get it :)
If your setting is close enough to https://arxiv.org/abs/1709.01507, I would guess that you are safe to ignore the cost of conv2, conv3, fc1, and fc2, as it should be negligible compared to the cost of the other elements: on the order of HxW less expensive.
Your solution with the blacklist should work.
I am just curious; it's great that you are trying to get 110% out of MorphNet, but I would be happy to know if you have already tried it with just Gamma*. Did you see any interesting results?
Hi, @eladeban
I could get about a 50% FLOP reduction with some accuracy loss (a 20% increase in the error percentage). I think the lottery ticket hypothesis could be applied here to achieve better accuracy, but I haven't tried it yet. Any suggestions in this regard? Thanks
I just use the first three BN layers (conv0-bn0-leakyrelu0, conv1-bn1-leakyrelu1, conv2-bn2-leakyrelu2) of a network that has 74 layers in total, with leakyrelu2.op as the output boundary and input_data.op as the input boundary. When running, I get an error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Retval[0] does not have value
Can MorphNet act on only some layers of the model?
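Roughly, my setup looks like this (a simplified sketch; the real network has 74 layers, and the shapes and arguments here are illustrative):

```python
import tensorflow as tf
from morph_net.network_regularizers import flop_regularizer

slim = tf.contrib.slim

input_data = tf.placeholder(tf.float32, [1, 416, 416, 3])
net = slim.conv2d(input_data, 32, [3, 3], normalizer_fn=slim.batch_norm,
                  activation_fn=tf.nn.leaky_relu, scope='conv0')
net = slim.conv2d(net, 64, [3, 3], normalizer_fn=slim.batch_norm,
                  activation_fn=tf.nn.leaky_relu, scope='conv1')
leakyrelu2 = slim.conv2d(net, 128, [3, 3], normalizer_fn=slim.batch_norm,
                         activation_fn=tf.nn.leaky_relu, scope='conv2')
# ... the remaining layers of the 74-layer network continue from leakyrelu2 ...

# Regularize only the first three conv-BN-LeakyReLU blocks.
network_regularizer = flop_regularizer.GammaFlopsRegularizer(
    output_boundary=[leakyrelu2.op],
    input_boundary=[input_data.op],
    gamma_threshold=1e-3)
```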