Closed pfeatherstone closed 3 years ago
Actually, having the tags there is enough to get me going and define the rest of the network. BUT, yolov3 has three yolo layers. So unless I apply grid offsets and anchors, permute dimensions, and concatenate everything at the end, the network will have to output three tensors anyway.
I have never tried it, but if you don't want to do all this reshaping and concatenation madness, and you know the tags of the layers you're interested in, I guess you can always access them directly from the loss layer by doing something like this:
template <
    typename const_label_iterator,
    typename SUBNET
>
double compute_loss_value_and_gradient (
    const tensor& input_tensor,
    const_label_iterator truth,
    SUBNET& sub
) const
{
    const tensor& out1 = layer<tag1>(sub).get_output();
    const tensor& out2 = layer<tag2>(sub).get_output();
    const tensor& out3 = layer<tag3>(sub).get_output();
    // ...
}
And then apply the yolo layer to each output.
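For reference, here is a minimal sketch of what "applying the yolo layer" to one raw output could look like in plain C++, without dlib. It assumes the usual YOLOv3 decoding (sigmoid on the centre offsets and objectness, exponential on the sizes) and a channel-major layout of [anchors*(5+nclasses), rows, cols]. The names box and decode_yolo_head are made up for illustration, and class scores are left out.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct box { float cx, cy, w, h, objectness; };

inline float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Decode one raw YOLO head into boxes in input-image pixels.
// Assumed layout: out[((a * attrs + attr) * rows + r) * cols + c],
// with attrs = 5 + nclasses and attr order tx, ty, tw, th, objectness, classes...
// anchors holds (width, height) in pixels; stride is input_size / grid_size.
std::vector<box> decode_yolo_head(const std::vector<float>& out, int rows, int cols, int nclasses,
                                  const std::vector<std::pair<float, float>>& anchors, float stride)
{
    const int attrs = 5 + nclasses;
    assert(out.size() == anchors.size() * attrs * rows * cols);
    std::vector<box> boxes;
    boxes.reserve(anchors.size() * rows * cols);
    for (std::size_t a = 0; a < anchors.size(); ++a)
        for (int r = 0; r < rows; ++r)
            for (int c = 0; c < cols; ++c)
            {
                auto at = [&](int attr) { return out[((a * attrs + attr) * rows + r) * cols + c]; };
                box b;
                b.cx = (sigmoid(at(0)) + c) * stride;      // grid offset along x
                b.cy = (sigmoid(at(1)) + r) * stride;      // grid offset along y
                b.w = anchors[a].first * std::exp(at(2));  // anchor width scaling
                b.h = anchors[a].second * std::exp(at(3)); // anchor height scaling
                b.objectness = sigmoid(at(4));
                boxes.push_back(b);
            }
    return boxes;
}
```

With three heads you would call this once per tagged output, each with its own stride and anchor set, and concatenate the resulting vectors before non-maximum suppression.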
Do I need a loss layer if I'm only interested in inference? My goal is to port the weights from darknet to a dlib-defined yolov3 network. If not, can I just tag the output layers I want, forward the input through the network, and then get the outputs I want using layer<tagx>(sub).get_output()?
Right. You would write your loss function so it goes and grabs the tags you are interested in.
But if you don't want to train then yeah. Just access the layer you want and look at its outputs.
Just noticed that repeat only takes a template<typename> class as the repeated layer, so it's not letting me use it with resblock, which has template <int inc, typename SUBNET> as its template signature. Have I missed something?
All the examples that use repeat have the template<typename> class signature.
Yes, the repeat layer only takes a template <typename SUBNET> class. You can have a look at my definition of the Darknet53 backbone here, where I predefine some things to be able to use them with the repeat layer.
So I have this so far:
using namespace dlib;

template <template <typename> class BN>
struct yolo
{
    template <int outc, int kern, int stride, typename SUBNET>
    using conv_block = leaky_relu<BN<con<outc,kern,kern,stride,stride,SUBNET>>>;

    template <int outc, typename SUBNET>
    using resblock = add_prev1<conv_block<outc,3,1,conv_block<outc/2,1,1,tag1<SUBNET>>>>;

    template <typename SUBNET> using res1024 = resblock<1024,SUBNET>;
    template <typename SUBNET> using res512 = resblock<512,SUBNET>;
    template <typename SUBNET> using res256 = resblock<256,SUBNET>;
    template <typename SUBNET> using res128 = resblock<128,SUBNET>;

    template <typename SUBNET> using block5 = repeat<4,res1024, conv_block<1024,3,2,SUBNET>>;
    template <typename SUBNET> using block4 = repeat<8,res512, conv_block<512,3,2,SUBNET>>;
    template <typename SUBNET> using block3 = repeat<8,res256, conv_block<256,3,2,SUBNET>>;
    template <typename SUBNET> using block2 = repeat<2,res128, conv_block<128,3,2,SUBNET>>;
    template <typename SUBNET> using block1 = resblock<64,conv_block<64,3,2,SUBNET>>;

    using darknet53 = tag1<block5<
                      tag2<block4<
                      tag3<block3<
                      block2<
                      block1<
                      conv_block<32,3,1,
                      input_rgb_image
                      >>>>>>>>>;

    template<int outc, int nclasses, int tag, int yolo_tag, typename SUBNET>
    using detection_block = add_tag_layer<yolo_tag, con<3*(nclasses + 5), 1, 1, 1, 1, //conv7 - yolo output
                            conv_block<outc, 3, 1,                                    //conv6
                            add_tag_layer<tag, conv_block<outc/2, 1, 1,               //conv5 - branch output
                            conv_block<outc, 3, 1,                                    //conv4
                            conv_block<outc/2, 1, 1,                                  //conv3
                            conv_block<outc, 3, 1,                                    //conv2
                            conv_block<outc/2, 1, 1,                                  //conv1
                            SUBNET
                            >>>>>>>>>;

    template<int nclasses>
    using yolov3 =
        detection_block<256,nclasses,8,12,  //8 is the branch tag (don't care here), 12 is a yolo tag
        concat2<skip7, skip3,               //concat last layer with tag3 from darknet backbone
        tag7<upsample<2,
        conv_block<128, 1, 1,
        skip6<
        detection_block<512,nclasses,6,11,  //6 is the branch tag, 11 is a yolo tag
        concat2<skip5, skip2,               //concat last layer with tag2 from darknet backbone
        tag5<upsample<2,
        conv_block<256, 1, 1,
        skip4<                              //pick branch with tag 4
        detection_block<1024,nclasses,4,10, //4 is the branch tag, 10 is a yolo_tag
        skip1<
        darknet53
        >>>>>>>>>>>>>>;
};
This compiles. That's progress. The API is hurting my brain a bit though.
@arrufat @davisking Is there a way to turn bias off in conv_block? Since conv_block has a batch normalisation layer, which already has a bias term, we don't want double biases.
Yes! That feature was added not that long ago in #2156, you just do:
set_all_bn_inputs_no_bias(net);
And it will do it automatically for the whole network.
Will it do the same to affine layers?
I'm not training, simply porting weights from darknet. So I don't need to use bn_con layers.
concat_ layers need tags as inputs. It compiles for me with this change:
concat2<tag7, tag3, SUBNET
Cheers thank you. Getting closer to working.
> Will it do the same to affine layers?
No, that visitor only works with bn_ layers that have either con_ or fc_ layers as inputs.
> Cheers thank you. Getting closer to working.
I am very interested in this if you manage to deserialize the darknet weights and make them work with dlib.
Also, check the paddings of the 3x3 convolutions with a stride of 2. They are 0 by default in dlib, but they need to be 1 in yolo. That is why I defined this.
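To see why the padding matters, here is a small sketch of the standard convolution output-size formula. The function name conv_out_size is made up for illustration, and 416 is just a typical YOLOv3 input size, not something taken from this thread.

```cpp
#include <cassert>

// Output length of a convolution along one dimension:
//   out = (in + 2 * padding - kernel) / stride + 1   (integer division)
inline long conv_out_size(long in, long kernel, long stride, long padding)
{
    assert(stride > 0 && in + 2 * padding >= kernel);
    return (in + 2 * padding - kernel) / stride + 1;
}
```

For a 416-pixel input, a 3x3 kernel with stride 2 gives conv_out_size(416, 3, 2, 1) == 208 with padding 1, but conv_out_size(416, 3, 2, 0) == 207 with padding 0. With zero padding the feature maps end up one pixel short at each downsampling stage and no longer line up with what darknet produces.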
Presumably, to port the weights, I will have to use a visitor?
Is there a layer for permuting dimensions? Can extract be used? I need to go from a tensor of shape 1x255x13x13 to 1x3x85x13x13, then to 1x13x13x3x85.
Also, to get the exact same results as darknet, we need a layer similar to upsample that uses a "nearest" method, not bilinear interpolation.
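If no layer does the permutation, the reshape from 1x255x13x13 to 1x13x13x3x85 can be done on the host with plain index arithmetic. A minimal sketch, assuming a contiguous channel-major buffer for a single image; the function name permute_to_hwac is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reorder a contiguous [anchors*attrs, rows, cols] buffer into [rows, cols, anchors, attrs].
// For YOLOv3 on COCO this is 255x13x13 -> 13x13x3x85 (anchors = 3, attrs = 85).
std::vector<float> permute_to_hwac(const std::vector<float>& src,
                                   int anchors, int attrs, int rows, int cols)
{
    assert(src.size() == static_cast<std::size_t>(anchors) * attrs * rows * cols);
    std::vector<float> dst(src.size());
    for (int a = 0; a < anchors; ++a)
        for (int t = 0; t < attrs; ++t)
            for (int r = 0; r < rows; ++r)
                for (int c = 0; c < cols; ++c)
                {
                    // Viewing the channel axis as (anchor, attribute) needs no data movement.
                    const std::size_t from = ((static_cast<std::size_t>(a) * attrs + t) * rows + r) * cols + c;
                    // Moving the spatial axes to the front does.
                    const std::size_t to = ((static_cast<std::size_t>(r) * cols + c) * anchors + a) * attrs + t;
                    dst[to] = src[from];
                }
    return dst;
}
```

After the permutation, the 85 values of one prediction (4 box terms, objectness, 80 class scores) sit next to each other in memory, which makes the post-processing loop simpler.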
> Presumably, to port the weights, I will have to use a visitor?
Yes, at least that's how I would approach it. In particular, I would use visit_layers_backwards.
> Is there a layer for permuting dimensions?
You can try with extract_ plus some extra manipulation.
I have a WIP project where I try to implement YOLOv1 (as a start) but haven't been very active lately. You can check it out: https://github.com/arrufat/yolo-dlib
EDIT: it's still WIP and it doesn't work, although the training runs...
I can do the reshaping and apply the grid offsets and anchors in post-processing using pointer arithmetic. The only thing left to do is porting the weights. This is all an experiment to benchmark yolov3 with dlib. Defining a loss function for yolov3 in dlib is going to be too hard, and you can train in darknet or pytorch anyway.
Oh, and there is the disabling of biases in affine layers, and implementing a "nearest" method for the upsample layer. So 3 things to do.
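A "nearest" upsample by an integer factor is only a few lines on the host. This is a minimal sketch on a plain channel-major buffer, not a dlib layer; the name upsample_nearest is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Nearest-neighbour upsampling of a [channels, rows, cols] buffer by an integer factor:
// every source pixel is replicated into a scale x scale block.
std::vector<float> upsample_nearest(const std::vector<float>& src,
                                    int channels, int rows, int cols, int scale)
{
    assert(scale >= 1);
    assert(src.size() == static_cast<std::size_t>(channels) * rows * cols);
    const int out_rows = rows * scale;
    const int out_cols = cols * scale;
    std::vector<float> dst(static_cast<std::size_t>(channels) * out_rows * out_cols);
    for (int k = 0; k < channels; ++k)
        for (int r = 0; r < out_rows; ++r)
            for (int c = 0; c < out_cols; ++c)
            {
                const std::size_t from = (static_cast<std::size_t>(k) * rows + r / scale) * cols + c / scale;
                const std::size_t to = (static_cast<std::size_t>(k) * out_rows + r) * out_cols + c;
                dst[to] = src[from];
            }
    return dst;
}
```

Unlike bilinear interpolation, this never mixes neighbouring values, so the upsampled feature map contains exactly the same numbers as the source, just repeated.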
You can disable bias for affine layers easily using the new style visitor with a lambda.
OK. Is there an example of this? Also, is there a way of setting avg_red, avg_green and avg_blue for the input_rgb_image layer?
Actually, just seen the code for bn_conv, looks fine.
> OK. Is there an example of this? Also, is there a way of setting avg_red, avg_green and avg_blue for the input_rgb_image layer?
You should read the documentation of the input layers.
It's possible I've missed something in the docs for the input layers. I've just used this instead:
struct input_rgb_image_zero_means : input_rgb_image
{
    input_rgb_image_zero_means() : input_rgb_image(0,0,0) {}
};
You should also read dnn_introduction2_ex. You will learn that you can initialize the layers of a network by passing them when constructing the network, like this:
net_type net(input_rgb_image(0, 0, 0));
Ah OK, fair enough. Though using input_rgb_image_zero_means means it's impossible to use it incorrectly.
@arrufat The API doesn't expose the layer parameters reliably. Indeed, the get_layer_params function for the affine_ layer spits back empty_params, so I can't set the weights using get_layer_params. It looks like I have to serialize some weights to a temporary stream, then call deserialize on that layer using that stream. What do you think?
I have the following visitor:
struct darknet_visitor
{
    darknet_visitor(const char* darknet_weights)
    : w(darknet_weights, std::ios::binary)
    {
        assert(w.is_open());
        // The header is raw binary, so it has to be read with read() rather than operator>>,
        // which would try to parse text.
        int32_t major, minor, dummy;
        int64_t dummy2;
        w.read(reinterpret_cast<char*>(&major), sizeof(major));
        w.read(reinterpret_cast<char*>(&minor), sizeof(minor));
        w.read(reinterpret_cast<char*>(&dummy), sizeof(dummy));
        // Newer files store the "seen" counter as 64 bits, older ones as 32 bits.
        if ((major * 10 + minor) >= 2 && major < 1000 && minor < 1000)
            w.read(reinterpret_cast<char*>(&dummy2), sizeof(dummy2));
        else
            w.read(reinterpret_cast<char*>(&dummy), sizeof(dummy));
        cout << "weights file major " << major << " minor " << minor << endl;
    }

    // Fallback: layers without weights to port are ignored.
    template<typename T>
    void operator()(size_t idx, T& t)
    {
    }

    template <typename SUBNET>
    void operator()(size_t idx, add_layer<affine_, SUBNET>& l)
    {
        cout << "affine layer " << idx << endl;
        auto& bn = l.layer_details();
        auto& conv = l.subnet().layer_details();
        //1. bn bias
        //2. bn weight
        //3. bn running mean
        //4. bn running var
        //5. conv weight
        // tensor& bn_t = bn.get_layer_params(); //THIS IS EMPTY BECAUSE affine_ spits back empty_params
        stringstream ss;
        ss << ...;
        deserialize(bn.get_layer_params(), ss);
        ss << ...;
        deserialize(conv.get_layer_params(), ss);
    }

    template <
        long outc,
        long nr,
        long nc,
        int sy,
        int sx,
        int py,
        int px,
        typename SUBNET
    >
    void operator()(size_t idx, add_layer<con_<outc,nr,nc,sy,sx,py,px>,SUBNET>& l)
    {
        auto& conv = l.layer_details();
        // Only convolutions that keep their own bias (i.e. not followed by batch norm) are handled here.
        if (!conv.bias_is_disabled())
        {
            cout << "con layer " << idx << endl;
            //1. conv bias
            //2. conv weight
            stringstream ss;
            ss << ...;
            deserialize(conv.get_layer_params(), ss);
        }
    }

    std::ifstream w;
};
which I call using:
visit_layers_backwards(net, darknet_visitor("yolov3.weights"));
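Since the weights file is binary, the header is easier to check in isolation. Below is a minimal sketch that parses the header layout used in the visitor above (three 32-bit integers, then a "seen" counter that is 64-bit for newer files and 32-bit for older ones). It assumes the host has the same endianness as the machine that wrote the file. The names darknet_header, read_pod and read_darknet_header are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

struct darknet_header
{
    std::int32_t version_major = 0;
    std::int32_t version_minor = 0;
    std::int32_t revision = 0;
    std::int64_t seen = 0;
};

// Read the raw bytes of a trivially copyable value; returns false on a short read.
template <typename T>
bool read_pod(std::istream& in, T& value)
{
    in.read(reinterpret_cast<char*>(&value), sizeof(T));
    return static_cast<bool>(in);
}

// Parse the header and leave the stream positioned at the first weight.
bool read_darknet_header(std::istream& in, darknet_header& h)
{
    if (!read_pod(in, h.version_major) || !read_pod(in, h.version_minor) || !read_pod(in, h.revision))
        return false;
    if (h.version_major * 10 + h.version_minor >= 2 && h.version_major < 1000 && h.version_minor < 1000)
        return read_pod(in, h.seen);   // 64-bit counter
    std::int32_t seen32 = 0;           // 32-bit counter in older files
    if (!read_pod(in, seen32))
        return false;
    h.seen = seen32;
    return true;
}
```

Taking a std::istream rather than a file name means the same function works on a std::ifstream opened with std::ios::binary and on an in-memory std::istringstream, which is handy for checking the byte count before touching the network.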
It looks like get_layer_params() for bn_ and con_ returns params. So maybe I have to first define a model using bn_con, then do the porting of weights, then replace all the bn_con layers with affine.
Hmm, getting complicated.
After declaring the network, you can forward some dummy input to initialize the params of the layers, then run the visitor.
But that still doesn't solve the problem with affine_. Do you have to use bn_ first to port the weights? Also, since all the weights are alias_tensor types that use params for storage, and get_layer_params returns params, it's not entirely obvious how to port the weights to params. Do you suggest using serialize and deserialize? Or maybe there should be new functionality added to the dnn module to make all this possible. For example, have a port_weights visitor type, designed for this use case, which is made a friend type for all layers. Then we can have access to all the underlying tensors.
I would not use serialize/deserialize for this. I would do something like:
auto& params = l.get_layer_params();
float* p = params.host();
And then read the weights from the yolo file and store them in p. However, I did not check in which order the weights are stored in darknet; you have to check that and skip or reshape to your needs.
Yep, so I know exactly how the weights are stored in darknet format. The problem with what you suggest is that using auto& params = l.get_layer_params(); for the affine_ layer will not work, since it returns an empty tensor that is never used.
Furthermore, params is used as a storage tensor. The actual weights inside the layer classes are all alias_tensor types. So setting params correctly is very difficult.
alias_tensor is just a view into the tensor, to be able to access the weights of the convolution kernel and the biases more easily, for example. But everything is stored in the same tensor returned by get_layer_params(), as far as I know.
If you initialize the network with a dummy input, then the .get_layer_params() for the affine layer should not be empty, and if you print its values, you will see some ones (gamma) followed by some zeros (beta).
https://github.com/davisking/dlib/blob/a1f158379e2f328e8697b63ad653926594c8a771/dlib/dnn/layers.h#L2166-L2185
If you look inside layers.h, you will see this:
const tensor& get_layer_params() const { return empty_params; }
tensor& get_layer_params() { return empty_params; }
for affine_. And empty_params is never set.
Oh, I skipped that. So then you need to define yolo changing the template parameter to bn_con, load the weights, and then assign it to the yolo model declared with affine:
yolo<bn_con>::yolov3 net;
// visitor
yolo<affine>::yolov3 net2(net);
And in the visitor you should initialize the missing values from yolo to something sensible.
Ok thought so. That's what i was going on about a few comments ago. Thank you.
In my head affine had a learnable gamma and beta, but it turns out it doesn't, sorry about that.
This is what I have so far. It compiles but doesn't work. Porting the weights shows that the correct number of bytes is read from the file, so it looks like the network structure and the interpretation of the weights are correct. But I could be wrong. Maybe there's an error with endianness. Not sure. Please try it and see if you can spot the errors.
Warning: it takes roughly 60 seconds to compile main.cpp. Sigh...
I have been able to build it and run it, but at a first glance I didn't see anything odd... I'll have another look later. Thanks for sharing :)
The detections are all wrong. So a bit stuck as to where the errors are. Model size is correct, and the visitor is reading the correct number of bytes. @arrufat if you find a fix, please post
Possibly need to inspect the output of every layer and compare side by side with either darknet or pytorch implementation.
template <
    layer_mode bnmode
    >
affine_(
    const bn_<bnmode>& item
)
{
    gamma = item.gamma;
    beta = item.beta;
    mode = bnmode;
    params.copy_size(item.params);
    auto g = gamma(params,0);
    auto b = beta(params,gamma.size());
    resizable_tensor temp(item.params);
    auto sg = gamma(temp,0);
    auto sb = beta(temp,gamma.size());
    g = pointwise_divide(mat(sg), sqrt(mat(item.running_variances)+item.get_eps()));
    b = mat(sb) - pointwise_multiply(mat(g), mat(item.running_means));
}
Why is this happening:
g = pointwise_divide(mat(sg), sqrt(mat(item.running_variances)+item.get_eps()));
b = mat(sb) - pointwise_multiply(mat(g), mat(item.running_means));
??
This could be my problem. I can't set running_variances or running_means, since get_layer_params in bn_con only gives me gamma and beta.
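For what it's worth, the two lines quoted above look like the usual batch-norm folding: at inference time y = gamma * (x - mean) / sqrt(var + eps) + beta can be rewritten as y = g * x + b with g = gamma / sqrt(var + eps) and b = beta - g * mean. A minimal sketch of doing that by hand on per-channel vectors, independent of dlib; the names affine_params and fold_batch_norm are made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct affine_params
{
    std::vector<float> gamma; // per-channel scale after folding
    std::vector<float> beta;  // per-channel shift after folding
};

// Fold batch-norm running statistics into a per-channel scale and shift, so that
//   gamma * (x - mean) / sqrt(var + eps) + beta  ==  g * x + b
affine_params fold_batch_norm(const std::vector<float>& gamma,
                              const std::vector<float>& beta,
                              const std::vector<float>& running_mean,
                              const std::vector<float>& running_var,
                              float eps)
{
    const std::size_t n = gamma.size();
    assert(beta.size() == n && running_mean.size() == n && running_var.size() == n);
    affine_params out;
    out.gamma.resize(n);
    out.beta.resize(n);
    for (std::size_t i = 0; i < n; ++i)
    {
        out.gamma[i] = gamma[i] / std::sqrt(running_var[i] + eps);
        out.beta[i] = beta[i] - out.gamma[i] * running_mean[i];
    }
    return out;
}
```

The four input vectors correspond to the bn weight, bn bias, running mean and running variance read from the darknet file; the folded values are what a plain scale-and-shift layer would need to reproduce the batch-norm output at inference time.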
OK, fixed it. I had to manually adjust gamma and beta using running_variances and running_means. All works now. Here is the code:
Now if someone can write the training code with a loss function that uses GIOU, DIOU and CIOU losses, that would be great :) :) (@arrufat ??) Implementing GIOU and company in a framework that supports auto-grad is trivial. Since in dlib you have to manually write the backward passes, I'm likely to make some mistakes with all those derivatives.
I am trying to define yolov3 using dlib's dnn module. I'm stuck with the darknet53 backbone, as I want it to output the outputs of the last three layers. So far I have this:
Is it possible for darknet53 to output tag1, tag2 and tag3?