Backbone for yolact++

Rm1n90 commented 4 years ago

Hey @dbolya, great work! I would like to implement a new backbone for yolact++. I have looked at your ResNetBackbone (and some issues, too) to get an idea of how to implement a new backbone, and I have some questions about it. I would appreciate it if you could answer them.

Is it necessary to use DCN in a backbone for yolact++, or can I implement a backbone without DCN and just use the yolact++ base config file?
Could you provide more details about init_backbone and transform_key? (When I implement the new backbone model, how should I implement init_backbone for it?)

Thanks!

dbolya commented 4 years ago

Your new backbone doesn't need DCN; just remember to change the backbone in the yolact++ config.
init_backbone takes the ImageNet-pretrained weights file and loads it into the backbone. If you're training from scratch (though I don't recommend it), you can just leave the function blank.

To write this function, print out the keys for your backbone implementation:

for k in self.state_dict():
    print(k)

and cross-reference that with your pretrained weights file:

state_dict = torch.load(path)
for k in state_dict:
    print(k)

If the two lists of keys match perfectly, you don't need to worry about anything and you can just call self.load_state_dict(state_dict) normally.
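Rather than comparing the two printed lists by eye, a small helper can compute the difference for you. This is a minimal plain-Python sketch; `diff_state_dict_keys` is a hypothetical name, and in practice you would pass it `self.state_dict().keys()` and the keys of the dict returned by `torch.load(path)`.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Compare a backbone's parameter names with a pretrained checkpoint's.

    Returns (missing, unexpected), both sorted:
      missing    - keys the backbone expects but the checkpoint lacks
      unexpected - keys in the checkpoint that the backbone does not define
    """
    model_keys = set(model_keys)
    checkpoint_keys = set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Toy key lists standing in for self.state_dict() and torch.load(path).
model_keys = ["layers.0.0.conv1.weight", "layers.0.0.bn1.weight"]
ckpt_keys = ["layer1.0.conv1.weight", "layer1.0.bn1.weight", "fc.weight"]

missing, unexpected = diff_state_dict_keys(model_keys, ckpt_keys)
print("missing:   ", missing)
print("unexpected:", unexpected)
```

If both lists come back empty, a plain `self.load_state_dict(state_dict)` should work. PyTorch's `load_state_dict(..., strict=False)` also returns the missing and unexpected keys, which is another way to get the same information.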

If there's some difference in keys, you need to make the keys of your loaded state_dict match the keys in self.state_dict(); that's what transform_key and the other init_backbone functions are doing. Just make sure they match and you're good.
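As a sketch of that remapping, the function below renames checkpoint keys with a rule you supply and drops keys the backbone doesn't define. The renaming rule is modeled on the kind of change the ResNet loader appears to make (`layer1.*` becoming `layers.0.*`), but `remap_state_dict` and `resnet_style_rename` are hypothetical names, and the code works on a plain dict so it runs without PyTorch.

```python
from collections import OrderedDict


def remap_state_dict(state_dict, rename, valid_keys):
    """Rename checkpoint keys and keep only the ones the backbone defines.

    state_dict: mapping loaded from the pretrained file (e.g. torch.load(path))
    rename:     function mapping a checkpoint key to a backbone key
    valid_keys: keys of the backbone's own self.state_dict()
    """
    valid = set(valid_keys)
    remapped = OrderedDict()
    for key, value in state_dict.items():
        new_key = rename(key)
        if new_key in valid:  # drops e.g. the ImageNet classifier ("fc.*")
            remapped[new_key] = value
    return remapped


def resnet_style_rename(key):
    """Hypothetical rule: 'layer1.0.conv1.weight' -> 'layers.0.0.conv1.weight'."""
    if key.startswith("layer") and key[5:6].isdigit():
        idx = int(key[5])  # stage number right after 'layer'
        return "layers.%d%s" % (idx - 1, key[6:])
    return key


# Integers stand in for tensors so the example runs without PyTorch.
ckpt = OrderedDict([("conv1.weight", 1), ("layer1.0.conv1.weight", 2), ("fc.weight", 3)])
backbone_keys = ["conv1.weight", "layers.0.0.conv1.weight"]
print(dict(remap_state_dict(ckpt, resnet_style_rename, backbone_keys)))
```

Inside a backbone, init_backbone could then pass the result of `remap_state_dict(torch.load(path), rule, self.state_dict().keys())` to `self.load_state_dict`. Because unused checkpoint keys are filtered out, the default `strict=True` still raises an error if any backbone parameter was left without a pretrained value.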

Auth0rM0rgan commented 4 years ago

@dbolya and @Rm1n90, sorry for jumping into your conversation! I'm implementing a new backbone for yolact++ too and have some questions about the hyperparameters that need to be tuned for a new backbone. What values of selected_layers, pred_scales, and pred_aspect_ratios should a new backbone use to get the best performance, and could you please explain the intuition behind them? Also, are there any other values that need to be set for a new backbone?

Thanks a lot!

dbolya commented 4 years ago

@Auth0rM0rgan The most important parameter would be selected_layers. The rest, you should probably keep the same (unless you want to tune these yourself).

selected_layers determines which backbone layers get prediction heads. The indices refer to the outputs your forward pass returns. For instance, if your forward pass looks like:

def forward(self, x):
    a = self.conv1(x)
    b = self.conv2(a)
    c = self.conv3(b)
    return (a, b, c)

Setting selected_layers to [1, 2] would add prediction heads to b and c in the forward function.
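Conceptually, the selection is just indexing into the tuple that forward returns. A minimal plain-Python sketch, where strings stand in for feature maps and `pick_layers` is a hypothetical helper rather than YOLACT code:

```python
def pick_layers(outputs, selected_layers):
    """Return the backbone outputs that will receive prediction heads."""
    return [outputs[i] for i in selected_layers]


outputs = ("a", "b", "c")  # what forward() returns: (a, b, c)
print(pick_layers(outputs, [1, 2]))  # heads go on b and c
```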

When not using FPN (i.e., SSD mode), if you specify an index larger than 2 in this case, the backbone will add more layers to compensate using the add_layer function. If you're using FPN (which I assume you are), you can just ignore this function.

With FPN, you add extra layers differently: instead of selecting them directly, you set the fpn.num_downsample parameter to the number of layers you want to add.

As for which parameters you should use: both YOLACT and YOLACT++ use 5 prediction heads, so if you don't want to do any extra tuning, you should end up with 5 layers in total. Both versions select 3 backbone layers and add 2 downsample layers. To mirror that, you can select the last 3 "blocks" of your backbone (wherever there's a stride-2 convolution, output the activations right before it) and add 2 FPN layers as in the current configs.
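A quick sanity check for this arithmetic: the number of prediction heads is the number of selected backbone layers plus fpn.num_downsample, and per-head settings such as pred_scales and pred_aspect_ratios need one entry per head. The sketch below assumes those lists are indexed per head (the default configs each have 5 entries); `count_heads` is a hypothetical helper, and the scale values are only examples.

```python
def count_heads(selected_layers, num_downsample):
    """Prediction heads = selected backbone layers + extra FPN downsample layers."""
    return len(selected_layers) + num_downsample


selected_layers = [1, 2, 3]  # e.g. the last three of four backbone outputs
num_downsample = 2           # fpn.num_downsample: adds P6 and P7
pred_scales = [[24], [48], [96], [192], [384]]  # example values, one per head
pred_aspect_ratios = [[[1, 1 / 2, 2]]] * 5      # example values, one per head

n = count_heads(selected_layers, num_downsample)
assert len(pred_scales) == n and len(pred_aspect_ratios) == n, "one entry per head"
print(n)  # 5
```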

Alternatively (and preferably), you can try to match the scales of your layers to the layers of ResNet. For reference, with 550x550 input images, the layers we select have these resolutions:

conv3 (P3): 1/  8 the image size (69x69)
conv4 (P4): 1/ 16 the image size (35x35)
conv5 (P5): 1/ 32 the image size (18x18)
  .   (P6): 1/ 64 the image size ( 9x 9)
  .   (P7): 1/128 the image size ( 5x 5)
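These sizes come from repeated stride-2 downsampling. Assuming every stride-2 step is a 3x3 convolution or pooling with padding 1 (or ResNet's 7x7 stem convolution with padding 3), each step maps a side length n to ceil(n / 2), which reproduces the table above. A small sketch for checking other input sizes, with `feature_sizes` as a hypothetical helper:

```python
import math


def feature_sizes(image_size, levels=range(3, 8)):
    """Side length of feature level P_k (stride 2**k) for a square input.

    Assumes each stride-2 layer maps n -> ceil(n / 2).
    """
    sizes = {}
    n = image_size
    for k in range(1, max(levels) + 1):
        n = math.ceil(n / 2)  # one stride-2 step
        if k in levels:
            sizes["P%d" % k] = n
    return sizes


print(feature_sizes(550))
# {'P3': 69, 'P4': 35, 'P5': 18, 'P6': 9, 'P7': 5}
```

For example, `feature_sizes(512)` gives 64, 32, 16, 8 and 4 for P3 through P7.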

Just match your selected layers to those dimensions, and if you need extra layers (like P6 and P7), add them with fpn.num_downsample.
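If you know the spatial size of each of your backbone's outputs at 550x550, this matching can be done mechanically. A hypothetical sketch, assuming the 5-head setup described above: it selects the outputs whose sizes match P3, P4 and P5 (taking the deepest output at each resolution) and fills the remaining heads with downsample layers. The example sizes are made up for illustration; they are what a ResNet-style backbone with four stages would produce.

```python
def choose_layers(output_sizes, targets=(69, 35, 18), total_heads=5):
    """Pick backbone outputs whose resolution matches P3, P4 and P5.

    output_sizes: side length of each tensor returned by forward(),
                  measured on a 550x550 input.
    Returns (selected_layers, num_downsample). If several outputs share a
    resolution, the last (deepest) one is chosen.
    """
    selected = []
    for target in targets:
        matches = [i for i, size in enumerate(output_sizes) if size == target]
        if not matches:
            raise ValueError("no backbone output of size %d" % target)
        selected.append(matches[-1])
    return selected, total_heads - len(selected)


# Hypothetical backbone whose forward() returns four feature maps.
print(choose_layers([138, 69, 35, 18]))  # ([1, 2, 3], 2)
```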

Also, by default we put the protonet on the first of these selected layers (in this case the 69x69 P3), but you can change that with mask_proto_src.

As an aside, I was about to say you can look at the documentation in config.py for the rest of the parameters, but it looks like I forgot to write documentation for backbone parameters...

Auth0rM0rgan commented 4 years ago

@dbolya Thanks for all the information!

ahkarami commented 4 years ago

Dear @dbolya, would you please add the MobileNet backbone to the pre-trained models?

dbolya commented 4 years ago

@ahkarami It's a good thing to have, so I'll add it to my TODO list (along with EfficientNet).

Auth0rM0rgan commented 4 years ago

@dbolya, would you please add the VoVNet and VoVNetV2 backbones? It seems these backbones are better than MobileNet, EfficientNet, or even ResNet.