Epiphqny / VisTR

[CVPR2021 Oral] End-to-End Video Instance Segmentation with Transformers
https://arxiv.org/abs/2011.14503
Apache License 2.0

Custom Backbone #36

Open cankocagil opened 3 years ago

cankocagil commented 3 years ago

Hello, I am very impressed with your work, and I want to train your model from scratch with a custom backbone for my research. I would appreciate it if you could provide the requirements for a custom backbone, along with guidelines on how to integrate it. Best!

Epiphqny commented 3 years ago

Hi @cankocagil, what do you mean by custom backbone? You could pre-train the backbone with the DETR model (with a feature dimension of 384), and then reuse the trained weights for VisTR.

cankocagil commented 3 years ago

Hi, I developed a custom transformer-based backbone. Its feature dimension is also 384, but I could not figure out how to plug it into your system. My real problem is that I cannot replace your backbone with mine. For example, how should I change the backbone in your backbone.py module? Maybe the following would work?

class BackboneBase(nn.Module):

    def __init__(self, backbone: nn.Module, train_backbone: bool, num_channels: int, return_interm_layers: bool):
        super().__init__()
        for name, parameter in backbone.named_parameters():
            if not train_backbone or 'layer2' not in name and 'layer3' not in name and 'layer4' not in name:
                parameter.requires_grad_(False)
        if return_interm_layers:
            return_layers = {"layer1": "0", "layer2": "1", "layer3": "2", "layer4": "3"}
        else:
            return_layers = {'layer4': "0"}
        # self.body = IntermediateLayerGetter(backbone, return_layers=return_layers)
        self.body = custom_backbone()
        self.num_channels = num_channels

This probably does not work. Could you provide some sample code for this?
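Two things in the snippet above look likely to cause trouble. First, the freezing loop keeps only parameters whose names contain 'layer2', 'layer3' or 'layer4' trainable, so a backbone without ResNet-style names would end up fully frozen. Second, assuming VisTR's backbone.py follows the DETR convention (the forward pass receives a NestedTensor and returns a dict of NestedTensors keyed "0", "1", ...), a module that returns a plain tensor would need to be wrapped. The following is a minimal sketch under those assumptions, not the repository's code: `NestedTensor` here is a local stand-in for the repo's class, and `ToyBackbone` and `CustomBackboneBase` are hypothetical names used for illustration.

```python
from typing import Dict, NamedTuple

import torch
import torch.nn.functional as F
from torch import nn


class NestedTensor(NamedTuple):
    """Local stand-in (assumption) for the repo's NestedTensor."""
    tensors: torch.Tensor  # padded image batch, shape (B, 3, H, W)
    mask: torch.Tensor     # bool, shape (B, H, W), True where a pixel is padding


class ToyBackbone(nn.Module):
    """Hypothetical placeholder; swap in your own transformer backbone.

    Assumed contract: maps (B, 3, H, W) to a single map (B, C, H', W').
    """

    def __init__(self, out_channels: int = 384):
        super().__init__()
        self.proj = nn.Conv2d(3, out_channels, kernel_size=16, stride=16)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)


class CustomBackboneBase(nn.Module):
    """Wraps a single-output backbone in a DETR-style interface (assumed)."""

    def __init__(self, backbone: nn.Module, train_backbone: bool, num_channels: int):
        super().__init__()
        # Freeze everything or nothing; no ResNet-specific name filtering.
        if not train_backbone:
            for parameter in backbone.parameters():
                parameter.requires_grad_(False)
        self.body = backbone
        self.num_channels = num_channels  # must match the backbone's output channels

    def forward(self, tensor_list: NestedTensor) -> Dict[str, NestedTensor]:
        x = self.body(tensor_list.tensors)
        # Resize the padding mask to the feature-map resolution.
        mask = F.interpolate(tensor_list.mask[None].float(), size=x.shape[-2:])
        mask = mask.to(torch.bool)[0]
        return {"0": NestedTensor(x, mask)}


# Usage with dummy data: two 64x96 images and no padding.
images = torch.randn(2, 3, 64, 96)
padding = torch.zeros(2, 64, 96, dtype=torch.bool)
backbone = CustomBackboneBase(ToyBackbone(384), train_backbone=True, num_channels=384)
out = backbone(NestedTensor(images, padding))
print(out["0"].tensors.shape)  # torch.Size([2, 384, 4, 6])
print(out["0"].mask.shape)     # torch.Size([2, 4, 6])
```

If the real backbone emits a token sequence rather than a 2D feature map, it would presumably need to be reshaped to (B, C, H', W') before being returned, and `num_channels` should be whatever the downstream input projection expects.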