Questions about training from scratch

gustavu92 commented 3 years ago

Hello @YanchaoYang, thanks for the great work. I have some questions about the weight initialization (--init-weights='DeepLab_init.pth'). What are these weights? If I initialize the network with them and run inference on some Cityscapes images without any training, I already get some prediction results. But when I try to train the model from scratch (without any weight initialization), the loss gets stuck. Also, I found that if we do not initialize the model with some weights, they are initialized with zeros (at least in the VGG model, as in the code below).

def _initialize_weights(self):
    for m in self.modules():
        if isinstance(m, nn.Conv2d):
            m.weight.data.zero_()
            if m.bias is not None:
                m.bias.data.zero_()
        if isinstance(m, nn.ConvTranspose2d):
            assert m.kernel_size[0] == m.kernel_size[1]
            initial_weight = self.get_upsampling_weight(
                m.in_channels, m.out_channels, m.kernel_size[0])
            m.weight.data.copy_(initial_weight)

My question is: what steps are required to train the network from scratch? For example, on another dataset for which I don't have these initialization weights. Thank you again!
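One way to check the zero-initialization observation above is a small experiment. The sketch below is not the FDA code: `make_net` is a hypothetical two-layer stand-in for a conv backbone, and Kaiming initialization is used only as an assumed, commonly used alternative for ReLU networks. With all conv weights and biases at zero, every activation is zero, so the gradients of the conv weights are zero as well and the weights cannot move, which is consistent with a loss that gets stuck.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_net():
    # Hypothetical two-layer stand-in for a conv backbone (not the FDA model).
    return nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Conv2d(8, 4, kernel_size=3, padding=1),
    )


def zero_init(net):
    # Mirrors the Conv2d branch of the quoted _initialize_weights.
    for m in net.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.zeros_(m.weight)
            if m.bias is not None:
                nn.init.zeros_(m.bias)


def kaiming_init(net):
    # Assumed alternative for from-scratch training; not prescribed by the thread.
    for m in net.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
            if m.bias is not None:
                nn.init.zeros_(m.bias)


def weight_grad_norms(net):
    # One forward/backward pass on random data; returns the gradient norm
    # of each conv layer's weight tensor.
    torch.manual_seed(0)
    x = torch.randn(2, 3, 16, 16)
    target = torch.randint(0, 4, (2, 16, 16))  # per-pixel class labels
    net.zero_grad()
    loss = F.cross_entropy(net(x), target)
    loss.backward()
    return [m.weight.grad.norm().item()
            for m in net.modules() if isinstance(m, nn.Conv2d)]


zero_net = make_net()
zero_init(zero_net)
kaiming_net = make_net()
kaiming_init(kaiming_net)

zero_norms = weight_grad_norms(zero_net)
kaiming_norms = weight_grad_norms(kaiming_net)
print("zero init:   ", zero_norms)
print("kaiming init:", kaiming_norms)
```

In this toy setup only the last layer's bias receives a nonzero gradient under zero initialization, so the network can at best learn a constant prediction. Whether a random initialization is enough to train the full model from scratch on a new dataset is a separate question that this sketch does not answer.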

YanchaoYang commented 3 years ago

Hi @gustavu92, this is a good question. I observed the same thing, but starting from pretrained weights seems to be quite common practice in the field: people typically use ImageNet pretraining to initialize the weights. I also inherited the initial weights (hopefully pretrained on a similar dataset) from the previous state-of-the-art (SoA) method. I do not have an answer as to which initialization to use, but I think this is closely related to transfer learning, and you may find more inspiration in those papers.
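For a dataset with a different number of classes, one common transfer-learning pattern is to reuse every pretrained tensor whose name and shape still match and leave the rest (typically the classifier head) at its fresh initialization. The sketch below assumes the checkpoint is a plain `state_dict`; `TinySegNet`, `load_matching_weights`, the file name, and the class counts (19 and 5) are all hypothetical and are not taken from the FDA repository or from `DeepLab_init.pth`.

```python
import os
import tempfile

import torch
import torch.nn as nn


class TinySegNet(nn.Module):
    # Hypothetical stand-in for a segmentation network (not the FDA DeepLab/VGG code).
    def __init__(self, num_classes):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.classifier = nn.Conv2d(8, num_classes, kernel_size=1)

    def forward(self, x):
        return self.classifier(self.backbone(x))


def load_matching_weights(model, checkpoint_path):
    # Copy only the tensors whose name and shape match the target model;
    # return the names of the parameters that kept their fresh initialization.
    saved = torch.load(checkpoint_path, map_location="cpu")
    own = model.state_dict()
    compatible = {k: v for k, v in saved.items()
                  if k in own and v.shape == own[k].shape}
    model.load_state_dict(compatible, strict=False)
    return sorted(set(own) - set(compatible))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pretrained_init.pth")  # hypothetical file name
    source = TinySegNet(num_classes=19)  # plays the role of the pretrained model
    torch.save(source.state_dict(), path)

    target = TinySegNet(num_classes=5)   # new dataset with fewer classes
    skipped = load_matching_weights(target, path)

print("left at fresh initialization:", skipped)
```

Here the backbone is copied from the checkpoint and only the classifier head is skipped because its shape differs. A real checkpoint may wrap the weights in another dictionary or use different key prefixes, in which case the keys would need to be adapted first.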

Klopotek0 commented 9 months ago

Hey @gustavu92, did you train successfully on a custom dataset? If so, did you use initial weights, or did you figure out the loss problem?

YanchaoYang / FDA
