dvlab-research / PanopticFCN

Fully Convolutional Networks for Panoptic Segmentation (CVPR2021 Oral)
Apache License 2.0

What's the difference between PFCN and SOLOv2 on dynamic conv? #12

Closed lucasjinreal closed 3 years ago

lucasjinreal commented 3 years ago

From what I can see in the code, the thingsGenerator and stuffGenerator actually don't need dynamic conv (correct me if I'm wrong).

(screenshot of the generator code)

As you can see, the conv doesn't actually use dynamic weights; it is just normally initialized:

self.embed_extractor = Conv2d(input_channels, conv_dims, kernel_size=1)
for layer in [self.embed_extractor]:
    nn.init.normal_(layer.weight, mean=0, std=0.01)
    if layer.bias is not None:
        nn.init.constant_(layer.bias, 0)

I think this is different from SOLOv2's dynamic conv: SOLOv2 outputs kernel weights and constructs a new conv from them, like this:

kernel_preds = kernel_preds.view(N, I, 1, 1)
seg_preds = F.conv2d(seg_preds, kernel_preds, stride=1).squeeze(0).sigmoid()

I want to ask: why are they different, and which one is more advanced?

yanwei-li commented 3 years ago

Actually, they perform so-called dynamic conv in different manners. You can assume that we use torch.matmul to replace the traditional F.conv2d for faster inference. Of course, you can use F.conv2d(x, meta_weight, stride=1) to get the same result.
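For a 1x1 kernel with stride 1, the two formulations are numerically identical, since the conv reduces to a per-pixel matrix multiply. A minimal sketch of this equivalence (shapes and names here are illustrative, not the repo's actual code):

```python
import torch
import torch.nn.functional as F

N, C, H, W = 5, 16, 8, 8           # N predicted kernels, C channels
feat = torch.randn(1, C, H, W)      # feature map
meta_weight = torch.randn(N, C)     # dynamically predicted 1x1 kernels

# Path 1: use the predicted weights as 1x1 conv kernels (SOLOv2 style).
out_conv = F.conv2d(feat, meta_weight.view(N, C, 1, 1), stride=1)

# Path 2: flatten the spatial dims and use a matmul instead.
out_mm = torch.matmul(meta_weight, feat.view(C, H * W)).view(1, N, H, W)

print(torch.allclose(out_conv, out_mm, atol=1e-5))  # → True
```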

lucasjinreal commented 3 years ago

@yanwei-li You are right, SOLOv2's F.conv2d can actually be replaced with torch.matmul, since its padding and stride and kernel size are all 1. In yours, the difference is that the weights are learned, not predicted.

Is one of them better in terms of final accuracy? Also, did you compare AP with SOLOv2 on the instance part only? I didn't see such a comparison in the paper.

yanwei-li commented 3 years ago

Hi, I'm not sure what "the weights are learned, not predicted" means; we also use a convolution (self.embed_extractor in the screenshot) to predict the parameters. The aim of this paper is to further unify the representation of things and stuff, rather than to solve only one of them, so we did not compare with SOLO on each single task.

lucasjinreal commented 3 years ago

@yanwei-li You don't predict the weights of the conv, do you? Instead, you just learn the params, which is different from SOLO. I just want to know what the impact is in terms of final accuracy.

yanwei-li commented 3 years ago

Hi, the weight of the conv is actually meta_weight here.
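To illustrate the "learned vs. predicted" distinction being discussed, here is a generic sketch (kernel_head and the global pooling are hypothetical, not PanopticFCN's actual code): a static conv reuses one trained kernel for every input, while a dynamic conv builds its kernel from each input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

C, N, H, W = 16, 5, 8, 8
feat = torch.randn(1, C, H, W)

# "Learned" weights: a plain conv whose kernel is a trained parameter,
# fixed at inference and identical for every input image.
static_conv = nn.Conv2d(C, N, kernel_size=1)
out_static = static_conv(feat)                      # (1, N, H, W)

# "Predicted" weights: a small head (hypothetical here) generates the
# kernel from the input itself, so each image gets its own conv weights.
kernel_head = nn.Linear(C, N * C)
pooled = feat.mean(dim=(2, 3))                      # (1, C) global context
meta_weight = kernel_head(pooled).view(N, C, 1, 1)  # per-image kernels
out_dynamic = F.conv2d(feat, meta_weight)           # (1, N, H, W)

print(out_static.shape == out_dynamic.shape)  # → True
```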

lucasjinreal commented 3 years ago

@yanwei-li OK, I get it: you just replaced F.conv2d with torch.matmul.