andr345 / frtm-vos

Code accompanying the paper Learning Fast and Robust Target Models for Video Object Segmentation
GNU General Public License v3.0

A problem about discriminator.py and seg_network.py #24

Open limoran233 opened 3 years ago

limoran233 commented 3 years ago

In discriminator.py:

    class Discriminator(nn.Module):
        ...
        def apply(self, ft):
            self.frame_num += 1
            cft = self.project(ft)      # project the input features
            self.current_sample = cft   # cache the projected features
            scores = self.filter(cft)   # target score map
            return scores

And in seg_network.py:

    class SegNetwork(nn.Module):
        ...
        def forward(self, scores, features, image_size):
            num_targets = scores.shape[0]                                # number of score maps passed in
            num_fmaps = features[next(iter(self.ft_channels))].shape[0]  # batch size of the feature maps
            if num_targets > num_fmaps:
                multi_targets = True
            else:
                multi_targets = False

            x = None
            for i, L in enumerate(self.ft_channels):
                ft = features[L]
                s = interpolate(scores, ft.shape[-2:])  # Resample scores to match features size

                if multi_targets:
                    # Several targets stacked in dim 0: repeat the features to match
                    h, hpool = self.TSE[L](ft.repeat(num_targets, 1, 1, 1), s, x)
                else:
                    h, hpool = self.TSE[L](ft, s, x)

                h = self.RRB1[L](h)
                h = self.CAB[L](hpool, h)
                x = self.RRB2[L](h)

            x = self.project(x, image_size)
            return x

Hello, I printed cft and scores and found that, when evaluating each frame of a sequence that has several targets, cft and scores are printed once per target. The size of scores is [1,1,m,n], and the parameters differ from target to target. Why is this, and where is it set up? Another question: in seg_network.py, I found that num_targets and num_fmaps are always 1. For a sequence with several targets, 1 is printed once per target, so multi_targets is always False, yet the segmentation result contains multiple targets. Why? I am really confused and would appreciate an answer. Looking forward to your reply!

felja633 commented 3 years ago

For each target we have one instance of the discriminator, each containing a set of parameters representing the target. So, for each frame with K targets there will be K target score predictions with dim [1,1,m,n].
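A minimal sketch of the setup described above, assuming a simplified, hypothetical `ToyDiscriminator` (not the repo's `Discriminator`): each instance owns its own weights, all instances read the same frame features, and calling every instance once per frame yields K score maps of shape [1, 1, m, n]. The projection and 1x1 filter here are NumPy stand-ins for the real `project` and `filter` layers.

```python
import numpy as np


class ToyDiscriminator:
    """Hypothetical stand-in for one per-target discriminator.

    Each instance owns its own weights ("one set of parameters per target").
    project/filter are simplified here: a channel projection followed by a
    1x1 filter, both plain NumPy arrays rather than trained conv layers.
    """

    def __init__(self, in_channels, proj_channels, seed):
        rng = np.random.default_rng(seed)
        self.proj = rng.standard_normal((proj_channels, in_channels))
        self.weights = rng.standard_normal(proj_channels)  # target-specific
        self.frame_num = 0

    def apply(self, ft):
        # ft: [1, C, m, n] features shared by all targets in the frame
        self.frame_num += 1
        cft = np.einsum("pc,bcmn->bpmn", self.proj, ft)              # project
        scores = np.einsum("p,bpmn->bmn", self.weights, cft)[:, None]  # filter
        return scores  # [1, 1, m, n]


K, C, m, n = 3, 8, 6, 10
features = np.random.default_rng(0).standard_normal((1, C, m, n))
discriminators = [ToyDiscriminator(C, 4, seed=k) for k in range(K)]

# One apply() call per target per frame, so any print inside apply()
# appears K times per frame.
score_maps = [d.apply(features) for d in discriminators]
print(len(score_maps), score_maps[0].shape)  # 3 (1, 1, 6, 10)
```

Because the weights differ per instance, the K score maps differ even though the input features are identical.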

The segnetwork predicts a segmentation mask per target. The per-target mask scores are only fused at the end. There exists the option of processing multiple targets simultaneously (multi_targets = True), but that only concatenates them in dim=0. This could potentially make the processing slightly faster, but it will not have any impact on the accuracy.
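To illustrate why stacking targets in dim 0 should not change the result, here is a small NumPy sketch with a hypothetical `toy_seg_head` standing in for `SegNetwork`. It compares running the head once per target against running it once on all targets concatenated in dim 0 with the features repeated to match (analogous to `ft.repeat(num_targets, 1, 1, 1)`). The equivalence holds for any head whose layers treat batch elements independently, such as convolutions, pointwise operations, and eval-mode normalization.

```python
import numpy as np


def toy_seg_head(scores, ft, w_s, w_f):
    """Hypothetical per-sample head mixing one score map with features.

    Every operation acts on each batch element independently, so nothing
    flows between targets stacked in dim 0.
    scores: [B, 1, m, n], ft: [B, C, m, n] -> output: [B, 1, m, n]
    """
    return np.tanh(w_s * scores + np.einsum("c,bcmn->bmn", w_f, ft)[:, None])


rng = np.random.default_rng(0)
K, C, m, n = 3, 8, 6, 10
ft = rng.standard_normal((1, C, m, n))      # one feature map per frame
scores = rng.standard_normal((K, 1, m, n))  # one score map per target
w_s, w_f = 0.5, rng.standard_normal(C)

# Option A: one target at a time, features used as-is (multi_targets=False)
looped = np.concatenate(
    [toy_seg_head(scores[k:k + 1], ft, w_s, w_f) for k in range(K)]
)

# Option B: all targets concatenated in dim 0, features repeated to match
batched = toy_seg_head(scores, np.repeat(ft, K, axis=0), w_s, w_f)

print(np.allclose(looped, batched))  # True
```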

limoran233 commented 3 years ago

Thank you very much for your quick reply, it is much clearer now. By the way, the masks for multiple targets are fused together in the memory module, but separated in the discriminator for single-target processing. Am I right?

felja633 commented 3 years ago

The masks from different targets are first fused, then separated for each target before being stored in the memory.
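A rough sketch of the "fuse, then split per target, then store" order, under stated assumptions. The fusion here is "soft aggregation" as popularised by RGMP/STM; it is an assumed stand-in, not confirmed to be the exact scheme in frtm-vos. `split_per_target` and the `memory` dict are hypothetical illustrations. One plausible benefit of fusing first is that every pixel ends up assigned to at most one target, so the per-target masks stored afterwards do not overlap.

```python
import numpy as np


def soft_aggregate(probs, eps=1e-7):
    """Fuse K per-target foreground probabilities into one distribution.

    probs: [K, H, W] with values in (0, 1). Returns [K + 1, H, W] where
    channel 0 is background. Assumed scheme (soft aggregation), not
    necessarily the repo's own fusion step.
    """
    p = np.clip(probs, eps, 1 - eps)
    bg = np.prod(1 - p, axis=0, keepdims=True)               # background prob
    stacked = np.clip(np.concatenate([bg, p]), eps, 1 - eps)
    logits = np.log(stacked / (1 - stacked))                  # per-channel logit
    logits -= logits.max(axis=0, keepdims=True)               # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=0, keepdims=True)                   # softmax over K + 1


def split_per_target(fused):
    """Hypothetical helper: one binary mask per target from the fused map."""
    labels = fused.argmax(axis=0)                             # 0 = background
    return [(labels == k + 1).astype(np.float32) for k in range(fused.shape[0] - 1)]


rng = np.random.default_rng(0)
K, H, W = 3, 4, 5
per_target_probs = rng.uniform(size=(K, H, W))   # one prediction per target

fused = soft_aggregate(per_target_probs)         # 1) fuse across targets
masks = split_per_target(fused)                  # 2) separate per target

# 3) hypothetical per-target memory: each target stores only its own mask
memory = {k: [] for k in range(K)}
for k, mask in enumerate(masks):
    memory[k].append(mask)
```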

limoran233 commented 3 years ago

Ah, although I don't fully understand it yet, thank you very much. I will keep thinking it over carefully.